The catalog
Here you can find the most important information about the Snakemake workflow catalog.
Purpose
This repository serves as a centralized collection of workflows designed to facilitate reproducible and scalable data analyses using the Snakemake workflow management system.
The Snakemake Workflow Catalog aims to provide a regularly updated list of high-quality workflows that can be easily reused and adapted for various data analysis tasks. By leveraging the power of Snakemake, these workflows promote:
Reproducibility: Snakemake workflows produce consistent results, making it easier to share and validate scientific findings.
Scalability: Snakemake workflows can be executed on various computing environments, from local machines to high-performance computing clusters and cloud services.
Modularity: Snakemake workflows are structured to allow easy customization and extension, enabling users to adapt them to their specific needs.
Using workflows
Basic usage
To get started with a workflow from the catalog:
Clone the repository or download the specific workflow directory.
git clone https://github.com/<user>/<workflow>
Review the documentation provided with the workflow to understand its requirements and usage.
Configure the workflow by editing the config.yml files as needed.
Create an environment with access to Snakemake. It is recommended to use mamba.
mamba create -n <env-name> -c <channels> snakemake
mamba activate <env-name>
Execute the workflow using Snakemake.
cd <workflow-dir>
snakemake --cores 2
Tip
Use the --dry-run option first to check if all inputs are found.
For more detailed instructions, please refer to the individual documentation for each workflow.
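The tip above can be combined with the execution step in a small wrapper. The following is a minimal sketch, not part of the catalog: the function name run_workflow is hypothetical, and it assumes snakemake is available on the PATH of the active environment. It starts the real run only if the dry run succeeds.

```shell
# Hypothetical helper: dry-run first, then execute only if the dry run succeeds.
# Usage: run_workflow <workflow-dir> [cores]
run_workflow() {
    workflow_dir="$1"
    cores="${2:-2}"   # default of 2 cores, as in the example command above
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$workflow_dir" || exit 1
        snakemake --cores "$cores" --dry-run || exit 1
        snakemake --cores "$cores"
    )
}
```

Calling run_workflow <workflow-dir> 4 would then perform the dry run and the actual run with four cores.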
Deployment options
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
To run the workflow using a combination of conda and apptainer/singularity for software deployment, use
snakemake --cores all --sdm conda apptainer
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow.
For further options such as cluster and cloud execution, see the docs.
Adding workflows
Workflows are automatically added to the Workflow Catalog. This is done by regularly searching GitHub repositories for matching workflow structures. A workflow is included if it meets the following criteria.
Generic workflows
The workflow is contained in a public GitHub repository.
The repository has a README.md file containing the words “snakemake” and “workflow” (case insensitive).
The repository contains a workflow definition named either Snakefile or workflow/Snakefile.
If the repository contains a folder rules or workflow/rules, that folder must contain at least one file ending in .smk.
The repository is small enough to be cloned into a GitHub Actions job (very large files should be handled via Git LFS, so that they can be stripped out during cloning).
The repository is not blacklisted here.
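Some of the criteria above can be checked on a local checkout before the catalog picks up the repository. The following is a minimal sketch, not the catalog's own implementation: the function name check_generic_criteria is hypothetical, and searching the rules folder recursively is an assumption. Repository visibility, size, and the blacklist are not checked.

```shell
# Hypothetical helper (not part of the catalog): checks the locally verifiable
# inclusion criteria. Prints one line per failed criterion and returns non-zero
# if any criterion fails.
check_generic_criteria() {
    repo="$1"
    status=0

    # README.md must mention both words, case-insensitively.
    for word in snakemake workflow; do
        if ! grep -qi "$word" "$repo/README.md" 2>/dev/null; then
            echo "README.md is missing or lacks the word '$word'"
            status=1
        fi
    done

    # The workflow definition must be Snakefile or workflow/Snakefile.
    if [ ! -f "$repo/Snakefile" ] && [ ! -f "$repo/workflow/Snakefile" ]; then
        echo "no Snakefile or workflow/Snakefile found"
        status=1
    fi

    # An existing rules folder must contain at least one .smk file
    # (assumption: files in subfolders count as well).
    for rules_dir in "$repo/rules" "$repo/workflow/rules"; do
        if [ -d "$rules_dir" ] && [ -z "$(find "$rules_dir" -type f -name '*.smk' | head -n 1)" ]; then
            echo "$rules_dir contains no .smk file"
            status=1
        fi
    done

    return "$status"
}
```

Running check_generic_criteria . from the repository root prints nothing and returns zero when all three checks pass.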
Standardized Usage workflows
To additionally appear in the “standardized usage” area, repositories must also:
have their main workflow definition named workflow/Snakefile (unlike plain inclusion, which also allows just Snakefile in the root of the repository),
provide configuration instructions under config/README.md,
contain a YAML file .snakemake-workflow-catalog.yml in their root directory, which configures the usage instructions displayed by this workflow catalog.
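The three file requirements above can be verified on a local checkout. The following is a minimal sketch with a hypothetical function name, check_standardized_criteria. It only tests that the files exist and does not validate their content.

```shell
# Hypothetical helper (not part of the catalog): reports which of the files
# required for the "standardized usage" area are missing from a checkout.
check_standardized_criteria() {
    repo="$1"
    status=0
    for required in workflow/Snakefile config/README.md .snakemake-workflow-catalog.yml; do
        if [ ! -f "$repo/$required" ]; then
            echo "missing $required"
            status=1
        fi
    done
    return "$status"
}
```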
Typical content of the .snakemake-workflow-catalog.yml file:
usage:
  mandatory-flags:
    desc: # describe your flags here in a few sentences
    flags: # put your flags here
  software-stack-deployment:
    conda: true # whether pipeline works with '--sdm conda'
    apptainer: true # whether pipeline works with '--sdm apptainer/singularity'
    apptainer+conda: true # whether pipeline works with '--sdm conda apptainer/singularity'
  report: true # whether creation of reports using 'snakemake --report report.zip' is supported
Note
Mandatory flags can be defined either as a list of strings (['--a', '--b']) or as a single string ('--a --b').
Note
The content of the .snakemake-workflow-catalog.yml file is subject to change. Flags might change in the near future, but current versions will always stay compatible with the catalog.
Once your workflow is included in the standardized usage area, you can link directly to its page using the URL https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/<owner>/<repo>. Do not forget to replace the <owner> and <repo> placeholders at the end of the URL.
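As a small illustration of the placeholder substitution, the following sketch builds the link from an owner and a repository name. The function name catalog_url is hypothetical; only the URL pattern comes from the text above.

```shell
# Hypothetical helper: prints the catalog page URL for a given owner and repository.
# Usage: catalog_url <owner> <repo>
catalog_url() {
    printf 'https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/%s/%s\n' "$1" "$2"
}
```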
Release handling
If your workflow provides GitHub releases, the catalog will only scrape the latest non-preview release. Hence, to update your workflow’s records here, you need to publish a new release on GitHub.
Contributions
Contributions to the Snakemake Workflow Catalog are welcome! Ideas can be discussed on the catalog’s Issues page first, and contributions can be made through GitHub Pull Requests.
License
The Snakemake Workflow Catalog is open-source and available under the MIT License. For more information about the individual workflows, browse the list of standardized usage workflows.
Note
All workflows collected and presented on the Catalog are licensed under their own terms!