Readability

With Snakemake, data analysis workflows are defined via an easy-to-read, adaptable, yet powerful specification language on top of Python. Each rule describes a step in the analysis, defining how to obtain output files from input files. Dependencies between rules are determined automatically.
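For illustration, consider the following hypothetical two-rule workflow (file names and the shell command are made up and are not part of the example further below). Because the output of the second rule matches the input requested by the first, Snakemake infers the dependency and runs the rules in the right order:

# Hypothetical minimal workflow; file names and commands are illustrative only.
rule all:
    input:
        "results/summary.txt"

# The output of this rule matches the input of "all",
# so Snakemake schedules it automatically.
rule summarize:
    input:
        "data/raw.txt"
    output:
        "results/summary.txt"
    shell:
        "sort {input} | uniq -c > {output}"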

Portability

Through integration with the Conda package manager and container virtualization, all software dependencies of each workflow step are automatically deployed upon execution.
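For instance, the select_by_country rule in the example workflow below references a Conda environment file, envs/xsv.yaml. Its contents are not shown on this page; a plausible sketch (channel and version are assumptions) would be:

# Illustrative sketch of envs/xsv.yaml; the actual file may differ.
channels:
  - conda-forge
dependencies:
  - xsv =0.13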

Modularization

Rapidly implement analysis steps via direct script and Jupyter notebook integration. Easily create and employ reusable tool wrappers and split your data analysis into well-separated modules, as illustrated by the example workflow below.

configfile: "config.yaml"

rule all:
    input:
        expand(
            "plots/{country}.hist.pdf",
            country=config["countries"]
        )

rule select_by_country:
    input:
        "data/worldcitiespop.csv"
    output:
        "by-country/{country}.csv"
    conda:
        "envs/xsv.yaml"
    shell:
        "xsv search -s Country '{wildcards.country}' "
        "{input} > {output}"

rule plot_histogram:
    input:
        "by-country/{country}.csv"
    output:
        "plots/{country}.hist.svg"
    container:
        "docker://faizanbashir/python-datascience:3.6"
    script:
        "scripts/plot-hist.py"

rule convert_to_pdf:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.pdf"
    wrapper:
        "0.47.0/utils/cairosvg"
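The plot_histogram rule above delegates its work to scripts/plot-hist.py via the script directive; inside such a script, Snakemake provides a snakemake object that exposes the rule's input and output files. The actual script is not shown on this page, but a minimal sketch, assuming the CSV has a Population column and that pandas and matplotlib are available in the container, could look like this:

# Illustrative sketch of scripts/plot-hist.py; the real script may differ.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# The "snakemake" object is injected by Snakemake's script integration.
cities = pd.read_csv(snakemake.input[0])  # per-country CSV from select_by_country
cities["Population"].dropna().plot.hist(bins=50)  # column name is an assumption
plt.savefig(snakemake.output[0])  # output format (SVG) inferred from the file extension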

Transparency

Automatic, interactive, self-contained reports ensure full transparency from results down to the steps, parameters, code, and software used.
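For example, once a workflow has completed, such a report can be generated with the --report option (the report file name is arbitrary):

snakemake --report report.html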


Scalability

Workflows scale seamlessly from single-core to multi-core execution, to clusters, or to the cloud, without any modification of the workflow definition, while redundant computations are avoided automatically.

Execution environments: workstation, compute server, cluster, grid computing, cloud computing.
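A minimal sketch of how the same workflow definition can be run at these different scales (the invocations are illustrative; cluster and cloud submission details depend on the configured profile or executor, and "myprofile" is a hypothetical name):

# Same Snakefile, different execution scales (illustrative invocations).
snakemake --cores 1                        # single core on a workstation
snakemake --cores 32                       # many cores on a compute server
snakemake --jobs 100 --profile myprofile   # cluster or cloud via a hypothetical profile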