SchlossLab/mothur-snakemake-workflow

Snakemake template for microbial amplicon sequence analysis with mothur.

Overview

Topics: 16s-rrna mothur snakemake

Latest release: None, Last update: 2023-02-09

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, installing Miniforge is recommended. More details can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/SchlossLab/mothur-snakemake-workflow . --tag None

Note: this repository has no tagged release (see "Latest release: None" above), so --tag None is a placeholder rather than an existing Git tag. Pass an existing tag instead, or use Snakedeploy's --branch option with a branch of the repository.

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which you will modify in the next step to fit the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

TODO: configuration instructions

  • how to run the demo (config/demo.yaml)
  • how to use your own dataset (config/crc/crc.yaml)

Linting and formatting

Linting results

Lints for snakefile /tmp/tmp2l12itn8/workflow/Snakefile:
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes

Lints for rule download_silva (line 35, /tmp/tmp2l12itn8/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule process_silva (line 78, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param workdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule download_rdp (line 141, /tmp/tmp2l12itn8/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule download_demo_data (line 182, /tmp/tmp2l12itn8/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule download_sra_data (line 212, /tmp/tmp2l12itn8/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule process_data (line 260, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param workdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule calc_dists (line 365, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule cluster_OTUs (line 412, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule get_shared (line 472, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule calc_diversity (line 524, /tmp/tmp2l12itn8/workflow/Snakefile):
    * Param outdir is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions
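
The most frequent lint above, "Param outdir/workdir is a prefix of input or output file but hardcoded", recommends deriving the prefix from the rule's own files. The following is a minimal Python sketch of that idea, not the workflow's actual code: the helper names outdir_from_output and prefix_from_input and the example paths are hypothetical. It assumes such helpers would live in a common module such as rules/common.smk, as the first lint suggests.

```python
import os


def outdir_from_output(wildcards, output):
    """Return the directory that holds the rule's first output file.

    Snakemake calls params functions with the wildcards as the first
    argument; further arguments such as `input` or `output` are matched
    by name.
    """
    return os.path.dirname(output[0])


def prefix_from_input(wildcards, input):
    """Return the first input path without its final extension."""
    return os.path.splitext(input[0])[0]


# Inside a Snakefile, a rule could reference a helper like this
# (hypothetical rule body and path, shown only as a comment):
#
#   rule calc_dists:
#       output: "data/mothur/sample.final.dist"
#       params: outdir=outdir_from_output
#
# Outside Snakemake, the helpers can be tried with plain lists:
if __name__ == "__main__":
    print(outdir_from_output(None, ["data/mothur/sample.final.dist"]))
    print(prefix_from_input(None, ["data/raw/sample.fasta"]))
```

Because the prefix is computed from the declared output or input, it stays correct when Snakemake relocates files, for example when running without a shared filesystem.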

Formatting results

[DEBUG]
[DEBUG] In file "/tmp/tmp2l12itn8/workflow/Snakefile":  Formatted content is different from original
[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.8.1