merfre/Hull_Microbiome_Cluster_Workflow
Snakemake workflow for microbiome research cluster analysis using fastp, Kraken2, and BIOM.
Overview
Latest release: v1.2, Last update: 2024-05-27
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/merfre/Hull_Microbiome_Cluster_Workflow . --tag v1.2
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow's config/README.md.
The 'config' directory contains the main configuration file, 'config.yaml', which specifies the samples to analyse and the parameters to use throughout the workflow.
This file is split into four sections:
- The first is at the top of the file and contains general information for the workflow, including the location of the desired libraries, the metadata file for the samples, and the software environment to use for analysis.
- The second section is titled 'Analysis options' and allows certain steps in analysis to be toggled on or off. Each analysis option is described below its title and includes which software is used for that step. When the word 'True' is next to the analysis option it will be included in the next workflow run. When 'False' is next to the option it will be excluded and the software described in that step will not be used for the next run.
- The next section is below the second and labeled 'Database locations'. In this section the location of the reference databases for analysis is specified. For this workflow, there is one reference required, a Kraken2 database for taxonomy assignment.
- The final section, located below the third, is titled 'Parameters'. This section contains the list of adjustable parameters used in this workflow. Each parameter has a title, a value assigned to it, and a description. The parameters are organized by the software that uses them. For instance, fastp is used for initial quality control and has six adjustable parameters for its performance.
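To make the four-section layout concrete, here is a minimal Python sketch of what the parsed configuration might look like and how True/False toggles could select steps. Apart from "metadata_file", every key name and value is an assumption made for illustration; the real names are defined in the workflow's own configuration file.

```python
# Every key name and value below (other than "metadata_file") is hypothetical
# and only mirrors the four sections described above.
config = {
    # 1. General information
    "metadata_file": "config/metadata.tsv",
    # 2. Analysis options (True = run the step, False = skip it)
    "analysis_options": {"quality_control": True, "assembly": False},
    # 3. Database locations
    "databases": {"kraken2": "path/to/kraken2_db"},
    # 4. Parameters, grouped by the software that uses them
    "parameters": {"fastp": {"min_length": 200}},
}


def enabled_steps(cfg):
    """Return the names of the analysis options that are switched on."""
    return sorted(name for name, on in cfg["analysis_options"].items() if on)


print(enabled_steps(config))
```

In this sketch only the option set to True is reported, which matches the described behaviour that options marked 'False' are excluded from the next run.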
Only one entry must be specified before using this workflow: the path to the "metadata_file" for the samples you wish to analyze.
The metadata file should be placed in this config directory and formatted as a tab-delimited table. Ideally this is a file you already keep to track your experiments, so it will likely only need its column names changed to be compatible with this workflow. At a minimum, the table needs one row for each sample you wish to analyze and columns that specify:
- The run in which the sample was sequenced, in a column titled "Run". This should also be the name of the directory or library that contains the sample in the "resources" folder of this workflow.
- The nanopore barcode of the sample, in a column titled "Barcode". This is the barcode number assigned to the sample prior to sequencing and used when demultiplexing with Guppy.
- The sample's ID, in a column labeled "Sample_ID". This is a unique identifier that will be assigned to the sample's concatenated fastq file and all future results.
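The column requirements above can be checked before a run. The following is a minimal sketch using only the Python standard library; the function name check_metadata and the example rows are assumptions made for illustration and are not part of the workflow.

```python
import csv
import io

REQUIRED_COLUMNS = ("Run", "Barcode", "Sample_ID")


def check_metadata(handle):
    """Return a list of problems found in a tab-delimited metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems = []
    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_number}: empty {col}")
        sample_id = (row["Sample_ID"] or "").strip()
        if sample_id and sample_id in seen_ids:
            problems.append(f"line {line_number}: duplicate Sample_ID {sample_id}")
        seen_ids.add(sample_id)
    return problems


# Made-up example table; real values come from your own sequencing runs.
example = "Run\tBarcode\tSample_ID\nrun01\t1\tgut_A\nrun01\t2\tgut_B\n"
print(check_metadata(io.StringIO(example)))
```

To check a real file, open it with `open(path, newline="")` and pass the handle to the function. An empty list means the three required columns are present, filled in, and the sample IDs are unique.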
Linting and formatting
Linting results
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile:37: SyntaxWarning: invalid escape sequence '\/'
  RUNS = "[^\/]+"
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile:38: SyntaxWarning: invalid escape sequence '\/'

/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile:39: SyntaxWarning: invalid escape sequence '\/'
  ### Optional analyses ###
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk:50: SyntaxWarning: invalid escape sequence '\#'
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk:64: SyntaxWarning: invalid escape sequence '\#'
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk:50: SyntaxWarning: invalid escape sequence '\#'
/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk:64: SyntaxWarning: invalid escape sequence '\#'
Lints for snakefile /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile:
    * Path composition with '+' in line 25:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
      Also see:

    * Path composition with '+' in line 16:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
      Also see:


Lints for rule preqc_stats (line 9, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule fastp (line 36, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule postqc_stats (line 85, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule kraken2 (line 9, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/kraken2.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Param rcf_input_directory is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule kraken_to_biom (line 9, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule biom_to_tsv (line 36, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule kraken_to_biom_individual (line 9, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule biom_to_tsv_individual (line 36, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule fasta_conversion (line 9, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule metaflye (line 34, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule assembly_stat_report (line 76, /tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
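Three of the lint messages above point at plain Python idioms: the invalid escape sequence warnings, path composition with '+', and deriving a prefix from a file name. The sketch below illustrates each in isolation. Only the RUNS pattern comes from the log; the directory, run, and sample names and the helper prefix_of are hypothetical, and this is not the workflow's actual code.

```python
import os.path
import re
from pathlib import Path

# A raw string keeps backslashes away from Python's escape handling, and "/"
# needs no escaping in a Python regex, so the pattern can drop it entirely.
RUNS = r"[^/]+"
assert re.fullmatch(RUNS, "run01")
assert re.fullmatch(RUNS, "run01/extra") is None

# Hypothetical names, used only to illustrate path composition.
results_dir = "results"
run = "run01"
sample = "gut_A"

# Instead of results_dir + "/" + run + "/" + sample + ".fastq":
with_fstring = f"{results_dir}/{run}/{sample}.fastq"
with_pathlib = Path(results_dir) / run / f"{sample}.fastq"
assert with_fstring == with_pathlib.as_posix()


def prefix_of(path):
    """Derive a prefix from a file name, as the lint message suggests."""
    return os.path.splitext(path)[0]


print(prefix_of("results/run01/gut_A.kraken2.report"))
```

The same raw-string approach should also silence the '\#' warnings, provided the backslash is really meant to reach the shell; if it is not needed there, removing it is the simpler fix.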
Formatting results
[DEBUG]
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk": Keyword "shell" at line 14 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
<unknown>:1: SyntaxWarning: invalid escape sequence '\#'
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_all_samples.smk": Formatted content is different from original
[DEBUG]
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk": Keyword "shell" at line 14 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
<unknown>:1: SyntaxWarning: invalid escape sequence '\#'
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/biom_individual.smk": Formatted content is different from original
[DEBUG]
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
[ERROR] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile": InvalidPython: Black error:
Cannot parse: 78:0: else:
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/Snakefile":
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk": Keyword "shell" at line 14 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk": Keyword "shell" at line 36 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk": Keyword "shell" at line 59 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/fastp.smk": Formatted content is different from original
[DEBUG]
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk": Keyword "output" at line 26 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk": Keyword "shell" at line 31 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk": Keyword "shell" at line 49 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/metaflye.smk": Formatted content is different from original
[DEBUG]
[WARNING] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/kraken2.smk": Keyword "shell" at line 19 has comments under a value.
PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[DEBUG] In file "/tmp/tmpe2zczg64/merfre-Hull_Microbiome_Cluster_Workflow-dd5584e/workflow/rules/kraken2.smk": Formatted content is different from original
[INFO] 1 file(s) raised parsing errors 🤕
[INFO] 5 file(s) would be changed 😬
snakefmt version: 0.10.2