jsquaredosquared/novocraft-sv-benchmarking
Comparing performance of SV callers when using NovoAlign vs other aligners.
Overview
Topics: sv-calling novoalign
Latest release: None, Last update: 2024-11-29
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
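snakedeploy deploy-workflow https://github.com/jsquaredosquared/novocraft-sv-benchmarking . --tag None
Because this repository has no tagged release yet, the tag is shown as None. Replace it with an existing tag, or pass --branch with a branch name instead of --tag.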
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
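As a quick sanity check after deployment, the layout described on this page can be verified programmatically. The following is a minimal sketch, not part of the workflow itself: the helper name check_deployment is hypothetical, and it assumes only that the workflow folder holds the main Snakefile and that a config folder sits next to it.

```python
from pathlib import Path


def check_deployment(workdir="."):
    """Return the expected deployment paths that are missing under workdir.

    Assumes the layout described on this page: a `workflow` folder holding
    the main Snakefile and a `config` folder next to it.
    """
    root = Path(workdir)
    expected = [root / "workflow" / "Snakefile", root / "config"]
    # Keep only the paths that do not exist yet.
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = check_deployment(".")
    if missing:
        print("Deployment looks incomplete, missing:", ", ".join(missing))
    else:
        print("Found workflow/Snakefile and config/.")
```

Running the script from inside the project working directory reports which of the two expected paths, if any, are absent.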
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
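When the run is launched from a script rather than typed by hand, the command from this step can be assembled as an argument list. The sketch below is illustrative only: the function names build_snakemake_command and run_workflow are hypothetical, and the optional --dry-run flag (which makes Snakemake print the planned jobs without executing them) is an addition not mentioned in the steps above.

```python
import shutil
import subprocess


def build_snakemake_command(cores="all", sdm="conda", dry_run=False):
    """Assemble the snakemake invocation from Step 4 as an argument list."""
    cmd = ["snakemake", "--cores", str(cores), "--sdm", sdm]
    if dry_run:
        # --dry-run only prints the planned jobs without executing them.
        cmd.append("--dry-run")
    return cmd


def run_workflow(dry_run=True):
    """Run snakemake in the current directory and return its exit code."""
    if shutil.which("snakemake") is None:
        raise RuntimeError(
            "snakemake not found; activate the environment from Step 1"
        )
    return subprocess.run(build_snakemake_command(dry_run=dry_run)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_snakemake_command(dry_run=True)))
```

Starting with a dry run is a cheap way to confirm that the configuration from step 3 resolves to the expected jobs before committing compute time.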
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Please see the "Usage" section of the main README.md file for more information on configuring the workflow.
Linting and formatting
Linting results
Lints for snakefile /tmp/tmp5ctl5twu/workflow/rules/01_align.smk:
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes

Lints for snakefile /tmp/tmp5ctl5twu/workflow/rules/03_benchmark.smk:
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes

Lints for rule download_reference (line 1, /tmp/tmp5ctl5twu/workflow/rules/00_prepare.smk):
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule novoindex_reference (line 12, /tmp/tmp5ctl5twu/workflow/rules/00_prepare.smk):
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule download_hg002_fastqs (line 56, /tmp/tmp5ctl5twu/workflow/rules/00_prepare.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule download_hg002_tier1_sv_truth_set (line 68, /tmp/tmp5ctl5twu/workflow/rules/00_prepare.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule download_delly_exclude (line 85, /tmp/tmp5ctl5twu/workflow/rules/00_prepare.smk):
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule align_with_novoalign (line 40, /tmp/tmp5ctl5twu/workflow/rules/01_align.smk):
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule configure_manta (line 1, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule run_dysgu (line 38, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule run_delly (line 60, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule run_smoove (line 79, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule run_tiddit (line 107, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule run_insurveyor (line 125, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable config from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule bgzip_and_index_sv_vcf (line 151, /tmp/tmp5ctl5twu/workflow/rules/02_call.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules

Lints for rule split_truth_set_by_svtype (line 9, /tmp/tmp5ctl5twu/workflow/rules/03_benchmark.smk):
    * Do not access input and output files individually by index in shell commands:
      When individual access to input or output files is needed (i.e., just
      writing '{input}' is impossible), use names ('{input.somename}') instead
      of index based access.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#rules

Lints for rule generate_plots (line 1, /tmp/tmp5ctl5twu/workflow/rules/04_compare.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
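Because the lint output above repeats a handful of messages across many rules, a tally can help decide what to fix first. The following is a rough sketch under stated assumptions: the function name summarize_lints is hypothetical, and the parsing relies only on the text layout visible above ("Lints for rule ..." or "Lints for snakefile ..." headers followed by "* <title>:" bullets), not on any official Snakemake API. It also tolerates a leading line number on each line, as found in output copied from a web page.

```python
import re
from collections import Counter

# Header lines name the rule or snakefile that the following lints refer to.
HEADER = re.compile(r"^Lints for (?:rule|snakefile) (\S+)")


def summarize_lints(text):
    """Count lint messages per rule/snakefile and per message title."""
    per_target = Counter()
    per_message = Counter()
    target = None
    for raw in text.splitlines():
        # Drop indentation and an optional leading line number.
        line = re.sub(r"^\d+\s*", "", raw.strip())
        match = HEADER.match(line)
        if match:
            target = match.group(1).rstrip(":")
        elif line.startswith("* ") and target is not None:
            # Lint titles look like "* <title>:"; drop bullet and trailing ".:".
            title = line[2:].rstrip(":.")
            per_target[target] += 1
            per_message[title] += 1
    return per_target, per_message


if __name__ == "__main__":
    import sys

    # Usage (file name is an example): python summarize_lints.py lint.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        targets, messages = summarize_lints(handle.read())
    for title, count in messages.most_common():
        print(f"{count:3d}  {title}")
```

The per-message counts point at fixes that apply workflow-wide (for example, naming inputs instead of indexing them), while the per-target counts show which rules need the most attention.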
Formatting results
[DEBUG]
[DEBUG] In file "/tmp/tmp5ctl5twu/workflow/rules/02_call.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp5ctl5twu/workflow/rules/01_align.smk": Formatted content is different from original
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG] In file "/tmp/tmp5ctl5twu/workflow/rules/03_benchmark.smk": Formatted content is different from original
[INFO] 3 file(s) would be changed 😬
[INFO] 3 file(s) would be left unchanged 🎉

snakefmt version: 0.10.2