PathoGenOmics-Lab/VIPERA
A Snakemake workflow for SARS-CoV-2 Viral Intra-Patient Evolution Reporting and Analysis
Overview
Latest release: v1.2.1, Last update: 2025-06-30
Linting: failed, Formatting: failed
Topics: bioinformatics intrahost sars-cov-2 virus-evolution reporting snakemake
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/PathoGenOmics-Lab/VIPERA . --tag v1.2.1
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Instructions
To run VIPERA, an environment with Snakemake version 7.19 or later is needed (see the Snakemake docs for setup instructions).
This guide provides command-line instructions for running VIPERA with Snakemake versions prior to 8. All configuration parameters are fully cross-compatible. The original publication used Snakemake 7.32, but newer versions can also be used with only minor changes. For details, see the Snakemake migration guide. For example, existing profiles are cross-compatible as well, but note that the --use-conda flag is deprecated starting with Snakemake 8. Instead, use --software-deployment-method conda.
Inputs and outputs
The workflow requires a set of FASTA files (one per target sample), a corresponding set of
BAM files (also one per target sample), and a metadata table in CSV format with one row per
sample. The metadata must include the following columns: a unique sample identifier (default column ID,
used to match sequencing files with metadata), the date the sample was collected (default CollectionDate),
the location where the sample was collected (default ResidenceCity), and the GISAID accession (default GISAIDEPI).
These are the default column names, but they can be customized if needed via the workflow parameters.
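Before launching the workflow, it can help to confirm that the metadata table carries the expected columns. The following is a minimal sketch, not part of VIPERA: the function name `missing_columns` and the example rows are hypothetical, and only the four default column names come from the text above.

```python
import csv
import io

# Default column names as listed above; adjust them if you customized the parameters.
REQUIRED_COLUMNS = ("ID", "CollectionDate", "ResidenceCity", "GISAIDEPI")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Placeholder table (the values are made up and do not refer to real records).
example = (
    "ID,CollectionDate,ResidenceCity,GISAIDEPI\n"
    "sample1,2021-01-15,CityA,EPI_ISL_000000\n"
)
print(missing_columns(example))
```

An empty list means that every default column is present; any names returned would need to be added to the table or remapped through the workflow parameters.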
These parameters are set in two configuration files in YAML format:
config.yaml (for general workflow settings) and
targets.yaml (for specific dataset-related settings).
You must modify the latter so that the SAMPLES and METADATA
parameters point to your data. The OUTPUT_DIRECTORY parameter should point to your
desired results directory.
The script build_targets.py simplifies the process of creating
the targets configuration file. To run this script, you need to have PyYAML installed. It
takes a list of sample names, a directory with BAM and FASTA files, the path to
the metadata table and the name of your dataset as required inputs. Then, it searches the
directory for files that have the appropriate extensions and sample names and adds them
to the configuration file.
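To illustrate the idea, here is a rough sketch of how such a targets configuration could be assembled. It is not the code of build_targets.py: the function names and the matching rule (a file belongs to a sample when its name without extension equals the sample name) are assumptions made for this example. Serialization to YAML (for instance with PyYAML's `yaml.safe_dump`) is left out to keep the sketch dependency-free.

```python
from pathlib import Path


def collect_samples(sample_names, data_dir):
    """Map each sample name to the BAM and FASTA files found for it.

    Assumed matching rule (hypothetical): the file stem equals the sample
    name and the extension is .bam or .fasta.
    """
    files = sorted(Path(data_dir).iterdir())
    samples = {}
    for name in sample_names:
        entry = {}
        for path in files:
            if path.stem != name:
                continue
            if path.suffix == ".bam":
                entry["bam"] = str(path)
            elif path.suffix == ".fasta":
                entry["fasta"] = str(path)
        if set(entry) != {"bam", "fasta"}:
            raise FileNotFoundError(f"missing BAM or FASTA file for {name}")
        samples[name] = entry
    return samples


def build_targets(sample_names, data_dir, metadata, dataset_name):
    """Return a dictionary with the same keys as the targets configuration file."""
    return {
        "OUTPUT_NAME": dataset_name,
        "SAMPLES": collect_samples(sample_names, data_dir),
        "METADATA": metadata,
        "OUTPUT_DIRECTORY": "output",
        "CONTEXT_FASTA": None,  # null: context is downloaded automatically
        "MAPPING_REFERENCES_FASTA": None,  # null: references are downloaded automatically
    }
```

Raising an error when a sample lacks either file is a design choice of this sketch; it surfaces incomplete inputs before the workflow starts.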
An example file could look like this:
OUTPUT_NAME:
  "your-dataset-name"
SAMPLES:
  sample1:
    bam: "path/to/sorted/bam1.bam"
    fasta: "path/to/sequence1.fasta"
  sample2:
    bam: "path/to/sorted/bam2.bam"
    fasta: "path/to/sequence2.fasta"
  ...
METADATA:
  "path/to/metadata.csv"
OUTPUT_DIRECTORY:
  "output"
CONTEXT_FASTA:
  null
MAPPING_REFERENCES_FASTA:
  null
This information may also be provided through the --config parameter.
Automated construction of a context dataset
Setting the CONTEXT_FASTA parameter to null (default) will enable
the automatic download of sequences from the GISAID SARS-CoV-2 database.
An unset parameter has the same effect.
To enable this, you must also sign up to the GISAID platform
and provide your credentials by creating and filling an additional configuration
file (default: config/gisaid.yaml) as follows:
USERNAME: "your-username"
PASSWORD: "your-password"
A set of samples that meet the spatial, temporal and phylogenetic criteria
set through the download_context rule
will be retrieved automatically from GISAID. These criteria are:
Location matching the place(s) of sampling of the target samples
Collection date within the time window that includes 95% of the date distribution of the target samples (2.5% is trimmed at each end to account for extreme values) ± 2 weeks
Pango lineage matching that of the target samples
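The temporal criterion can be sketched as follows. This is an illustration under stated assumptions, not the workflow's own code: the function names are hypothetical, and the quantiles are computed by linear interpolation on day numbers, which may differ from the method VIPERA actually uses.

```python
from datetime import date, timedelta


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def context_window(dates, trim=0.025, padding=timedelta(weeks=2)):
    """Return (start, end) of the central 95% of the dates, widened by the padding."""
    ordinals = sorted(d.toordinal() for d in dates)
    start = date.fromordinal(round(quantile(ordinals, trim)))
    end = date.fromordinal(round(quantile(ordinals, 1 - trim)))
    return start - padding, end + padding


# Made-up collection dates: 41 consecutive days starting on 2021-01-01.
collection_dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(41)]
window_start, window_end = context_window(collection_dates)
print(window_start, window_end)
```

Trimming 2.5% at each end keeps a single outlying collection date from stretching the window, while the two-week padding still allows context samples collected slightly before or after the target samples.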
Then, a series of checkpoint steps are executed for quality assurance:
Remove context samples whose GISAID ID matches that of any target sample
Enforce a minimum number of samples to have at least as many possible combinations as random subsample replicates for the diversity assessment (set in config.yaml)
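The minimum-size checkpoint can be pictured with a short calculation. The sketch below assumes that each random replicate is a subset of a fixed size drawn without replacement from the context dataset; the subset size, the function name and the numbers are hypothetical and chosen only for illustration.

```python
from math import comb  # available in Python 3.8 and later


def enough_context(n_context, subset_size, replicates):
    """True if at least `replicates` distinct subsets of `subset_size` can be drawn."""
    return comb(n_context, subset_size) >= replicates


# With subsets of 5 samples and 1000 replicates (illustrative values):
print(enough_context(12, 5, 1000))  # C(12, 5) = 792, which is fewer than 1000
print(enough_context(13, 5, 1000))  # C(13, 5) = 1287, which is at least 1000
```

Under these assumptions, a context dataset of 12 samples would fail the check and one of 13 samples would pass it.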
The workflow will continue its execution until completion if the obtained
context dataset passes these checkpoints. Otherwise, the execution will be
terminated and, to continue the analysis, an external context dataset must
be provided through the CONTEXT_FASTA parameter. This can be done
by editing targets.yaml or via the command line:
snakemake --config CONTEXT_FASTA="path/to/fasta"
Mapping reference sequence
Setting MAPPING_REFERENCES_FASTA to null (default) will enable the automatic download of the
reference sequence(s) that were used to map the reads and generate the BAM files.
An unset parameter has the same effect.
If the required sequence is not publicly available, or if you already have it
at your disposal, it can be provided manually by setting the parameter to the
path of the reference FASTA file.
Workflow configuration variables
All of the following variables are pre-defined in config.yaml:
ALIGNMENT_REFERENCE: NCBI accession number of the reference record for sequence alignment.
PROBLEMATIC_VCF: URL or path of a VCF file containing problematic genome positions for masking.
FEATURES_JSON: path of a JSON file containing name equivalences of genome features for data visualization.
GENETIC_CODE_JSON: path of a JSON file containing a genetic code for gene translation.
TREE_MODEL: substitution model used by IQTREE (see docs).
UFBOOT_REPS: ultrafast bootstrap replicates for IQTREE (see UFBoot).
SHALRT_REPS: Shimodaira–Hasegawa approximate likelihood ratio test bootstrap replicates for IQTREE (see SH-aLRT).
VC: variant calling configuration:
  MAX_DEPTH: maximum depth at a position for samtools mpileup (option -d).
  MIN_QUALITY: minimum base quality for samtools mpileup (option -Q).
  IVAR_QUALITY: minimum base quality for ivar variants (option -q).
  IVAR_FREQ: minimum frequency threshold for ivar variants (option -t).
  IVAR_DEPTH: minimum read depth for ivar variants (option -m).
DEMIX: demixing configuration:
  MIN_QUALITY: minimum quality for freyja variants (option --minq).
  COV_CUTOFF: minimum depth for freyja demix (option --covcut).
  MIN_ABUNDANCE: minimum lineage estimated abundance for freyja demix (option --eps).
WINDOW: sliding window of nucleotide variants per site configuration:
  WIDTH: number of sites within windows.
  STEP: number of sites between windows.
GISAID: automatic context download configuration:
  CREDENTIALS: path of the GISAID credentials in YAML format.
  DATE_COLUMN: name of the column that contains sampling dates (YYYY-MM-DD) in the input target metadata.
  LOCATION_COLUMN: name of the column that contains sampling locations (e.g. city names) in the input target metadata.
  ACCESSION_COLUMN: name of the column that contains GISAID EPI identifiers in the input target metadata.
DIVERSITY_REPS: number of random sample subsets of the context dataset for the nucleotide diversity comparison.
USE_BIONJ: use the BIONJ algorithm (Gascuel, 1997) instead of NJ (neighbor-joining; Saitou & Nei, 1987) to reconstruct phylogenetic trees from pairwise distances.
LOG_PY_FMT: logging format string for Python scripts.
PLOTS: path of the R script that sets the design and style of data visualizations.
PLOT_GENOME_REGIONS: path of a CSV file containing genome regions, e.g. SARS-CoV-2 non-structural protein (NSP) coordinates, for data visualization.
REPORT_QMD: path of the report template in Quarto markdown (QMD) format.
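To make the WINDOW parameters more concrete, the sketch below shows one way WIDTH and STEP could translate into windows along a genome. It is only an illustration: the function name, the 1-based inclusive coordinates and the toy values are assumptions, and the workflow's own implementation may define window boundaries differently.

```python
def window_counts(variant_positions, genome_length, width, step):
    """Count variant positions in each sliding window.

    Windows are 1-based and inclusive; each one spans `width` sites and
    consecutive windows start `step` sites apart (assumed convention).
    Returns a list of (start, end, count) tuples.
    """
    rows = []
    for start in range(1, genome_length - width + 2, step):
        end = start + width - 1
        count = sum(1 for p in variant_positions if start <= p <= end)
        rows.append((start, end, count))
    return rows


# Toy genome of 10 sites with variants at positions 2, 4 and 9.
for row in window_counts([2, 4, 9], genome_length=10, width=4, step=3):
    print(row)
```

Dividing each count by the window width would give a per-site value. A STEP smaller than WIDTH produces overlapping windows, which smooths the resulting profile.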
Workflow graphs
To generate a simplified rule graph, run:
snakemake --rulegraph | dot -Tpng > .rulegraph.png
To generate the directed acyclic graph (DAG) of all rules to be executed, run:
snakemake --forceall --dag | dot -Tpng > .dag.png
Run modes
To run the analysis with the default configuration, run the following command
(change the -c/--cores argument to use a different number of CPUs):
snakemake --use-conda -c4
To run the analysis in an HPC environment using SLURM, we provide a default profile configuration as an example that should be modified to fit your needs. To use it, run the following command:
snakemake --use-conda --slurm --profile profile/default
Additionally, we offer the option of running the workflow within a containerized
environment using a pre-built Docker image,
provided that Singularity is available on the system. This eliminates the need for further conda package
downloads and environment configuration.
To do that, simply add the option --use-singularity to any of the previous commands.
Running VIPERA with Singularity in the Windows Subsystem for Linux (WSL)
may produce errors due to the default file permissions configuration, which
conflicts with Snakemake’s containerized conda environment activation mechanism.
Thus, running the containerized VIPERA workflow on the WSL is not advised.
Additionally, certain known issues arise when using non-default temporary
directories and Snakemake shadow directories. To avoid them, use the
default temporary directory (e.g. export TMPDIR=/tmp on Linux machines) and
specify the shadow prefix (--shadow-prefix /tmp) when executing the containerized workflow.
Linting and formatting
Linting results
Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/common.smk:
    * Absolute path "/"sequences.fasta" in line 52:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/fasta.smk:
    * Absolute path "/g" in line 28:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 55:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 70:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 74:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/asr.smk:
    * Absolute path "/f" in line 10:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 13:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 34:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/demix.smk:
    * Absolute path "/"{sample}/{sample}_depth.txt" in line 11:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{sample}/{sample}_variants.tsv" in line 12:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{sample}/{sample}_depth.txt" in line 31:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{sample}/{sample}_variants.tsv" in line 32:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{sample}/{sample}_demixed.tsv" in line 37:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{sample}/{sample}_demixed.tsv" in line 56:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/vaf.smk:
    * Absolute path "/g" in line 30:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/'{wildcards.sample}" in line 52:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/context.smk:
    * Absolute path "/"sequences.fasta" in line 26:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"metadata.csv" in line 27:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"duplicate_accession_ids.txt" in line 28:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"nextalign" in line 45:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"nextalign" in line 46:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"nextalign" in line 61:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"nextalign" in line 65:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/f" in line 84:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"nextalign" in line 85:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/^>/{{p=seen[$0]++}}!p" in line 96:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration

Lints for snakefile /tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/report.smk:
    * Absolute path "/f" in line 40:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.

... (truncated)
Formatting results
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/demix.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/report.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/context.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/fetch.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/distances.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/fasta.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/vaf.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/Snakefile": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/evolution.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/common.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/pangolin.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp0sz1c8ap/PathoGenOmics-Lab-VIPERA-e988fe8/workflow/rules/asr.smk": Formatted content is different from original
[INFO] 12 file(s) would be changed 😬

snakefmt version: 0.11.0