ltalignani/evo-shave
DNAseq pipeline
Overview
Latest release: None, Last update: 2026-06-13
Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=ltalignani/evo-shave
Quality control: linting failed; formatting failed
Wrappers: bio/fastqc bio/gatk/combinegvcfs bio/multiqc bio/picard/markduplicates bio/samtools/index bio/trimmomatic/pe
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
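In all following steps, we will assume that you are inside that directory. Then run
snakedeploy deploy-workflow https://github.com/ltalignani/evo-shave . --branch main
Because no release has been tagged yet (latest release: None), --tag None does not refer to an existing Git tag, so the command above deploys from a branch instead. Replace main with the repository's default branch if it is named differently.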
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code in the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Required files
config/samples.tsv
One row per biological sample. The sample column must match the sample names used in config/units.tsv.
| Column | Description |
|---|---|
| sample | Unique sample identifier (string; may contain letters, digits, hyphens and dots) |
Example:
sample
FCV003
MPLS001
109
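The naming rule in the table can be checked before a run. The sketch below is a minimal illustration, assuming the rule is exactly "letters, digits, hyphens and dots" and that IDs must be unique; `invalid_sample_ids` and `SAMPLE_ID` are hypothetical names, not part of the workflow. IDs such as 109 are handled as strings here; if you load the sheet with pandas, passing `dtype=str` keeps them from being parsed as integers.

```python
import re

# Assumed rule, taken from the column description: letters, digits, hyphens, dots.
SAMPLE_ID = re.compile(r"[A-Za-z0-9.-]+")


def invalid_sample_ids(ids: list[str]) -> list[str]:
    """Return IDs that break the assumed naming rule or repeat an earlier ID."""
    seen: set[str] = set()
    bad = []
    for sample_id in ids:
        # fullmatch: the whole ID must consist of allowed characters (and be non-empty)
        if not SAMPLE_ID.fullmatch(sample_id) or sample_id in seen:
            bad.append(sample_id)
        seen.add(sample_id)
    return bad


print(invalid_sample_ids(["FCV003", "MPLS001", "109"]))     # []
print(invalid_sample_ids(["FCV003", "FCV 004", "FCV003"]))  # ['FCV 004', 'FCV003']
```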
config/units.tsv
One row per sequencing unit (one sample may span multiple lanes / units). BAMs from multiple units are merged automatically before downstream processing.
| Column | Description |
|---|---|
| sample | Sample identifier; must match a sample in config/samples.tsv |
| unit | Lane or unit identifier (e.g. L1) |
| platform | Sequencing platform (e.g. ILLUMINA) |
| fq1 | Path to R1 FASTQ file (gzip-compressed) |
| fq2 | Path to R2 FASTQ file (gzip-compressed) |
Example:
sample unit platform fq1 fq2
FCV003 L1 ILLUMINA raw/FCV003_L1_R1.fastq.gz raw/FCV003_L1_R2.fastq.gz
MPLS001 L1 ILLUMINA raw/MPLS001_L1_R1.fastq.gz raw/MPLS001_L1_R2.fastq.gz
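Because BAMs from several units of one sample are merged, every sample in units.tsv needs a matching row in samples.tsv. The following sketch (hypothetical helper `units_by_sample`, standard library only) groups unit rows per sample and rejects samples missing from samples.tsv; it illustrates that rule and is not the workflow's own validation code. The second FCV003 lane in the usage data is made up.

```python
import csv
import io
from collections import defaultdict


def units_by_sample(samples_tsv: str, units_tsv: str) -> dict[str, list[tuple[str, str, str]]]:
    """Group units.tsv rows by sample and check them against samples.tsv.

    Hypothetical helper: both arguments are the *text* of the tab-separated sheets.
    """
    samples = {row["sample"] for row in csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")}
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(units_tsv), delimiter="\t"):
        # Each unit contributes one (unit, R1, R2) triple to its sample.
        grouped[row["sample"]].append((row["unit"], row["fq1"], row["fq2"]))
    unknown = sorted(set(grouped) - samples)
    if unknown:
        raise ValueError(f"units.tsv lists samples missing from samples.tsv: {unknown}")
    return dict(grouped)


samples_tsv = "sample\nFCV003\nMPLS001\n"
units_tsv = (
    "sample\tunit\tplatform\tfq1\tfq2\n"
    "FCV003\tL1\tILLUMINA\traw/FCV003_L1_R1.fastq.gz\traw/FCV003_L1_R2.fastq.gz\n"
    # Made-up second lane, to show one sample spanning several units.
    "FCV003\tL2\tILLUMINA\traw/FCV003_L2_R1.fastq.gz\traw/FCV003_L2_R2.fastq.gz\n"
    "MPLS001\tL1\tILLUMINA\traw/MPLS001_L1_R1.fastq.gz\traw/MPLS001_L1_R2.fastq.gz\n"
)
for sample, units in units_by_sample(samples_tsv, units_tsv).items():
    print(sample, [unit for unit, _, _ in units])
# FCV003 ['L1', 'L2']
# MPLS001 ['L1']
```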
Key parameters (config/config.yaml)
Reference genome
refs:
  ref_name: "AalbF5"
  reference: "resources/genomes/AalbF5.fasta"  # path to FASTA
  index: "resources/genomes/AalbF5.fasta.fai"  # samtools fai index
  dict: "resources/genomes/AalbF5.dict"        # Picard sequence dictionary
BWA indices must be pre-built in resources/indexes/bwa/.
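Since these index files are not built by the workflow, a quick pre-flight check can save a failed run. The sketch below assumes the companion files are produced with `samtools faidx` (.fai), Picard `CreateSequenceDictionary` (.dict) and `bwa index -p PREFIX` (.amb, .ann, .bwt, .pac, .sa). The prefix `resources/indexes/bwa/AalbF5` is an assumption, since the README names only the directory, and `missing_reference_files` is a hypothetical helper.

```python
from pathlib import Path

# Files that `bwa index -p PREFIX ref.fasta` writes next to PREFIX.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_files(refs: dict, bwa_prefix: str) -> list[str]:
    """Return reference/index paths that do not exist yet (hypothetical pre-flight check)."""
    expected = [refs["reference"], refs["index"], refs["dict"]]
    expected += [bwa_prefix + suffix for suffix in BWA_SUFFIXES]
    return [path for path in expected if not Path(path).exists()]


refs = {
    "reference": "resources/genomes/AalbF5.fasta",
    "index": "resources/genomes/AalbF5.fasta.fai",
    "dict": "resources/genomes/AalbF5.dict",
}
# Assumed prefix: the README only names the directory resources/indexes/bwa/.
for path in missing_reference_files(refs, "resources/indexes/bwa/AalbF5"):
    print("missing:", path)
```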
Variant caller
caller: "HaplotypeCaller" # or "UnifiedGenotyper"
HaplotypeCaller (GATK4): per-sample GVCF → joint genotyping. Recommended for most use cases.
UnifiedGenotyper (GATK3): multi-sample calling with indel realignment. Use to match MalariaGEN phase 2/3 parameters.
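To make the two modes concrete, the sketch below maps each accepted caller value to the steps described above and rejects anything else. The step labels paraphrase this README, `calling_plan` is a hypothetical name, and the exact, case-sensitive string match is an assumption about how the workflow reads the option.

```python
# Paraphrase of the two modes described above; the workflow's real rule
# graph has more steps (trimming, mapping, duplicate marking, ...).
CALLER_STEPS = {
    "HaplotypeCaller": [
        "GATK4 HaplotypeCaller: one GVCF per sample",
        "joint genotyping of all GVCFs",
    ],
    "UnifiedGenotyper": [
        "GATK3 indel realignment",
        "GATK3 UnifiedGenotyper: multi-sample calling",
    ],
}


def calling_plan(caller: str) -> list[str]:
    """Return the calling steps for a `caller` value, rejecting unknown values."""
    if caller not in CALLER_STEPS:
        raise ValueError(f"caller must be one of {sorted(CALLER_STEPS)}, got {caller!r}")
    return CALLER_STEPS[caller]


for step in calling_plan("HaplotypeCaller"):
    print("-", step)
```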
Chromosome / scaffold selection
chromosomes:
  auto: true      # read contigs from .fai at parse time (recommended)
  min_size: 0     # exclude scaffolds smaller than N bp (0 = keep all)
  pattern: ""     # regex filter on contig names, e.g. "^NC_" (empty = keep all)
  list:           # used when auto: false
    - "NC_085136.1"
vcf_output: "both" # "per_contig" | "merged" | "both"
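The chromosome options combine roughly as follows. This is a hypothetical sketch, not the code in workflow/rules/common.smk: with auto: true it reads contig names and lengths from the first two columns of the .fai and keeps contigs at least min_size bp long whose name matches pattern (applying both filters together is an assumption); with auto: false it returns list unchanged. Because the .fai is read when the Snakefile is parsed, a missing index stops the workflow before any rule runs; the linting log further down this page shows such a FileNotFoundError raised from get_chromosomes.

```python
import re
import tempfile
from pathlib import Path


def select_contigs(chrom_cfg: dict, fai_path: str) -> list[str]:
    """Hypothetical sketch of the `chromosomes` options; not the workflow's code."""
    if not chrom_cfg.get("auto", False):
        # auto: false -> take the explicit list as-is
        return list(chrom_cfg.get("list") or [])
    fai = Path(fai_path)
    if not fai.exists():
        # Reading the .fai at parse time means a missing index stops everything.
        raise FileNotFoundError(f"{fai} not found; index the reference with `samtools faidx` first")
    min_size = int(chrom_cfg.get("min_size") or 0)
    pattern = re.compile(chrom_cfg.get("pattern") or "")  # "" matches every name
    selected = []
    for line in fai.read_text().splitlines():
        if not line.strip():
            continue
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated)
        name, length = line.split("\t")[:2]
        if int(length) >= min_size and pattern.search(name):
            selected.append(name)
    return selected


# Usage with a tiny made-up .fai (three contigs).
with tempfile.TemporaryDirectory() as tmp:
    fai = Path(tmp) / "toy.fasta.fai"
    fai.write_text("NC_085136.1\t1000\t13\t60\t61\nscaf_1\t50\t1044\t60\t61\nNC_085137.1\t40\t1108\t60\t61\n")
    print(select_contigs({"auto": True, "min_size": 0, "pattern": "^NC_"}, str(fai)))  # ['NC_085136.1', 'NC_085137.1']
    print(select_contigs({"auto": True, "min_size": 100, "pattern": ""}, str(fai)))    # ['NC_085136.1']
```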
MarkDuplicates
markdup:
  skip: false  # set to true for ddRAD-seq data
  remove-duplicates: false
Set skip: true when processing ddRAD-seq libraries: enzymatic digestion produces reads that share the same start coordinates, which Picard MarkDuplicates would incorrectly flag as PCR duplicates.
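The sketch below illustrates why ddRAD-seq libraries trip duplicate marking. It uses a deliberately simplified, position-only model (hypothetical `naive_duplicate_flags`): a read is flagged if another read already started at the same contig, position and strand. Picard MarkDuplicates also takes mate positions, orientation and library into account, but the effect on ddRAD data is the same: reads that all begin at a restriction site look like copies of one fragment. The reads in the usage example are made up.

```python
def naive_duplicate_flags(reads: list[tuple[str, str, int, str]]) -> dict[str, bool]:
    """Flag every read whose (contig, start, strand) was already seen.

    Simplified model of position-based duplicate marking; reads are
    (name, contig, start, strand) tuples and the first copy is kept.
    """
    seen = set()
    flags = {}
    for name, contig, start, strand in reads:
        key = (contig, start, strand)
        flags[name] = key in seen
        seen.add(key)
    return flags


# ddRAD: every read starts at the same cut site.
ddrad = [(f"r{i}", "NC_085136.1", 1000, "+") for i in range(4)]
# Shotgun: random fragmentation gives scattered start positions.
shotgun = [("s0", "NC_085136.1", 1000, "+"), ("s1", "NC_085136.1", 1037, "+"), ("s2", "NC_085136.1", 1210, "-")]

print(sum(naive_duplicate_flags(ddrad).values()))    # 3
print(sum(naive_duplicate_flags(shotgun).values()))  # 0
```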
Hard filtering thresholds
filtering:
  hard:
    snvs: "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
    indels: "QD < 2.0 || FS > 200.0 || SOR > 10.0 || ReadPosRankSum < -20.0"
Variants failing any threshold are kept in the output VCF and flagged in its FILTER column rather than removed.
Adjust thresholds based on your species and library characteristics.
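For reference, the same thresholds can be expressed as Python predicates. This is an illustrative port, not how the workflow evaluates them (presumably with GATK VariantFiltration, which parses the original strings as JEXL expressions). A site fails if any condition holds, matching the || in each expression. The sketch skips annotations that are absent from a site, which is an assumption; check how your GATK version treats missing values. The annotation values in the usage example are made up.

```python
import operator

# The filtering.hard thresholds above, as (annotation, comparison, cutoff).
SNV_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("MQ", operator.lt, 40.0),
    ("FS", operator.gt, 60.0),
    ("SOR", operator.gt, 3.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 200.0),
    ("SOR", operator.gt, 10.0),
    ("ReadPosRankSum", operator.lt, -20.0),
]


def failed_conditions(info: dict[str, float], filters) -> list[str]:
    """Return annotations that trip a threshold; missing annotations are skipped."""
    return [ann for ann, cmp, cutoff in filters if ann in info and cmp(info[ann], cutoff)]


site = {"QD": 1.5, "MQ": 58.0, "FS": 75.2, "SOR": 1.1}
print(failed_conditions(site, SNV_FILTERS))                       # ['QD', 'FS']
print(failed_conditions({"QD": 12.0, "FS": 150.0}, INDEL_FILTERS))  # []
```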
Trimmomatic
trimmomatic:
  adapters:
    truseq2-pe: "resources/adapters/TruSeq2-PE.fa"
  settings: "LEADING:20 TRAILING:3 SLIDINGWINDOW:5:20 AVGQUAL:20 MINLEN:50"
  phred: "-phred33"
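The settings string is a space-separated list of Trimmomatic steps, each written STEP:arg[:arg]. Here LEADING:20 and TRAILING:3 remove bases below quality 20 and 3 from the start and end of reads, SLIDINGWINDOW:5:20 cuts a read once the average quality in a 5-base window drops below 20, AVGQUAL:20 drops reads whose average quality is below 20, and MINLEN:50 drops reads shorter than 50 bases. The sketch below (hypothetical helpers `parse_steps` and `phred_offset`) splits the string and shows what -phred33 means: quality characters are decoded as their ASCII code minus 33.

```python
def parse_steps(settings: str) -> list[tuple[str, list]]:
    """Split a Trimmomatic step string such as "LEADING:20 MINLEN:50" into (STEP, args)."""
    steps = []
    for token in settings.split():
        name, *args = token.split(":")
        # Numeric arguments become ints; anything else (e.g. a file path) stays a string.
        steps.append((name, [int(a) if a.isdigit() else a for a in args]))
    return steps


def phred_offset(flag: str) -> int:
    """Turn "-phred33" / "-phred64" into the numeric ASCII offset."""
    return int(flag.removeprefix("-phred"))


steps = parse_steps("LEADING:20 TRAILING:3 SLIDINGWINDOW:5:20 AVGQUAL:20 MINLEN:50")
offset = phred_offset("-phred33")
print(steps[2])                           # ('SLIDINGWINDOW', [5, 20])
print([ord(c) - offset for c in "II5#"])  # [40, 40, 20, 2]
```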
Results archiving (optional)
transfer:
  results_dir: "/path/to/archive"
  run_name: "evo-shave"
Used by transfer_results.sh to rsync outputs to a shared archive after the run.
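A minimal sketch of how transfer.results_dir and transfer.run_name could combine into an rsync destination. It only builds the argument list and does not run anything; the source directory `results/`, the -av flags and the helper name `rsync_command` are assumptions, and transfer_results.sh may do this differently. The trailing slash on the source makes rsync copy the directory's contents rather than the directory itself.

```python
import shlex
from pathlib import PurePosixPath


def rsync_command(transfer: dict, source: str = "results/") -> list[str]:
    """Build an rsync argv for results_dir/run_name (hypothetical; not the script's code)."""
    dest = PurePosixPath(transfer["results_dir"]) / transfer["run_name"]
    # -a: archive mode (recursive, keeps permissions and timestamps); -v: verbose
    return ["rsync", "-av", source, f"{dest}/"]


cmd = rsync_command({"results_dir": "/path/to/archive", "run_name": "evo-shave"})
print(shlex.join(cmd))  # rsync -av results/ /path/to/archive/evo-shave/
```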
Workflow parameters
The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| samples | string | Path to the samples file | yes | |
| units | string | Path to the units file | yes | |
| resources | | | yes | |
| . tmpdir | string | Temporary directory | yes | |
| trimmomatic | | | yes | |
| . adapters | | | yes | |
| . . nextera | string | | yes | |
| . . truseq2-pe | string | | yes | |
| . . truseq2-se | string | | yes | |
| . . truseq3-pe | string | | yes | |
| . . truseq3-pe-2 | string | | yes | |
| . . truseq3-se | string | | yes | |
| . settings | string | | yes | |
| . phred | string | | yes | |
| refs | | | yes | |
| . ref_name | string | | yes | |
| . path | string | | yes | |
| . reference | string | | yes | |
| . index | string | | yes | |
| . dict | string | | yes | |
| markdup | | | yes | |
| . remove-duplicates | boolean | | yes | |
| caller | string | | yes | |
| gatk | | | yes | |
| . haplotypecaller | string | | yes | |
| . output_mode | string | | yes | |
| . genomicsdbimport | string | | yes | |
| . genotypegvcfs | string | | yes | |
| chromosomes | array | | yes | |
| filtering | | | yes | |
| . hard | | | yes | |
| . . snvs | string | | yes | |
| . . indels | string | | yes | |
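Snakemake workflows commonly validate the loaded config against such a schema with snakemake.utils.validate. As a dependency-free approximation of the "Required" column only, the sketch below reports missing keys as dotted paths transcribed from the table (hypothetical `REQUIRED` and `missing_keys`). The key-parameter excerpts earlier on this page show only part of a complete config; refs.path and resources.tmpdir, for example, do not appear there.

```python
# Dotted paths of every leaf marked "Required: yes" in the table above.
REQUIRED = [
    "samples", "units", "resources.tmpdir",
    "trimmomatic.adapters.nextera", "trimmomatic.adapters.truseq2-pe",
    "trimmomatic.adapters.truseq2-se", "trimmomatic.adapters.truseq3-pe",
    "trimmomatic.adapters.truseq3-pe-2", "trimmomatic.adapters.truseq3-se",
    "trimmomatic.settings", "trimmomatic.phred",
    "refs.ref_name", "refs.path", "refs.reference", "refs.index", "refs.dict",
    "markdup.remove-duplicates", "caller",
    "gatk.haplotypecaller", "gatk.output_mode", "gatk.genomicsdbimport", "gatk.genotypegvcfs",
    "chromosomes", "filtering.hard.snvs", "filtering.hard.indels",
]


def missing_keys(config: dict, required: list[str] = REQUIRED) -> list[str]:
    """Return required dotted paths that are absent from a nested config dict."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            # Stop at the first level that is not a mapping or lacks the key.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


cfg = {"samples": "config/samples.tsv", "units": "config/units.tsv", "caller": "HaplotypeCaller"}
print(missing_keys(cfg)[:3])
# ['resources.tmpdir', 'trimmomatic.adapters.nextera', 'trimmomatic.adapters.truseq2-pe']
```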
Linting and formatting
Linting results
FileNotFoundError in file "/tmp/tmplakgbhkd/workflow/rules/common.smk", line 58:
[Errno 2] No such file or directory: 'resources/genomes/GCA_018104305.1_AalbF3_genomic.fna.fai'
  File "/tmp/tmplakgbhkd/workflow/rules/common.smk", line 85, in <module>
  File "/tmp/tmplakgbhkd/workflow/rules/common.smk", line 58, in get_chromosomes
  File "/home/runner/work/snakemake-workflow-catalog/snakemake-workflow-catalog/.pixi/envs/default/lib/python3.13/site-packages/pandas/io/parsers/readers.py", line 1405, in read_table
  File "/home/runner/work/snakemake-workflow-catalog/snakemake-workflow-catalog/.pixi/envs/default/lib/python3.13/site-packages/pandas/io/parsers/readers.py", line 620, in _read
  File "/home/runner/work/snakemake-workflow-catalog/snakemake-workflow-catalog/.pixi/envs/default/lib/python3.13/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
  File "/home/runner/work/snakemake-workflow-catalog/snakemake-workflow-catalog/.pixi/envs/default/lib/python3.13/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
  File "/home/runner/work/snakemake-workflow-catalog/snakemake-workflow-catalog/.pixi/envs/default/lib/python3.13/site-packages/pandas/io/common.py", line 873, in get_handle
Formatting results
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/Snakefile": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/bcftools_stats.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/ug.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/gtgvcfs.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/hc.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/vcf_stats.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/common.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmplakgbhkd/workflow/rules/fixmateinformation.smk": Formatted content is different from original
[INFO] 8 file(s) would be changed 😬
[INFO] 20 file(s) would be left unchanged 🎉

snakefmt version: 0.11.5