snakemake-workflows/read-alignment-pangenome
Standardized snakemake workflow for aligning sequencing reads to a pangenome.
Overview
Latest release: None, Last update: 2026-03-10
Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=snakemake-workflows/read-alignment-pangenome
Quality control: linting passed, formatting passed
Wrappers: bio/bwa/index bio/bwa/mem bio/fastp bio/gatk/applybqsr bio/gatk/baserecalibratorspark bio/picard/markduplicates bio/reference/ensembl-sequence bio/reference/ensembl-variation bio/samtools/faidx bio/samtools/fixmate bio/samtools/index bio/samtools/merge bio/samtools/sort bio/samtools/view bio/sra-tools/fasterq-dump bio/tabix/index bio/vg/giraffe
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run
conda create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/snakemake-workflows/read-alignment-pangenome . --branch main
Since this workflow has no tagged release yet, deploy from its default branch (assumed here to be main); once a release exists, pin it with --tag <version> instead.
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
To run the workflow using a combination of conda and apptainer/singularity for software deployment, use
snakemake --cores all --sdm conda apptainer
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
General settings
To configure this workflow, modify config/config.yaml according to your needs, following the explanations provided in the file.
This workflow is derived from snakemake-workflows/dna-seq-varlociraptor and focuses on:

- reference + optional pangenome resource preparation
- read preprocessing/merging
- read mapping, including optional pangenome-graph alignment with vg giraffe
- alignment postprocessing to produce the final BAM + BAI

Variant calling, annotation, filtering, and reporting are intentionally out of scope.
Sample sheet
Add samples to config/samples.tsv. For each sample, the columns sample_name, platform, group, and datatype must be defined.
- Samples within the same group can be treated as belonging together for aggregation logic that is retained from upstream.
- The platform column needs to contain the used sequencing platform (one of CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT, ONT, PACBIO). This is required because the workflow adds read groups during alignment postprocessing.
- The datatype column is used by upstream-derived helper logic to determine the alignment branch and related processing.
- Optionally, a panel column can be provided. This is only relevant if primer trimming is enabled panel-wise (see Primer trimming); the value links a sample to a primer panel definition.
- Optionally, the columns umi_read and umi_len can be provided to enable UMI annotation (see Annotating UMIs). umi_read can be fq1, fq2, or both; umi_len is the number of bases (UMI length) to be annotated as UMI.
Missing values can be specified by empty columns or by writing NA. Lines can be commented out with #.
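As an illustration, a minimal samples.tsv could look like the following (sample names, group, and UMI settings are made up for this sketch):

```tsv
sample_name	platform	group	datatype	panel	umi_read	umi_len
sampleA	ILLUMINA	cohort1	dna	NA	NA	NA
sampleB	ILLUMINA	cohort1	dna	panel1	fq1	8
```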
Unit sheet
For each sample, add one or more sequencing units (runs, lanes, or replicates) to the unit sheet config/units.tsv.
- Each unit has a unit_name (lane/run/replicate ID).
- Each unit has a sample_name, which associates it with the biological sample it comes from. This information is used to merge all units of a sample before read mapping.

For each unit, you need to specify one of these input modes:

- fq1 only, for single-end reads (path to a FASTQ file)
- fq1 and fq2, for paired-end reads (paths to FASTQ files)
- sra only: specify an SRA accession (for example, SRR...). The workflow will download paired-end reads from SRA.
If both local files (fq1, fq2) and an SRA accession (sra) are available, the local files will be used.
Adapters / trimming behavior (fastp)
Adapters can be configured in the adapters column by putting fastp arguments in quotation marks
(for example, "--adapter_sequence ACGC... --adapter_sequence_r2 GCTA...").
Automatic adapter trimming can be enabled by setting the keyword auto_trim.
If the adapters column is empty or NA for any unit of a sample, fastp will not be used for that sample and raw reads will be merged directly.
Missing values can be specified by empty columns or by writing NA. Lines can be commented out with #.
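Putting the above together, a hypothetical units.tsv could look like this (all paths and the SRA accession are placeholders, not real data):

```tsv
sample_name	unit_name	fq1	fq2	sra	adapters
sampleA	lane1	reads/A_L1_R1.fastq.gz	reads/A_L1_R2.fastq.gz	NA	auto_trim
sampleA	lane2	reads/A_L2_R1.fastq.gz	reads/A_L2_R2.fastq.gz	NA	auto_trim
sampleB	run1	NA	NA	SRR1234567	NA
```

Here sampleA has two paired-end lanes that will be merged after preprocessing, while sampleB is fetched from SRA; sampleB's empty adapters column means fastp is skipped for that sample.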
Primer trimming
Primer trimming is retained from upstream logic. Trimming will be applied if global primer sequences are provided in config/config.yaml or primer panels are set in the sample sheet (column panel).
Primers can be defined either:

- directly in config/config.yaml (primers.trimming.primers_fa1 / primers.trimming.primers_fa2), or
- via a separate TSV file (primers.trimming.tsv) with columns panel, fa1, fa2 (optional).
If a panel is not provided for a sample, primer trimming will not be performed for that sample.
For single-primer trimming, only the first entry (fa1) needs to be defined.
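As a sketch only (check the exact keys against the comments in config/config.yaml; the file paths are placeholders), the two ways of defining primers could look like:

```yaml
primers:
  trimming:
    # Option 1: global primer FASTA files, applied to all samples
    primers_fa1: resources/primers_fwd.fa
    primers_fa2: resources/primers_rev.fa
    # Option 2: panel-wise definitions via a TSV with columns panel, fa1, fa2 (optional);
    # samples are linked to panels via the panel column in samples.tsv
    tsv: config/primer_panels.tsv
```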
Annotating UMIs
UMI annotation is retained from upstream logic.
To enable it, add the following columns to config/samples.tsv:
- umi_read: where the UMI is located:
  - fq1 if the UMI is part of read 1
  - fq2 if the UMI is part of read 2
  - both if there are UMIs in both paired-end reads
- umi_len: number of bases (UMI length) to be annotated as UMI.
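For example, to annotate an 8 bp UMI located in read 1 of a sample, the relevant samples.tsv columns (sample name hypothetical) would be:

```tsv
sample_name	umi_read	umi_len
sampleB	fq1	8
```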
Linting and formatting
Linting results
All tests passed!
Formatting results
All tests passed!