vollgerlab/DSA-phasing
Overview
Latest release: None, Last update: 2026-02-18
Linting: failed, Formatting: passed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/vollgerlab/DSA-phasing . --tag None
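Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs. Note that the repository currently has no tagged release, which is why the tag in the command above reads None; replace it with an existing tag once one is published, or deploy from a branch with the --branch option instead.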
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
DSA-phasing Configuration
This document describes all configuration options available for the DSA-phasing workflow. Configuration is primarily done through config.yaml and the workflow validates all options against workflow/config.schema.yaml.
Required Configuration
manifest
Type: String (file path)
Description: Path to a table containing the manifest of samples to be processed. This file should contain sample information including paths to input files.
Example:
manifest: "test/manifest.tbl"
Optional Configuration
max_threads
Type: Integer
Default: 64
Description: Maximum number of threads for parallel processing operations throughout the workflow.
Example:
max_threads: 32
ont
Type: Boolean
Default: false
Description: Enable Oxford Nanopore Technologies (ONT) specific processing. When true, affects output requirements and final CRAM file selection to optimize for ONT data characteristics.
Example:
ont: true
set-sm
Type: Boolean
Default: false
Description: Whether to set sample names in BAM headers to match the manifest. When enabled, BAM headers will be modified to ensure consistency with the provided manifest.
Example:
set-sm: true
mm2_preset
Type: String
Default: "lr:hq"
Description: Minimap2 preset parameter for alignment. Common options include 'lr:hq' for high-quality long reads, 'map-ont' for ONT reads, or 'map-pb' for PacBio CLR reads.
Example:
mm2_preset: "map-ont"
mm2_extra_options
Type: String
Default: "" (empty)
Description: Additional command-line options to pass to minimap2 during alignment. Allows fine-tuning of alignment parameters beyond the preset.
Example:
mm2_extra_options: "-k19 -w10"
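As a rough illustration of how these two options could combine, the sketch below assembles a minimap2 argument list in Python. The function name minimap2_args and the exact set of flags are assumptions made for illustration; the workflow's actual rule may build its command differently. Only -x (preset), -t (threads), and -a (SAM output) are standard minimap2 options, and minimap2 applies options given after -x on top of the preset.

```python
import shlex


def minimap2_args(mm2_preset="lr:hq", mm2_extra_options="", threads=64):
    """Build a hypothetical minimap2 argument list from the config values."""
    # -a requests SAM output, -x selects the preset, -t sets the thread count.
    args = ["minimap2", "-a", "-x", mm2_preset, "-t", str(threads)]
    # Extra options come after the preset so they can override its values.
    # shlex.split tokenizes the string the way a POSIX shell would.
    args += shlex.split(mm2_extra_options)
    return args


print(minimap2_args("map-ont", "-k19 -w10", threads=32))
# ['minimap2', '-a', '-x', 'map-ont', '-t', '32', '-k19', '-w10']
```

Reference and read paths are left out on purpose, since they come from the manifest rather than from these options.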
min_mapq
Type: Integer
Default: 1
Description: Minimum mapping quality threshold for haplotype assignment. Reads below this threshold still appear in the output but have their HP tag cleared (set to unphased). The original assignment is preserved in the oh tag for debugging.
Example:
min_mapq: 20
reset_mapq
Type: Integer
Default: disabled
Description: Reset MAPQ of mapped reads to this value during haplotagging. The original MAPQ is preserved in the om tag. Useful because DSA-alignment MAPQ values may not be meaningful for downstream tools that filter on MAPQ. By default, resets after haplotype assignment so min_mapq filtering uses the original value (see reset_mapq_before).
Example:
reset_mapq: 60
reset_mapq_before
Type: Boolean
Default: false
Description: When true (and reset_mapq is set), reset MAPQ before haplotype assignment. This means min_mapq filtering will use the new MAPQ value instead of the original. When false (default), MAPQ is reset after assignment so filtering uses the original alignment MAPQ.
Example:
reset_mapq_before: true
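The interaction between min_mapq, reset_mapq, and reset_mapq_before is easier to follow with a small model. The Python sketch below is not the workflow's implementation; it only mirrors the ordering described above for a single mapped read. The function name apply_mapq_settings is hypothetical, and two details are assumptions: that the oh tag is written only when the HP tag is cleared, and that the om tag is written whenever reset_mapq is set.

```python
def apply_mapq_settings(mapq, hp, min_mapq=1, reset_mapq=None,
                        reset_mapq_before=False):
    """Return (final_mapq, tags) for one mapped read.

    mapq: alignment MAPQ; hp: candidate haplotype (or None if unassigned).
    reset_mapq=None models the documented "disabled" default.
    """
    tags = {}
    original_mapq = mapq

    # With reset_mapq_before, the threshold check sees the new MAPQ.
    if reset_mapq is not None and reset_mapq_before:
        mapq = reset_mapq

    # Reads below the threshold stay in the output but lose their HP tag;
    # the original assignment is kept in oh (assumed: only when cleared).
    if hp is not None:
        if mapq >= min_mapq:
            tags["HP"] = hp
        else:
            tags["oh"] = hp

    # The reset itself; the original MAPQ is preserved in om.
    if reset_mapq is not None:
        tags["om"] = original_mapq
        mapq = reset_mapq

    return mapq, tags


# MAPQ 5 read, min_mapq 20: filtered on the original MAPQ, then reset to 60.
print(apply_mapq_settings(5, 1, min_mapq=20, reset_mapq=60))
# (60, {'oh': 1, 'om': 5})

# Same read with reset_mapq_before: the check sees 60, so HP is kept.
print(apply_mapq_settings(5, 1, min_mapq=20, reset_mapq=60,
                          reset_mapq_before=True))
# (60, {'HP': 1, 'om': 5})
```

The two calls differ only in reset_mapq_before, which is enough to flip a low-MAPQ read from unphased to phased.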
ft_nuc_params
Type: String
Default: "" (empty)
Description: Additional parameters for the ft add-nucleosomes command in the modkit rule. Used for nucleosome detection and modification analysis.
Example:
ft_nuc_params: "--nucleosome-length 60"
keep_read_assignments
Type: Boolean
Default: false
Description: When true, read-to-haplotype assignment TSV files are saved to results/{sm}/ instead of being placed in temp/ and cleaned up. Useful for downstream analysis of phasing results.
Example:
keep_read_assignments: true
Configuration Validation
The workflow automatically validates all configuration options against the schema defined in workflow/config.schema.yaml. Invalid configurations will cause the workflow to fail with descriptive error messages.
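To give a feel for what such a check does, here is a simplified, standard-library-only Python sketch. It is a stand-in, not the workflow's validator: Snakemake workflows typically call snakemake.utils.validate against the JSON Schema file, whereas the table of expected types below is transcribed from the option descriptions in this document, and the function name check_config is hypothetical.

```python
# Expected types, transcribed from the documented options (an assumption;
# the authoritative source is workflow/config.schema.yaml).
EXPECTED_TYPES = {
    "manifest": str,
    "max_threads": int,
    "ont": bool,
    "set-sm": bool,
    "mm2_preset": str,
    "mm2_extra_options": str,
    "min_mapq": int,
    "reset_mapq": int,
    "reset_mapq_before": bool,
    "ft_nuc_params": str,
    "keep_read_assignments": bool,
}
REQUIRED = ["manifest"]


def check_config(config):
    """Return a list of human-readable problems; empty means the config passes."""
    errors = []
    for key in REQUIRED:
        if key not in config:
            errors.append(f"'{key}' is a required property")
    for key, value in config.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            continue  # unknown keys are ignored in this sketch
        # bool is a subclass of int in Python, so reject it for integer options.
        if expected is int and isinstance(value, bool):
            ok = False
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(f"'{key}' should be of type {expected.__name__}")
    return errors


print(check_config({}))
# ["'manifest' is a required property"]
print(check_config({"manifest": "test/manifest.tbl", "ont": "yes"}))
# ["'ont' should be of type bool"]
```

An empty configuration fails on the missing manifest key, which is the same condition the real schema reports.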
Example Configuration
# Required
manifest: "test/manifest.tbl"
# Optional (showing non-default values)
max_threads: 32
ont: true
set-sm: true
mm2_preset: "map-ont"
mm2_extra_options: "-k19"
min_mapq: 20
ft_nuc_params: "--nucleosome-length 60"
Notes
Only manifest is required; all other options have sensible defaults
Boolean values should be lowercase: true or false
String values should be quoted if they contain special characters
The workflow will print the loaded configuration to stderr for verification
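The defaults listed in this document can be summarized as a single mapping. The Python sketch below overlays a user configuration on those defaults; it is an illustration, not the workflow's loading code. The helper name with_defaults is hypothetical, and representing the "disabled" default of reset_mapq as None is an assumption.

```python
# Defaults as documented above; reset_mapq "disabled" is modeled as None.
DEFAULTS = {
    "max_threads": 64,
    "ont": False,
    "set-sm": False,
    "mm2_preset": "lr:hq",
    "mm2_extra_options": "",
    "min_mapq": 1,
    "reset_mapq": None,
    "reset_mapq_before": False,
    "ft_nuc_params": "",
    "keep_read_assignments": False,
}


def with_defaults(user_config):
    """Overlay user-supplied options on the documented defaults."""
    if "manifest" not in user_config:
        raise KeyError("manifest")  # the only required option
    # Later keys win, so user values replace defaults.
    return {**DEFAULTS, **user_config}


effective = with_defaults({"manifest": "test/manifest.tbl", "ont": True})
print(effective["ont"], effective["max_threads"], effective["mm2_preset"])
# True 64 lr:hq
```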
Linting and formatting
Linting results
Using workflow specific profile workflow/profiles/default for setting default command line arguments.
WorkflowError in file "/tmp/tmphv7u24p1/workflow/Snakefile", line 15:
Error validating config file.
ValidationError: 'manifest' is a required property

Failed validating 'required' in schema:
    {'$schema': 'https://json-schema.org/draft/2020-12/schema',
     'description': 'Configuration schema for the DSA-phasing pipeline',
     'properties': {'manifest': {'type': 'string',
                                 'description': 'Path to a table with the '
                                                'manifest of samples to be '
                                                'processed.'}},
     'required': ['manifest'],
     '$id': 'file:///tmp/tmphv7u24p1/workflow/config.schema.yaml'}

On instance:
    {}
Formatting results
All tests passed!