merfre/MMCAW
Reproducible Snakemake workflow for ONT metagenomic microbiome analysis with multi-tool taxonomic assignment, consensus comparison, and diversity analysis.
Overview
Latest release: v1.1.0, Last update: 2026-04-18
Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=merfre/MMCAW
Quality control: linting: failed; formatting: failed
Topics: metagenomics microbiome snakemake taxonomic-assignment bioinformatics blast kraken2 nanopore reproducibility taxonomy workflow
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we assume that you are inside that directory. Then run
snakedeploy deploy-workflow https://github.com/merfre/MMCAW . --tag v1.1.0
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options, such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After the analysis has finished, you can generate an interactive HTML report, which bundles the results with the parameters and code used, for inspection in the browser:
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
This document describes the structure and parameters of config/config.yaml for MMCAW. Users should modify this file to control workflow behaviour, input locations, and analysis settings.
Overview of config.yaml
The configuration file defines:
Software environments
Input metadata location
Core workflow options (which analysis components to run)
Database locations
Tool-specific parameters
Required metadata
Sample/run metadata table
The metadata file is set with the metadata_file key: metadata_file: "config/ONT_mock_cont_assem.txt"
This file must include, at minimum:
Sample identifier
Run identifier
Paths to FASTQ files
Paths to sequence summary files
Paths to unblocked read ID lists (if used)
Column names and exact format should be consistent with the workflow’s expectations in the Snakefile.
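Because the required columns are described only loosely above, a pre-flight check can catch a malformed metadata table before Snakemake starts. The sketch below assumes a tab-separated file with a header row; the column names (sample_id, run_id, fastq_path, summary_path) are hypothetical placeholders, so replace them with the names the Snakefile actually reads.

```python
import csv
import os

# Hypothetical column names; MMCAW's real headers may differ.
# A column for unblocked read ID lists is only needed if adaptive sampling was used.
REQUIRED_COLUMNS = ["sample_id", "run_id", "fastq_path", "summary_path"]
PATH_COLUMNS = ["fastq_path", "summary_path"]


def check_metadata_rows(lines, check_files=True):
    """Return a list of problems found in tab-separated metadata lines (header first)."""
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for col in PATH_COLUMNS:
            value = (row.get(col) or "").strip()
            if not value:
                problems.append(f"line {line_no}: empty {col}")
            elif check_files and not os.path.exists(value):
                problems.append(f"line {line_no}: {col} not found: {value}")
    return problems


def check_metadata_file(path, check_files=True):
    """Convenience wrapper that reads the metadata table from disk."""
    with open(path, newline="") as handle:
        return check_metadata_rows(handle, check_files)
```

For example, check_metadata_file("config/ONT_mock_cont_assem.txt") returns an empty list when every listed FASTQ and sequence summary file exists.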
Analysis options (workflow toggles)
These boolean parameters control which components of the workflow are executed:
Build databases within the workflow
Enable CAT classification (include_cat)
Enable Kraken2 classification (include_kraken2)
Enable BLAST classification (include_blast)
Enable Sourmash (optional)
Enable PhyloPhlAn phylogenetic analysis
Compare and merge assigner results; requires at least two assigners (include_comparison)
Identify resistance genes via CARD; requires CAT
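The two dependencies in this list (the comparison needs at least two assigners, and the CARD search needs CAT) can be checked before a run. A minimal sketch, assuming the config has already been loaded into a dict, as Snakemake's config object is. The comparison step works on CAT, Kraken2 and BLAST output, so only those three count as assigners here. include_card is a hypothetical key name, since the real name of the CARD toggle is not given above.

```python
# include_cat, include_kraken2, include_blast and include_comparison appear in
# the example config; "include_card" is a hypothetical name for the CARD toggle.
ASSIGNERS = ("include_cat", "include_kraken2", "include_blast")


def as_bool(value):
    """Accept YAML booleans as well as strings such as "True" or "false"."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "yes", "1"}


def check_toggles(config):
    """Return a list of dependency problems between workflow toggles."""
    problems = []
    enabled = [key for key in ASSIGNERS if as_bool(config.get(key, False))]
    if as_bool(config.get("include_comparison", False)) and len(enabled) < 2:
        problems.append(
            f"include_comparison needs at least two assigners, found {len(enabled)}"
        )
    if as_bool(config.get("include_card", False)) and not as_bool(
        config.get("include_cat", False)
    ):
        problems.append("CARD resistance-gene search requires include_cat: True")
    return problems
```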
Database locations
Users must either place databases in the default locations below or update the paths accordingly:
filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"
kraken_db: "~/Kraken2_Simple_Workflow/resources/databases/krakenstd_06_2023/kraken2_std_database"
cat_db: "resources/databases/20240422_CAT_nr/db"
cat_taxonomy: "resources/databases/20240422_CAT_nr/tax"
blast_db: "resources/databases/NCBI_blast_database/nt"
taxdump: "resources/databases/taxdump"
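Missing or misplaced databases are a common cause of failed runs, so the paths above can be checked up front. The sketch below uses the key names from the list. Two details are assumptions about how the paths are used. First, the kraken_db default starts with "~", which Python file functions do not expand by themselves, so the sketch expands it explicitly. Second, blast_db points to a database prefix ("nt") rather than a single file, so files named after that prefix are also accepted.

```python
import glob
import os

DB_KEYS = ["filtering_reference", "kraken_db", "cat_db", "cat_taxonomy", "blast_db", "taxdump"]


def missing_databases(config, keys=DB_KEYS):
    """Return {key: expanded_path} for configured database paths that cannot be found."""
    missing = {}
    for key in keys:
        if key not in config:
            continue  # this component is not configured
        # Expand "~" explicitly; os.path.exists() does not do it.
        path = os.path.expanduser(str(config[key]))
        # BLAST databases are referenced by prefix (e.g. ".../nt" -> nt.nal, nt.00.nsq, ...).
        prefixed = glob.glob(glob.escape(path) + ".*")
        if not (os.path.exists(path) or prefixed):
            missing[key] = path
    return missing
```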
Preprocessing parameters (fastp)
Minimum Phred score for a base to count as qualified
Maximum percentage of unqualified bases allowed
Minimum average read quality (0 = no filter)
Minimum read length
Number of bases trimmed from the 5' end
Number of bases trimmed from the 3' end
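Each setting in the list above corresponds to a standard fastp option: --qualified_quality_phred, --unqualified_percent_limit, --average_qual, --length_required, --trim_front1 and --trim_tail1. The sketch below builds a fastp argument list from such settings. Both the config key names and this mapping are assumptions and were not read from MMCAW's fastp rule.

```python
# Hypothetical config keys mapped to real fastp options.
FASTP_OPTIONS = {
    "qualified_quality_phred": "--qualified_quality_phred",  # min Phred for a qualified base
    "unqualified_percent_limit": "--unqualified_percent_limit",  # max % unqualified bases
    "average_qual": "--average_qual",  # min mean read quality; 0 disables the filter
    "length_required": "--length_required",  # min read length
    "trim_front": "--trim_front1",  # bases cut from the 5' end
    "trim_tail": "--trim_tail1",  # bases cut from the 3' end
}


def fastp_command(reads_in, reads_out, params, threads=1):
    """Build a fastp argument list from a dict of preprocessing parameters."""
    cmd = ["fastp", "-i", reads_in, "-o", reads_out, "--thread", str(threads)]
    for key, flag in FASTP_OPTIONS.items():
        if key in params:  # only pass options that were configured
            cmd += [flag, str(params[key])]
    return cmd
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting problems. In a Snakemake rule, the same flags would normally go in the shell: directive, with values supplied through params:.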
Assembly parameters (Flye)
Minimum overlap length between reads (default 1000)
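The overlap setting corresponds to Flye's --min-overlap option. Because the rule file is named metaflye.smk, the sketch assumes metagenome mode (--meta). The input read type (--nano-raw) is also an assumption; for recent high-accuracy ONT basecalls, --nano-hq may be more appropriate.

```python
def flye_command(reads, out_dir, min_overlap=1000, threads=1, read_type="--nano-raw"):
    """Build a metagenome-mode Flye argument list (read type is an assumption)."""
    return [
        "flye",
        read_type, reads,  # e.g. --nano-raw or --nano-hq
        "--out-dir", out_dir,
        "--meta",  # metagenome mode: tolerates uneven coverage between genomes
        "--min-overlap", str(min_overlap),  # workflow default of 1000
        "--threads", str(threads),
    ]
```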
Taxonomic assignment parameters
Kraken2
Confidence threshold (0–1) for assigning taxonomic labels
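The threshold is passed to kraken2 --confidence. The Kraken2 manual defines a label's score as C/Q. Here C is the number of k-mers mapped to taxa in the clade rooted at the label, and Q is the number of k-mers in the read that contain no ambiguous nucleotides. If the score falls below the threshold, the label moves up the taxonomy until a node qualifies. If no node qualifies, the read is left unclassified. The sketch below reproduces that rule on a toy taxonomy. The taxon IDs and counts are illustrative only, and this is not Kraken2's own code.

```python
def in_clade(taxon, clade_root, parent):
    """True if `taxon` is `clade_root` or one of its descendants."""
    while taxon is not None:
        if taxon == clade_root:
            return True
        taxon = parent.get(taxon)  # the root has no parent entry
    return False


def confident_label(label, kmer_hits, parent, total_kmers, threshold):
    """Walk up from `label` until C/Q >= threshold; return None (unclassified) otherwise.

    kmer_hits: {taxon: number of k-mers whose LCA is that taxon}
    total_kmers: Q, the k-mers in the read without ambiguous bases
    """
    taxon = label
    while taxon is not None:
        clade_hits = sum(n for t, n in kmer_hits.items() if in_clade(t, taxon, parent))
        if total_kmers and clade_hits / total_kmers >= threshold:
            return taxon
        taxon = parent.get(taxon)
    return None
```

With a threshold of 0 (Kraken2's default), the original label is always kept. Higher thresholds trade specificity for precision.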
BLAST
Minimum percent identity
Maximum e-value
Maximum number of target sequences
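These three settings correspond to the blastn options -perc_identity, -evalue and -max_target_seqs. The sketch below assembles a blastn call that writes tabular output. The output column list (which includes staxids and therefore needs a taxonomy-aware database such as NCBI nt) and the function name are assumptions, not MMCAW's rule.

```python
def blastn_command(query, db, out, perc_identity, evalue, max_target_seqs, threads=1):
    """Build a blastn argument list producing tabular (-outfmt 6) output."""
    return [
        "blastn",
        "-query", query,
        "-db", db,  # database prefix, e.g. resources/databases/NCBI_blast_database/nt
        "-perc_identity", str(perc_identity),
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        # Custom tabular columns; passed as a single argument.
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore staxids",
        "-num_threads", str(threads),
        "-out", out,
    ]
```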
Modified LCA (MLCA) parameters
Minimum bitscore
Minimum percent identity
Minimum alignment coverage
Majority threshold (%)
Minimum number of hits
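To show how these five parameters could interact, here is a generic majority-vote LCA. Hits are filtered by bitscore, identity and coverage. Queries with too few surviving hits stay unassigned. Otherwise, the most specific rank at which one taxon holds at least the majority percentage of hits is reported. This is an illustrative sketch, not MMCAW's MLCA implementation, and the hit structure and rank list are assumptions.

```python
from collections import Counter

RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]


def majority_lca(hits, min_bitscore, min_pident, min_coverage, majority_pct, min_hits):
    """Assign one query a taxon by majority vote over its filtered hits.

    Each hit is a dict with 'bitscore', 'pident', 'coverage' (percent of the
    query covered) and 'lineage' ({rank: taxon name}). Returns (rank, taxon)
    or None if too few hits pass the filters or no rank reaches the majority.
    """
    kept = [
        h for h in hits
        if h["bitscore"] >= min_bitscore
        and h["pident"] >= min_pident
        and h["coverage"] >= min_coverage
    ]
    if len(kept) < min_hits:
        return None
    for rank in RANKS:  # most specific rank first
        votes = Counter(h["lineage"].get(rank) for h in kept)
        votes.pop(None, None)  # hits without a name at this rank do not vote
        if votes:
            taxon, count = votes.most_common(1)[0]
            # Hits lacking this rank still count in the denominator.
            if 100 * count / len(kept) >= majority_pct:
                return rank, taxon
    return None
```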
Assigner comparison and consensus
When include_comparison is True, MMCAW:
Standardizes taxonomy outputs from CAT, Kraken2, and BLAST
Assigns consensus taxonomy if at least two tools agree
Labels contigs as no_agreement if all three disagree
Reports percent agreement across taxonomy levels
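The two-of-three consensus rule above can be applied per contig and per taxonomic rank. In the sketch below, the tool keys and data layout are hypothetical. Percent agreement is defined here as the share of contigs on which every tool reports the same non-missing taxon, which may differ from MMCAW's own definition.

```python
from collections import Counter


def consensus(calls):
    """Return the taxon named by at least two tools, else 'no_agreement'.

    calls maps tool name -> taxon at one rank (None if the tool gave no label).
    """
    votes = Counter(taxon for taxon in calls.values() if taxon)
    if votes:
        taxon, count = votes.most_common(1)[0]
        if count >= 2:
            return taxon
    return "no_agreement"


def percent_agreement(contigs, level):
    """Percent of contigs whose tools all report the same, non-missing taxon at `level`.

    contigs maps contig id -> {tool: {level: taxon}}.
    """
    if not contigs:
        return 0.0
    agreeing = 0
    for per_tool in contigs.values():
        names = [lineage.get(level) for lineage in per_tool.values()]
        if None not in names and len(set(names)) == 1:
            agreeing += 1
    return 100 * agreeing / len(contigs)
```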
Plotting and reporting
Number of most prevalent species shown in the final plots (default 25)
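The sketch below shows one way to pick the species for the final plots. It assumes that "prevalence" means the number of samples in which a species has a non-zero read count, with ties broken by total reads and then by name. MMCAW may rank by relative abundance instead, and the input layout is hypothetical.

```python
from collections import Counter


def top_species(abundance, n=25):
    """Return the n most prevalent species.

    abundance maps sample -> {species: read count}. Prevalence is the number
    of samples with a non-zero count; ties are broken by total reads, then by
    name, so the ordering is deterministic.
    """
    prevalence = Counter()
    total = Counter()
    for counts in abundance.values():
        for species, reads in counts.items():
            if reads > 0:
                prevalence[species] += 1
                total[species] += reads
    ranked = sorted(prevalence, key=lambda s: (-prevalence[s], -total[s], s))
    return ranked[:n]
```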
Benchmarking and resources
threads: number of cores available to Snakemake rules (default 10).
Benchmarking is enabled by default and writes rule-level performance metrics.
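For rules with a benchmark: directive, Snakemake writes a tab-separated file whose header includes s (wall-clock seconds) and max_rss (peak resident memory in MB). The sketch below ranks those files from slowest to fastest. The benchmarks/ location in the default glob pattern is an assumption; point it at wherever MMCAW's rules write their benchmark files.

```python
import csv
import glob


def parse_benchmark(lines):
    """Return (wall-clock seconds, max RSS in MB or None) from one benchmark TSV.

    If the rule was benchmarked with repeats, only the first row is used.
    """
    record = next(csv.DictReader(lines, delimiter="\t"), None)
    if record is None:
        return None
    rss = (record.get("max_rss") or "").strip()
    return float(record["s"]), (float(rss) if rss not in ("", "NA", "-") else None)


def summarise_benchmarks(pattern="benchmarks/**/*.tsv"):
    """Return [(path, seconds, max_rss_mb)] sorted slowest first."""
    rows = []
    for path in glob.glob(pattern, recursive=True):
        with open(path, newline="") as handle:
            parsed = parse_benchmark(handle)
        if parsed is not None:
            rows.append((path, *parsed))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```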
Minimal example config (snippet)
conda_envs: "workflow/envs/environment.yml"
metadata_file: "config/ONT_mock_cont_assem.txt"
threads: "10"
include_cat: True
include_kraken2: True
include_blast: True
include_comparison: True
filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"
Linting and formatting
Linting results
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:43: SyntaxWarning: invalid escape sequence '\/'
  RUNS = "[^\/]+"
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:44: SyntaxWarning: invalid escape sequence '\/'
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:45: SyntaxWarning: invalid escape sequence '\/'
  ### Concatenate fastq files from barcodes ###
FileNotFoundError in file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65:
[Errno 2] No such file or directory: './resources/ONT_mockecoli1_cont1/CN3.1_cont.fastq'
  File "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65, in <module>
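The SyntaxWarnings above come from a regular expression written as an ordinary string literal (Snakefile line 43). Behaviour is unaffected, because Python keeps unknown escapes such as "\/" verbatim. However, recent Python versions warn about them. One possible fix, not necessarily the author's, is to drop the unnecessary backslash and use a raw string. A "/" needs no escaping in a Python regular expression, and a raw string protects any backslashes a pattern does need.

```python
import re

# Equivalent to the original "[^\/]+": one or more characters other than "/".
RUNS = r"[^/]+"

# In a Snakefile this would typically feed a wildcard constraint, e.g.
#   wildcard_constraints:
#       run=RUNS
# ("run" is a hypothetical wildcard name.)


def is_valid_run_id(value):
    """True if `value` matches RUNS in full, i.e. contains no path separator."""
    return re.fullmatch(RUNS, value) is not None
```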
Formatting results
[DEBUG]
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/metaflye.smk": Formatted content is different from original
[DEBUG]
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk": EmptyContextError: L80: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk":
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk": InvalidPython: Black error:
Cannot parse for target version Python 3.13: 3:12: subworkflow db_creation:
(Note reported line number may be incorrect, as snakefmt could not determine the true line number)
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk":
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk": EmptyContextError: L64: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk":
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk": EmptyContextError: L32: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk":
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk": NoParametersError: L11: In input definition.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk":
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/removing_human_seq.smk": Formatted content is different from original
[DEBUG]
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk": EmptyContextError: L90: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk":
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/fastp.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/sourmash.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxdump.smk": Formatted content is different from original
[DEBUG]
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile": InvalidPython: Black error:
Cannot parse for target version Python 3.13: 2:0: else:
(Note reported line number may be incorrect, as snakefmt could not determine the true line number)
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile":
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/card_rgi.smk": Formatted content is different from original
[INFO] 7 file(s) raised parsing errors 🤕
[INFO] 6 file(s) would be changed 😬
snakefmt version: 0.11.5