merfre/MMCAW

Reproducible Snakemake workflow for ONT metagenomic microbiome analysis with multi-tool taxonomic assignment, consensus comparison, and diversity analysis.

Overview

Latest release: v1.1.0, Last update: 2026-04-18

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=merfre/MMCAW

Quality control: linting: failed, formatting: failed

Topics: metagenomics microbiome snakemake taxonomic-assignment bioinformatics blast kraken2 nanopore reproducibility taxonomy workflow

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/merfre/MMCAW . --tag v1.1.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs, following the instructions in the Configuration section below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

This document describes the structure and parameters of config/config.yaml for MMCAW. Users should modify this file to control workflow behaviour, input locations, and analysis settings.

Overview of config.yaml

The configuration file defines:

Software environments
Input metadata location
Core workflow options (which analysis components to run)
Database locations
Tool-specific parameters

Required metadata

Sample/run metadata table

The metadata file is specified by the metadata_file key:

metadata_file: "config/ONT_mock_cont_assem.txt"

This file must include, at minimum:

Sample identifier
Run identifier
Paths to FASTQ files
Paths to sequence summary files
Paths to unblocked read ID lists (if used)

Column names and exact format should be consistent with the workflow’s expectations in the Snakefile.
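
Because the exact column names are defined by the Snakefile rather than listed here, it can help to check the metadata header before launching a run. The following is a minimal sketch, not part of MMCAW: the column names in REQUIRED_COLUMNS and the tab delimiter are assumptions for illustration and should be replaced with the names the Snakefile actually reads.

```python
import csv
import io

# Hypothetical column names; the real header is defined by the workflow's Snakefile.
REQUIRED_COLUMNS = {"sample", "run", "fastq_path", "sequencing_summary_path"}


def missing_columns(handle, required=REQUIRED_COLUMNS, delimiter="\t"):
    """Return the sorted list of required columns absent from the header row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = set(reader.fieldnames or [])
    return sorted(required - header)


# Example with an in-memory table that lacks one of the assumed columns.
example = io.StringIO("sample\trun\tfastq_path\nS1\tR1\tdata/S1.fastq\n")
print(missing_columns(example))  # ['sequencing_summary_path']
```

For a real file, the same function can be called on open("config/ONT_mock_cont_assem.txt", newline="").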

Analysis options (workflow toggles)

These boolean parameters control which components of the workflow are executed:

Parameter	Description
`include_db_creation`	Build databases within the workflow
`include_cat`	Enable CAT classification
`include_kraken2`	Enable Kraken2 classification
`include_blast`	Enable BLAST classification
`include_sourmash`	Enable Sourmash (optional)
`include_phylophlan`	Enable PhyloPhlAn phylogenetic analysis
`include_comparison`	Compare and merge assigner results (requires ≥2 assigners)
`include_rgi`	Identify resistance genes via CARD (requires CAT)
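
The two dependencies noted in the table (comparison needs at least two assigners, RGI needs CAT) can be checked before a run. This is a sketch under stated assumptions rather than code from the workflow: check_toggles is a hypothetical helper, and it assumes that only CAT, Kraken2, and BLAST count as assigners for the comparison step.

```python
# Assumption: these three toggles are the assigners that feed the comparison.
ASSIGNERS = ("include_cat", "include_kraken2", "include_blast")


def check_toggles(config):
    """Return a list of human-readable problems with the toggle combination."""
    problems = []
    enabled = [key for key in ASSIGNERS if config.get(key, False)]
    if config.get("include_comparison", False) and len(enabled) < 2:
        problems.append("include_comparison requires at least two assigners")
    if config.get("include_rgi", False) and not config.get("include_cat", False):
        problems.append("include_rgi requires include_cat")
    return problems


# Comparison is requested, but only Kraken2 is enabled.
print(check_toggles({"include_kraken2": True, "include_comparison": True}))
```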

Database locations

Users must either place databases in the default locations below or update the paths accordingly:

filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"
kraken_db: "~/Kraken2_Simple_Workflow/resources/databases/krakenstd_06_2023/kraken2_std_database"
cat_db: "resources/databases/20240422_CAT_nr/db"
cat_taxonomy: "resources/databases/20240422_CAT_nr/tax"
blast_db: "resources/databases/NCBI_blast_database/nt"
taxdump: "resources/databases/taxdump"
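
Since a missing database usually surfaces only after earlier rules have run, a quick path check can save time. The sketch below is an illustration, not part of MMCAW: looks_present is a hypothetical helper that expands "~" and also accepts a database prefix (such as the BLAST "nt" entry) when files beginning with that prefix exist in the parent directory.

```python
import tempfile
from pathlib import Path


def looks_present(path):
    """True if the path exists, or if it is a prefix of files in its parent directory."""
    path = Path(path).expanduser()
    if path.exists():
        return True
    parent = path.parent
    if not parent.is_dir():
        return False
    # BLAST databases are addressed by prefix, e.g. ".../nt" for "nt.00.nsq".
    return any(entry.name.startswith(path.name + ".") for entry in parent.iterdir())


def missing_databases(config, keys):
    """Return the config keys whose paths could not be found."""
    return [key for key in keys if not looks_present(config[key])]


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nt.00.nsq").touch()
    demo = {"blast_db": f"{tmp}/nt", "taxdump": f"{tmp}/taxdump"}
    print(missing_databases(demo, ["blast_db", "taxdump"]))  # ['taxdump']
```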

Preprocessing parameters (fastp)

Parameter	Description
`qualified_quality_phred`	Minimum Phred score to count as qualified
`unqualified_percent_limit`	Maximum % of unqualified bases allowed
`average_qual`	Minimum average read quality (0 = no filter)
`min_length`	Minimum read length
`front_trim`	Bases trimmed from 5’ end
`tail_trim`	Bases trimmed from 3’ end
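
To make the meaning of these parameters concrete, the sketch below assembles a fastp command line from a config dictionary. The fastp long options are real, but the mapping from MMCAW config keys to those options is an assumption based on the descriptions above, the values are illustrative rather than workflow defaults, and the workflow's own fastp rule may differ.

```python
import shlex

# Assumed mapping from MMCAW config keys to fastp long options.
FASTP_FLAGS = {
    "qualified_quality_phred": "--qualified_quality_phred",
    "unqualified_percent_limit": "--unqualified_percent_limit",
    "average_qual": "--average_qual",
    "min_length": "--length_required",
    "front_trim": "--trim_front1",
    "tail_trim": "--trim_tail1",
}


def fastp_command(config, fastq_in, fastq_out):
    """Build a single-end fastp command string from the given settings."""
    args = ["fastp", "--in1", fastq_in, "--out1", fastq_out]
    for key, flag in FASTP_FLAGS.items():
        if key in config:
            args += [flag, str(config[key])]
    return shlex.join(args)


# Illustrative values only.
print(fastp_command({"qualified_quality_phred": 10, "min_length": 200},
                    "in.fastq", "out.fastq"))
```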

Assembly parameters (Flye)

Parameter	Description
`read_type`	Flye read-type option; `--nano-raw` (default) is for uncorrected ONT reads
`minimum_overlap`	Minimum overlap length between reads (default 1000)

Taxonomic assignment parameters

Kraken2

Parameter	Description
`kraken_confidence`	Confidence threshold (0–1) for taxonomic labels

BLAST

Parameter	Description
`BLAST_min_perc_ident`	Minimum percent identity
`BLAST_min_evalue`	Maximum e-value
`BLAST_max_target_seqs`	Maximum number of target sequences

Modified LCA (MLCA) parameters

Parameter	Description
`MLCA_bitscore`	Minimum bitscore
`MLCA_identity`	Minimum percent identity
`MLCA_coverage`	Minimum alignment coverage
`MLCA_majority`	Majority threshold (%)
`MLCA_hits`	Minimum number of hits
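
How these five thresholds might interact can be illustrated with a generic majority-vote LCA. This is a sketch of one common approach, not MMCAW's actual MLCA implementation: the hit structure, the three-rank lineage, and the function name majority_lca are all assumptions made for illustration.

```python
from collections import Counter

# Illustrative subset of ranks, ordered from most to least specific.
RANKS = ("species", "genus", "family")


def majority_lca(hits, min_bitscore, min_identity, min_coverage, majority, min_hits):
    """Return (rank, taxon) at the most specific rank backed by a majority of hits."""
    kept = [
        hit for hit in hits
        if hit["bitscore"] >= min_bitscore
        and hit["identity"] >= min_identity
        and hit["coverage"] >= min_coverage
    ]
    if not kept or len(kept) < min_hits:
        return None  # too few hits survive filtering
    for rank in RANKS:
        taxon, count = Counter(hit["lineage"][rank] for hit in kept).most_common(1)[0]
        if 100 * count / len(kept) >= majority:
            return rank, taxon
    return None


def make_hit(species):
    """Build a hypothetical hit within the genus Escherichia."""
    lineage = {"species": species, "genus": "Escherichia", "family": "Enterobacteriaceae"}
    return {"bitscore": 500, "identity": 98.0, "coverage": 95.0, "lineage": lineage}


hits = [make_hit("Escherichia coli"), make_hit("Escherichia coli"),
        make_hit("Escherichia fergusonii")]
print(majority_lca(hits, 100, 90, 80, majority=90, min_hits=2))  # ('genus', 'Escherichia')
```

With a 90% majority threshold, two of three hits are not enough to assign the species, so the assignment falls back to the genus that all three hits share.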

Assigner comparison and consensus

When include_comparison: True, MMCAW:

Standardizes taxonomy outputs from CAT, Kraken2, and BLAST
Assigns consensus taxonomy if at least two tools agree
Labels contigs as no_agreement if all three disagree
Reports percent agreement across taxonomy levels
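
The two-of-three rule described above can be sketched as follows. This is an illustration at a single taxonomic level, not the workflow's own code: the function names are hypothetical, and how MMCAW treats unclassified or missing calls is not specified here, so this sketch simply treats them as ordinary labels.

```python
from collections import Counter


def consensus(cat, kraken2, blast):
    """Return the label shared by at least two assigners, else 'no_agreement'."""
    label, count = Counter([cat, kraken2, blast]).most_common(1)[0]
    return label if count >= 2 else "no_agreement"


def percent_agreement(calls):
    """Percentage of contigs for which at least two assigners agree."""
    if not calls:
        return 0.0
    agreed = sum(consensus(*call) != "no_agreement" for call in calls)
    return 100.0 * agreed / len(calls)


# Each tuple holds the (CAT, Kraken2, BLAST) calls for one contig.
calls = [
    ("Escherichia coli", "Escherichia coli", "Escherichia fergusonii"),
    ("Bacillus subtilis", "Listeria monocytogenes", "Salmonella enterica"),
]
print([consensus(*call) for call in calls])  # ['Escherichia coli', 'no_agreement']
print(percent_agreement(calls))  # 50.0
```

Running the same functions once per taxonomic level (species, genus, and so on) would give per-level agreement figures.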

Plotting and reporting

Parameter	Description
`prevalence`	Number of most prevalent species shown in final plots (default 25)

Benchmarking and resources

threads: Number of cores available to Snakemake rules (default 10)
Benchmarking is enabled by default and outputs rule-level performance metrics.

Minimal example config (snippet)

conda_envs: "workflow/envs/environment.yml"
metadata_file: "config/ONT_mock_cont_assem.txt"
threads: "10"

include_cat: True
include_kraken2: True
include_blast: True
include_comparison: True

filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"

Linting and formatting

Linting results

/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:43: SyntaxWarning: invalid escape sequence '\/'
  RUNS = "[^\/]+"
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:44: SyntaxWarning: invalid escape sequence '\/'
  
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:45: SyntaxWarning: invalid escape sequence '\/'
  ### Concatenate fastq files from barcodes ###
FileNotFoundError in file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65:
[Errno 2] No such file or directory: './resources/ONT_mockecoli1_cont1/CN3.1_cont.fastq'
  File "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65, in <module>
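
The SyntaxWarning reported for Snakefile line 43 arises because "\/" is not a recognized escape sequence in a normal Python string. A forward slash needs no escaping in a Python regular expression, so an equivalent pattern can be written as a raw string without the backslash. This is a minimal sketch of that change, assuming RUNS is only used as a wildcard constraint or regex.

```python
import re

# Same character class as "[^\/]+", written without the unneeded backslash.
RUNS = r"[^/]+"

# The pattern matches a single path component and rejects anything containing "/".
print(bool(re.fullmatch(RUNS, "run_01")))       # True
print(bool(re.fullmatch(RUNS, "runs/run_01")))  # False
```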

Formatting results

[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/metaflye.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk":  EmptyContextError: L80: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk":  InvalidPython: Black error:

Cannot parse for target version Python 3.13: 3:12: subworkflow db_creation:

(Note reported line number may be incorrect, as snakefmt could not determine the true line number)

[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk":  EmptyContextError: L64: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk":  EmptyContextError: L32: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk":  NoParametersError: L11: In input definition.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/removing_human_seq.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk":  EmptyContextError: L90: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/fastp.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/sourmash.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxdump.smk":  Formatted content is different from original
[DEBUG] 
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile":  InvalidPython: Black error:

Cannot parse for target version Python 3.13: 2:0: else:

(Note reported line number may be incorrect, as snakefmt could not determine the true line number)

[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/card_rgi.smk":  Formatted content is different from original
[INFO] 7 file(s) raised parsing errors 🤕
[INFO] 6 file(s) would be changed 😬

snakefmt version: 0.11.5