merfre/MMCAW

Reproducible Snakemake workflow for ONT metagenomic microbiome analysis with multi-tool taxonomic assignment, consensus comparison, and diversity analysis.

Overview

Latest release: v1.1.0, Last update: 2026-04-18

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=merfre/MMCAW

Quality control: linting failed, formatting failed

Topics: metagenomics, microbiome, snakemake, taxonomic-assignment, bioinformatics, blast, kraken2, nanopore, reproducibility, taxonomy, workflow

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/merfre/MMCAW . --tag v1.1.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml (the file name used in the workflow's own configuration notes) to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Configuration

This document describes the structure and parameters of config/config.yaml for MMCAW. Users should modify this file to control workflow behaviour, input locations, and analysis settings.

Overview of config.yaml

The configuration file defines:

  • Software environments

  • Input metadata location

  • Core workflow options (which analysis components to run)

  • Database locations

  • Tool-specific parameters

Required metadata

Sample/run metadata table

The metadata table is set with the metadata_file key, for example metadata_file: "config/ONT_mock_cont_assem.txt"

This file must include, at minimum:

  • Sample identifier

  • Run identifier

  • Paths to FASTQ files

  • Paths to sequence summary files

  • Paths to unblocked read ID lists (if used)

Column names and exact format should be consistent with the workflow’s expectations in the Snakefile.
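As a hedged illustration, the check below reads the header of a metadata table and reports which required columns are missing. The column names (sample, run, fastq, sequencing_summary) and the tab delimiter are assumptions made for this sketch; the names MMCAW actually expects are defined in its Snakefile.

```python
import csv
import io

# Hypothetical column names; check the Snakefile for the real ones.
REQUIRED_COLUMNS = ["sample", "run", "fastq", "sequencing_summary"]


def missing_columns(handle, required=REQUIRED_COLUMNS, delimiter="\t"):
    """Return the required columns that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Example with an in-memory table that lacks one column.
example = io.StringIO("sample\trun\tfastq\nS1\tR1\treads/S1.fastq\n")
print(missing_columns(example))
```

Running a check like this before starting Snakemake gives a clearer message than a failure inside the workflow.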

Analysis options (workflow toggles)

These boolean parameters control which components of the workflow are executed:

Parameter             Description
include_db_creation   Build databases within the workflow
include_cat           Enable CAT classification
include_kraken2       Enable Kraken2 classification
include_blast         Enable BLAST classification
include_sourmash      Enable Sourmash (optional)
include_phylophlan    Enable PhyloPhlAn phylogenetic analysis
include_comparison    Compare and merge assigner results (requires ≥2 assigners)
include_rgi           Identify resistance genes via CARD (requires CAT)
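The two dependencies noted above (comparison needs at least two assigners, RGI needs CAT) can be checked before a run. This is a minimal sketch, not part of MMCAW: the function name is hypothetical, and treating CAT, Kraken2, and BLAST as the assigners that count toward the comparison is an assumption.

```python
# Assumption: these three toggles are the assigners that feed the comparison.
ASSIGNERS = ("include_cat", "include_kraken2", "include_blast")


def toggle_problems(config):
    """Return readable problems with the workflow toggles (empty if none)."""
    problems = []
    enabled = [key for key in ASSIGNERS if config.get(key, False)]
    if config.get("include_comparison", False) and len(enabled) < 2:
        problems.append("include_comparison requires at least two assigners")
    if config.get("include_rgi", False) and not config.get("include_cat", False):
        problems.append("include_rgi requires include_cat")
    return problems


# Example: comparison requested with only one assigner enabled.
print(toggle_problems({"include_comparison": True, "include_kraken2": True}))
```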

Database locations

Users must either place databases in the default locations below or update the paths accordingly:

filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"
kraken_db: "~/Kraken2_Simple_Workflow/resources/databases/krakenstd_06_2023/kraken2_std_database"
cat_db: "resources/databases/20240422_CAT_nr/db"
cat_taxonomy: "resources/databases/20240422_CAT_nr/tax"
blast_db: "resources/databases/NCBI_blast_database/nt"
taxdump: "resources/databases/taxdump"
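A quick way to confirm that the configured locations exist is sketched below. It is an illustration only: the function name is hypothetical, and whether the workflow itself expands "~" (as used in kraken_db) is not stated here, so the sketch expands it explicitly. Because a BLAST database is given as a file prefix rather than a file, a path also counts as present when files named "<prefix>.*" exist.

```python
import glob
import os

DATABASE_KEYS = [
    "filtering_reference",
    "kraken_db",
    "cat_db",
    "cat_taxonomy",
    "blast_db",
    "taxdump",
]


def missing_databases(config, keys=DATABASE_KEYS):
    """Return the keys whose configured path cannot be found on disk."""
    missing = []
    for key in keys:
        value = config.get(key)
        if not value:
            missing.append(key)
            continue
        # Python does not expand "~" in plain strings, so do it here.
        path = os.path.expanduser(value)
        # Accept either an existing path or a database prefix such as ".../nt".
        if not os.path.exists(path) and not glob.glob(glob.escape(path) + ".*"):
            missing.append(key)
    return missing


# Example: check a single entry from the list above.
print(missing_databases({"taxdump": "resources/databases/taxdump"}, keys=["taxdump"]))
```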

Preprocessing parameters (fastp)

Parameter                   Description
qualified_quality_phred     Minimum Phred score to count as qualified
unqualified_percent_limit   Maximum % of unqualified bases allowed
average_qual                Minimum average read quality (0 = no filter)
min_length                  Minimum read length
front_trim                  Bases trimmed from 5’ end
tail_trim                   Bases trimmed from 3’ end
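To show how these settings relate to fastp itself, the sketch below turns a config dictionary into fastp command-line arguments. The long options are fastp's own; the mapping from MMCAW's config keys to them (in particular min_length to --length_required, front_trim to --trim_front1, and tail_trim to --trim_tail1) is an assumption, and the workflow's rule may build its command differently.

```python
# Config key -> fastp long option. The key-to-option mapping is assumed.
FASTP_OPTIONS = {
    "qualified_quality_phred": "--qualified_quality_phred",
    "unqualified_percent_limit": "--unqualified_percent_limit",
    "average_qual": "--average_qual",
    "min_length": "--length_required",
    "front_trim": "--trim_front1",
    "tail_trim": "--trim_tail1",
}


def fastp_arguments(config):
    """Build a list of fastp arguments from the keys present in config."""
    args = []
    for key, option in FASTP_OPTIONS.items():
        if key in config:
            args += [option, str(config[key])]
    return args


# Example values are illustrative, not MMCAW defaults.
print(fastp_arguments({"min_length": 500, "average_qual": 0}))
```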

Assembly parameters (Flye)

Parameter         Description
read_type         --nano-raw (default) for uncorrected ONT reads
minimum_overlap   Minimum overlap length between reads (default 1000)

Taxonomic assignment parameters

Kraken2

Parameter           Description
kraken_confidence   Confidence threshold (0–1) for taxonomic labels

BLAST

Parameter               Description
BLAST_min_perc_ident    Minimum percent identity
BLAST_min_evalue        Maximum e-value
BLAST_max_target_seqs   Maximum number of target sequences

Modified LCA (MLCA) parameters

Parameter       Description
MLCA_bitscore   Minimum bitscore
MLCA_identity   Minimum percent identity
MCLA_coverage   Minimum alignment coverage
MLCA_majority   Majority threshold (%)
MLCA_hits       Minimum number of hits
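To give a feel for how thresholds of this kind can interact, here is a generic majority-vote LCA sketch. It is not taken from MMCAW's code: the hit fields (bitscore, identity, coverage, lineage), the function name, and the exact voting rule are all assumptions chosen for illustration.

```python
from collections import Counter


def majority_lca(hits, min_bitscore, min_identity, min_coverage, majority, min_hits):
    """Return the deepest lineage supported by at least `majority` % of kept hits.

    Each hit is a dict with "bitscore", "identity", "coverage", and "lineage"
    (a list of taxon names ordered from the highest rank downwards).
    """
    kept = [
        h for h in hits
        if h["bitscore"] >= min_bitscore
        and h["identity"] >= min_identity
        and h["coverage"] >= min_coverage
    ]
    if not kept or len(kept) < min_hits:
        return []  # too few usable hits to assign anything

    consensus = []
    supporting = kept
    depth = min(len(h["lineage"]) for h in kept)
    for level in range(depth):
        taxon, count = Counter(h["lineage"][level] for h in supporting).most_common(1)[0]
        if 100.0 * count / len(kept) < majority:
            break  # support too low at this rank; stop at the rank above
        consensus.append(taxon)
        # Only hits on the accepted path vote at deeper ranks.
        supporting = [h for h in supporting if h["lineage"][level] == taxon]
    return consensus


hits = [
    {"bitscore": 900, "identity": 98, "coverage": 95, "lineage": ["Bacteria", "Escherichia", "Escherichia coli"]},
    {"bitscore": 880, "identity": 97, "coverage": 94, "lineage": ["Bacteria", "Escherichia", "Escherichia coli"]},
    {"bitscore": 870, "identity": 96, "coverage": 93, "lineage": ["Bacteria", "Escherichia", "Escherichia fergusonii"]},
    {"bitscore": 40, "identity": 70, "coverage": 20, "lineage": ["Archaea", "Methanobrevibacter", "Methanobrevibacter smithii"]},
]
print(majority_lca(hits, 100, 90, 80, majority=90, min_hits=2))
```

In this example the low-scoring hit is filtered out, and the species rank is dropped because only two of the three remaining hits agree on it.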

Assigner comparison and consensus

When include_comparison: True, MMCAW:

  • Standardizes taxonomy outputs from CAT, Kraken2, and BLAST

  • Assigns consensus taxonomy if at least two tools agree

  • Labels contigs as no_agreement if all three disagree

  • Reports percent agreement across taxonomy levels
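The agreement rule described above can be sketched in a few lines. This is an illustration under assumptions, not MMCAW's implementation: the function name, the input shape (one standardized taxon label per tool for a single contig), and the tool keys are hypothetical.

```python
from collections import Counter


def consensus_taxon(assignments, min_agree=2):
    """Return the taxon at least `min_agree` tools agree on, else "no_agreement".

    `assignments` maps a tool name to its standardized taxon label.
    """
    if not assignments:
        return "no_agreement"
    taxon, count = Counter(assignments.values()).most_common(1)[0]
    return taxon if count >= min_agree else "no_agreement"


# Two of three tools agree, so their label becomes the consensus.
print(consensus_taxon({
    "cat": "Escherichia coli",
    "kraken2": "Escherichia coli",
    "blast": "Shigella flexneri",
}))
```

With three tools and a threshold of two, at most one label can reach the threshold, so no tie-breaking is needed.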

Plotting and reporting

Parameter    Description
prevalence   Number of most prevalent species shown in final plots (default 25)

Benchmarking and resources

  • threads: Number of cores available to Snakemake rules (default 10)

  • Benchmarking is enabled by default and outputs rule-level performance metrics.

Minimal example config (snippet)

conda_envs: "workflow/envs/environment.yml"
metadata_file: "config/ONT_mock_cont_assem.txt"
threads: "10"

include_cat: True
include_kraken2: True
include_blast: True
include_comparison: True

filtering_reference: "resources/databases/human_reference/GCF_000001405.40_GRCh38.p14_genomic.fna"

Linting and formatting

Linting results
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:43: SyntaxWarning: invalid escape sequence '\/'
  RUNS = "[^\/]+"
/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:44: SyntaxWarning: invalid escape sequence '\/'

/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile:45: SyntaxWarning: invalid escape sequence '\/'
  ### Concatenate fastq files from barcodes ###
FileNotFoundError in file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65:
[Errno 2] No such file or directory: './resources/ONT_mockecoli1_cont1/CN3.1_cont.fastq'
  File "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile", line 65, in <module>
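The SyntaxWarning above comes from writing "\/" inside an ordinary Python string literal. A minimal sketch of two ways to avoid it follows; only the RUNS pattern is taken from the log, and the name RUNS_RAW and the test strings are hypothetical.

```python
import re

# "/" is not special in Python regular expressions, so it needs no escape.
RUNS = "[^/]+"

# Alternative: keep the backslash but use a raw string, so Python does not
# try to interpret "\/" as a string escape.
RUNS_RAW = r"[^\/]+"

# Both patterns accept a name without "/" and reject one that contains it.
for pattern in (RUNS, RUNS_RAW):
    print(bool(re.fullmatch(pattern, "run_01")), bool(re.fullmatch(pattern, "a/b")))
```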
Formatting results
[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/metaflye.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk":  EmptyContextError: L80: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxonomy_assigner_comparison.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk":  InvalidPython: Black error:

Cannot parse for target version Python 3.13: 3:12: subworkflow db_creation:

(Note reported line number may be incorrect, as snakefmt could not determine the true line number)


[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/db_creation.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk":  EmptyContextError: L64: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/kraken2.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk":  EmptyContextError: L32: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/contig_annotation_tool.smk":  
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk":  NoParametersError: L11: In input definition.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/phylophlan.smk":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/removing_human_seq.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk":  EmptyContextError: L90: rule has no keywords attached to it.
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/blast.smk":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/fastp.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/sourmash.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/taxdump.smk":  Formatted content is different from original
[DEBUG] 
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
[ERROR] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile":  InvalidPython: Black error:

Cannot parse for target version Python 3.13: 2:0: else:

(Note reported line number may be incorrect, as snakefmt could not determine the true line number)


[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/Snakefile":  
[DEBUG] In file "/tmp/tmp3jgqzkr2/merfre-MMCAW-c1b50ef/workflow/rules/card_rgi.smk":  Formatted content is different from original
[INFO] 7 file(s) raised parsing errors 🤕
[INFO] 6 file(s) would be changed 😬

snakefmt version: 0.11.5