MPUSP/snakemake-ont-basecalling

A Snakemake workflow for basecalling and demultiplexing of Oxford Nanopore data using Dorado.

Overview

Latest release: v1.6.0, Last update: 2026-03-24

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=MPUSP/snakemake-ont-basecalling

Quality control: linting: passed, formatting: passed

Topics: basecalling cluster dorado nanopore-sequencing oxford-nanopore parallel-computing slurm snakemake snakemake-workflow

Workflow Rule Graph

This visualization of the workflow’s rule graph was automatically generated using Snakevision.

Rule Graph light

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/MPUSP/snakemake-ont-basecalling . --tag v1.6.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files that will be modified in the next step to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report that presents results together with parameters and code in the browser, using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Running the workflow

Input data

This workflow requires pod5 input data. Input files are supplied to the workflow via a mandatory runs table linked in the config.yml file (default: .test/config/runs.csv). Each row in the runs table corresponds to a single run, whose pod5 files are located via the data_folder column. Multiple runs can be defined in the table. The runs table has the following layout:

| run_id | data_folder | basecalling_model | barcode_kit |
| --- | --- | --- | --- |
| MK1C_run_01 | ".test/data" | dna_r10.4.1_e8.2_400bps_sup@v5.0.0 | SQK-PCB114-24 |
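A runs table with this layout can be written from the shell. The sketch below assumes comma-separated values (as the .csv extension suggests) and reuses the example run from the test data; replace the run ID, folder, model, and kit with your own:

```shell
# Write a minimal runs table matching the layout above.
# Column order: run_id, data_folder, basecalling_model, barcode_kit.
mkdir -p config
cat > config/runs.csv << 'EOF'
run_id,data_folder,basecalling_model,barcode_kit
MK1C_run_01,.test/data,dna_r10.4.1_e8.2_400bps_sup@v5.0.0,SQK-PCB114-24
EOF
```

Each additional run gets its own row; all pod5 files found in a run's data_folder are assigned to that run.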

Execution

Rule-specific resources such as GPU usage are defined via configuration profiles; see the Snakemake docs on profiles for more information. A default profile for local testing and a slurm-specific cluster profile are provided with this workflow.

To run the workflow from command line, change to the working directory and activate the conda environment.

cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling

Adjust options in the default config file config/config.yml. Before running the entire workflow, perform a dry run using:

snakemake --cores 3 --sdm conda --directory .test --dry-run

To run the workflow with test files using conda:

snakemake --cores 3 --sdm conda --directory .test

To run the workflow with test files using conda and apptainer, set the dorado path to /share/resources/dorado-<version>-linux-x64/bin/dorado and make it available for apptainer using bind:

snakemake --cores 3 --sdm conda apptainer --directory .test --apptainer-args "--bind ../resources:/share/resources"

To run the workflow with test files on a slurm cluster, adjust the slurm-specific profile file workflow/profiles/slurm/config.yaml and run:

snakemake --cores 3 --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test
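As an illustration, such a profile might look like the sketch below. This is an assumption-laden example, not the workflow's actual profile: the keys shown follow Snakemake >= 8 with the slurm executor plugin, and the partition, account, and limits are placeholders to adapt to your cluster.

```yaml
# Hypothetical workflow/profiles/slurm/config.yaml sketch (placeholder values).
executor: slurm
jobs: 100                      # maximum number of concurrently submitted jobs
default-resources:
  slurm_partition: "gpu"       # placeholder partition name
  slurm_account: "my_account"  # placeholder account
  runtime: 120                 # minutes per job
software-deployment-method:
  - conda
```

Profile keys mirror Snakemake's command-line options, so anything you would pass on the command line can live in the profile instead.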

Note: It is recommended to start the Snakemake pipeline on the cluster inside a session multiplexer such as screen or tmux.

Workflow parameters

The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.

| Parameter | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| input | | | yes | |
| . runs | string | table with sequencing runs | yes | config/runs.csv |
| . file_extension | string | file extension for input files | yes | .pod5 |
| . file_regex | string | regular expression to match input files | yes | [A-Z]{3}[0-9]{5}… |
| . barcodes | string | range of barcodes to process | yes | 1-24 |
| dorado | | | yes | |
| . path | string | path to the Dorado executable | yes | none |
| . simplex | | | yes | |
| . . cuda | string | CUDA device, one of: 'auto', 'cuda:0', 'cuda:all' | yes | cuda:all |
| . . trim | string | trimming option for Dorado, 'all' or 'none' | yes | none |
| . . extra | string | extra options for Dorado | | |
| . demultiplexing | boolean | whether to perform demultiplexing | | true |
| report | | | yes | |
| . tools | array | list of tools to include in the report | yes | ['pycoQC', 'NanoPlot'] |
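To make the table concrete, a config/config.yml might look like the sketch below. The nesting follows the dot notation in the table, which is an assumption about the schema; defaults are taken from the table, the Dorado path is a placeholder, and file_regex is omitted because its full default is truncated above.

```yaml
# Hypothetical config/config.yml sketch mirroring the parameter table above.
input:
  runs: "config/runs.csv"
  file_extension: ".pod5"
  barcodes: "1-24"
dorado:
  path: "/path/to/dorado"   # placeholder; point to your Dorado executable
  simplex:
    cuda: "cuda:all"        # one of 'auto', 'cuda:0', 'cuda:all'
    trim: "none"            # 'all' or 'none'
    extra: ""
  demultiplexing: true
report:
  tools: ["pycoQC", "NanoPlot"]
```

Consult the workflow's config.schema.y(a)ml for the authoritative key structure and allowed values.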

Linting and formatting

Linting results
All tests passed!
Formatting results
All tests passed!