MPUSP/snakemake-ont-basecalling

A Snakemake workflow for basecalling and demultiplexing of Oxford Nanopore data using Dorado.

Overview

Latest release: v1.6.0, Last update: 2026-03-24

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=MPUSP/snakemake-ont-basecalling

Quality control: linting: passed, formatting: passed

Topics: basecalling cluster dorado nanopore-sequencing oxford-nanopore parallel-computing slurm snakemake snakemake-workflow

Workflow Rule Graph

This visualization of the workflow’s rule graph was automatically generated using Snakevision.

Rule Graph light

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/MPUSP/snakemake-ont-basecalling . --tag v1.6.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files that will be modified in the next step to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report that presents results together with parameters and code in the browser, using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Running the workflow

Input data

This workflow requires pod5 input data. Input files are supplied to the workflow via a mandatory runs table linked in the config.yml file (default: .test/config/runs.csv). Each row in the runs table corresponds to a single run, whose pod5 files are located via the data_folder column. Multiple runs can be defined in the table. The runs table has the following layout:

| run_id | data_folder | basecalling_model | barcode_kit |
| --- | --- | --- | --- |
| MK1C_run_01 | ".test/data" | dna_r10.4.1_e8.2_400bps_sup@v5.0.0 | SQK-PCB114-24 |
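A runs table with this layout can be written from the shell. The sketch below assumes comma-separated values (as the .csv extension suggests) and reuses the example run from the test data; replace the run ID, folder, model, and kit with your own:

```shell
# Write a minimal runs table matching the layout above.
# Column order: run_id, data_folder, basecalling_model, barcode_kit.
mkdir -p config
cat > config/runs.csv << 'EOF'
run_id,data_folder,basecalling_model,barcode_kit
MK1C_run_01,.test/data,dna_r10.4.1_e8.2_400bps_sup@v5.0.0,SQK-PCB114-24
EOF
```

Each additional run gets its own row; all pod5 files found in a run's data_folder are assigned to that run.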

Execution

Rule-specific resources such as GPU usage are defined via configuration profiles; see the Snakemake docs on profiles for more information. A default profile for local testing and a slurm-specific cluster profile are provided with this workflow.

To run the workflow from command line, change to the working directory and activate the conda environment.

cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling

Adjust options in the default config file config/config.yml. Before running the entire workflow, perform a dry run using:

snakemake --cores 3 --sdm conda --directory .test --dry-run

To run the workflow with test files using conda:

snakemake --cores 3 --sdm conda --directory .test

To run the workflow with test files using conda and apptainer, set the dorado path to /share/resources/dorado-<version>-linux-x64/bin/dorado and make it available for apptainer using bind:

snakemake --cores 3 --sdm conda apptainer --directory .test --apptainer-args "--bind ../resources:/share/resources"

To run the workflow with test files on a slurm cluster, adjust the slurm-specific profile file workflow/profiles/slurm/config.yaml and run:

snakemake --cores 3 --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test
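As an illustration, such a profile might look like the sketch below. This is an assumption-laden example, not the workflow's actual profile: the keys shown follow Snakemake >= 8 with the slurm executor plugin, and the partition, account, and limits are placeholders to adapt to your cluster.

```yaml
# Hypothetical workflow/profiles/slurm/config.yaml sketch (placeholder values).
executor: slurm
jobs: 100                      # maximum number of concurrently submitted jobs
default-resources:
  slurm_partition: "gpu"       # placeholder partition name
  slurm_account: "my_account"  # placeholder account
  runtime: 120                 # minutes per job
software-deployment-method:
  - conda
```

Profile keys mirror Snakemake's command-line options, so anything you would pass on the command line can live in the profile instead.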

Note: It is recommended to start the Snakemake pipeline on the cluster inside a session multiplexer such as screen or tmux.

Workflow parameters

The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.

| Parameter | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| input | | | yes | |
| . runs | string | table with sequencing runs | yes | config/runs.csv |
| . file_extension | string | file extension for input files | yes | .pod5 |
| . file_regex | string | regular expression to match input files | yes | [A-Z]{3}[0-9]{5}… |
| . barcodes | string | range of barcodes to process | yes | 1-24 |
| dorado | | | yes | |
| . path | string | path to the Dorado executable | yes | none |
| . simplex | | | yes | |
| . . cuda | string | CUDA device, one of: 'auto', 'cuda:0', 'cuda:all' | yes | cuda:all |
| . . trim | string | trimming option for Dorado, 'all' or 'none' | yes | none |
| . . extra | string | extra options for Dorado | | |
| . demultiplexing | boolean | whether to perform demultiplexing | | true |
| report | | | yes | |
| . tools | array | list of tools to include in the report | yes | ['pycoQC', 'NanoPlot'] |
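To make the table concrete, a config/config.yml might look like the sketch below. The nesting follows the dot notation in the table, which is an assumption about the schema; defaults are taken from the table, the Dorado path is a placeholder, and file_regex is omitted because its full default is truncated above.

```yaml
# Hypothetical config/config.yml sketch mirroring the parameter table above.
input:
  runs: "config/runs.csv"
  file_extension: ".pod5"
  barcodes: "1-24"
dorado:
  path: "/path/to/dorado"   # placeholder; point to your Dorado executable
  simplex:
    cuda: "cuda:all"        # one of 'auto', 'cuda:0', 'cuda:all'
    trim: "none"            # 'all' or 'none'
    extra: ""
  demultiplexing: true
report:
  tools: ["pycoQC", "NanoPlot"]
```

Consult the workflow's config.schema.y(a)ml for the authoritative key structure and allowed values.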

Linting and formatting

Linting results
All tests passed!
Formatting results
All tests passed!