MPUSP/snakemake-bacterial-riboseq

Bacterial-Riboseq: A Snakemake workflow for the analysis of riboseq data in bacteria.

Overview

Latest release: v1.6.0, Last update: 2026-04-07


Quality control: linting passed, formatting passed

Topics: bioinformatics-pipeline conda riboseq ribosome-profiling singularity snakemake workflow

Wrappers: bio/cutadapt/se bio/fastqc

Workflow Rule Graph

This visualization of the workflow’s rule graph was automatically generated using Snakevision.


Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install Conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/MPUSP/snakemake-bacterial-riboseq . --tag v1.6.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which you will modify in the next step to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options, such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting results, parameters, and code in the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Running the workflow

Input data

Reference genome

Either an NCBI RefSeq assembly ID, e.g. GCF_000006945.2 (find your genome assembly and the corresponding ID on NCBI Genomes), or a custom pair of *.fasta and *.gff files that describe the genome of choice.
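Before entering an accession under get_genome: assembly, a quick format check can catch typos. The following is a minimal sketch that relies only on NCBI's standard assembly accession layout. That layout is the prefix GCF_ for RefSeq (GCA_ for GenBank), nine digits, and a version suffix. The helper is hypothetical and not part of the workflow.

```python
import re

# NCBI assembly accessions: "GCF_" (RefSeq) or "GCA_" (GenBank), nine digits,
# then a version suffix, e.g. GCF_000006945.2. Only RefSeq IDs are accepted here.
REFSEQ_ASSEMBLY = re.compile(r"GCF_\d{9}\.\d+")


def is_refseq_assembly(accession: str) -> bool:
    """Hypothetical helper: True if the string looks like a RefSeq assembly ID."""
    return REFSEQ_ASSEMBLY.fullmatch(accession.strip()) is not None


if __name__ == "__main__":
    for acc in ("GCF_000006945.2", "GCA_000006945.2", "GCF_6945"):
        print(acc, is_refseq_assembly(acc))
```

Only the first accession passes. The GenBank-style GCA_ ID and the truncated ID are rejected.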

Important requirements when using custom *.fasta and *.gff files:

  • *.gff genome annotation must use the same chromosome/region names as the *.fasta file (example: NC_003197.2)

  • *.gff genome annotation must contain gene and CDS type annotations, which are parsed automatically to extract transcripts

  • all chromosomes/regions in the *.gff genome annotation must be present in the *.fasta sequence

  • not all sequences in the *.fasta file need to have annotated genes in the *.gff file
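The naming requirements for custom genome files can be checked before a run. The following is a minimal sketch, assuming plain-text GFF3 and FASTA input where the FASTA ID is the first word of each header. It reports every GFF seqid without a matching FASTA record; FASTA records without annotation are allowed. The function names are hypothetical, and the workflow may validate its inputs differently.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Sequence IDs: the first word after '>' in each FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Seqids (column 1) of all GFF3 feature lines; comment lines are skipped."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded FASTA section: no feature lines follow
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def missing_in_fasta(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> Set[str]:
    """GFF seqids without a matching FASTA record; should be an empty set."""
    return gff_seqids(gff_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    # In-memory example; with real files, pass open file handles instead.
    fasta = [">NC_003197.2 chromosome\n", "ACGT\n", ">contig_2\n", "ACGT\n"]
    gff = ["##gff-version 3\n", "NC_003197.2\tRefSeq\tgene\t1\t4\t.\t+\t.\tID=gene-1\n"]
    print(missing_in_fasta(fasta, gff))  # set(): every annotated seqid is in the FASTA
```

The unannotated contig_2 is not flagged.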

Read data

Ribosome footprint sequencing data in *.fastq.gz format. The currently supported input data are single-end, strand-specific reads. Input data files are supplied via a mandatory table, whose location is indicated in the config.yml file (default: samples.tsv). The sample sheet has the following layout:

| sample | condition | replicate | fq1 |
|---|---|---|---|
| RPF-RTP1 | RPF-RTP | 1 | data/RPF-RTP1_R1_001.fastq.gz |
| RPF-RTP2 | RPF-RTP | 2 | data/RPF-RTP2_R1_001.fastq.gz |
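The sample sheet is a tab-separated table with the four columns above. The following is a minimal sketch for checking it with Python's csv module before a run. The function name is hypothetical, and the checks are assumptions, not the workflow's own schema validation:

- the four columns are present
- sample names are unique
- fq1 ends in .fastq.gz

```python
import csv
import io
from typing import Dict, List, TextIO

# Columns shown in the example sample sheet.
REQUIRED_COLUMNS = ("sample", "condition", "replicate", "fq1")


def read_samplesheet(handle: TextIO) -> List[Dict[str, str]]:
    """Hypothetical helper: parse a tab-separated sample sheet and sanity-check it."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"sample sheet lacks columns: {missing}")
    rows = list(reader)
    names = [row["sample"] for row in rows]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample names: {duplicates}")
    for row in rows:
        # Assumption: reads are gzipped FASTQ, as stated for the input data.
        if not row["fq1"].endswith(".fastq.gz"):
            raise ValueError(f"{row['sample']}: fq1 is not a *.fastq.gz file")
    return rows


if __name__ == "__main__":
    example = (
        "sample\tcondition\treplicate\tfq1\n"
        "RPF-RTP1\tRPF-RTP\t1\tdata/RPF-RTP1_R1_001.fastq.gz\n"
        "RPF-RTP2\tRPF-RTP\t2\tdata/RPF-RTP2_R1_001.fastq.gz\n"
    )
    print([row["sample"] for row in read_samplesheet(io.StringIO(example))])
```

For a real file, open it with `open("config/samples.tsv", newline="")` and pass the handle.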

Some configuration parameters of the pipeline may be specific to your data and library preparation protocol. Adjust these options in the config.yml file. For example:

  • Minimum and maximum read length after adapter removal (see option cutadapt: default). For the test data, the 7 nt of UMI (2 nt at the 5’-end + 5 nt at the 3’-end) are added to the footprint length, giving a minimum read length of 15 + 7 = 22 nt and a maximum of 45 + 7 = 52 nt.

  • Unique molecular identifiers (UMIs). For example, the protocol by McGlincy & Ingolia, 2017 creates a UMI that is split between the 5’-end (2 nt) and the 3’-end (5 nt) of the read. These UMIs are extracted with umi_tools (see options umi_extraction: method and pattern).
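The read-length arithmetic and the regex-based UMI extraction can be illustrated together. The following is a minimal sketch, not umi_tools itself. It uses Python's re module, which accepts the same (?P<name>...) group syntax. It assumes, as in the umi_tools regex method, that bases in groups named umi_* form the UMI and all other bases stay in the read.

The pattern is the umi_extraction: pattern default from the parameter table, copied verbatim. As written, it captures the first 5 and the last 2 bases of the read, so adapt the quantifiers to the layout of your protocol. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Default value of umi_extraction: pattern from the workflow's parameter table.
UMI_PATTERN = re.compile(r"^(?P<umi_0>.{5}).*(?P<umi_1>.{2})$")


def extract_umi(read: str, pattern: re.Pattern = UMI_PATTERN) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split a read into (UMI, insert).

    Bases captured by groups named umi_* are joined (in read order) into the
    UMI; all remaining bases are kept as the insert. Returns None if the read
    is too short to match the pattern.
    """
    match = pattern.match(read)
    if match is None:
        return None
    spans = sorted(match.span(name) for name in pattern.groupindex if name.startswith("umi_"))
    umi = "".join(read[start:end] for start, end in spans)
    insert = "".join(
        base for i, base in enumerate(read) if not any(start <= i < end for start, end in spans)
    )
    return umi, insert


def length_bounds(min_footprint: int, max_footprint: int, umi_length: int = 7) -> Tuple[int, int]:
    """cutadapt -m/-M values when the UMI bases are still part of the read."""
    return min_footprint + umi_length, max_footprint + umi_length


if __name__ == "__main__":
    read = "ACGTA" + "T" * 15 + "GC"
    print(extract_umi(read))      # ('ACGTAGC', 'TTTTTTTTTTTTTTT')
    print(length_bounds(15, 45))  # (22, 52)
```

A 22-nt read, the minimum allowed by -m 22, leaves exactly a 15-nt insert once the 7 UMI bases are removed.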

Example configuration files for different sequencing protocols can be found in resources/protocols/.

Workflow parameters

The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.

| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| samplesheet | string | path to samplesheet, mandatory | yes | config/samples.tsv |
| get_genome | | reference genome source and files | yes | |
| get_genome.database | ['string', 'null'] | one of manual, ncbi | | ncbi |
| get_genome.assembly | ['string', 'null'] | RefSeq ID | | GCF_000006785.2 |
| get_genome.fasta | ['string', 'null'] | optional path to fasta file | | |
| get_genome.gff | ['string', 'null'] | optional path to gff file | | |
| get_genome.gff_source_type | array | list of name/value pairs for GFF source | | |
| cutadapt | | adapter trimming parameters | yes | |
| cutadapt.adapters | string | sequence of 5’ (-g) / 3’ (-a) adapter | | -a ATCGTAGATCGGAAGAGCACACGTCTGAA |
| cutadapt.default | array | additional options passed to cutadapt | | ['-q 10 ', '-m 22 ', '-M 52', '--overlap=3'] |
| umi_extraction | | UMI extraction settings | yes | |
| umi_extraction.method | string | one of string or regex, see umi_tools manual | | regex |
| umi_extraction.pattern | string | string or regular expression | | ^(?P<umi_0>.{5}).*(?P<umi_1>.{2})$ |
| umi_dedup | array | default options for deduplication | yes | |
| star | | STAR alignment settings | yes | |
| star.index | ['string', 'null'] | location of genome index; if null, the index is built | | |
| star.genomeSAindexNbases | number | length of pre-indexing string, see STAR manual | | 9 |
| star.multi | number | max. number of loci a read is allowed to map to | | 10 |
| star.sam_multi | number | max. number of alignments reported per read | | 1 |
| star.intron_max | number | max. intron length; 0 = automatic choice | | 1 |
| star.default | array | default options for STAR aligner | | |
| extract_features | | feature extraction and filtering | yes | |
| extract_features.biotypes | array | biotypes to exclude from mapping | | ['rRNA', 'tRNA'] |
| extract_features.CDS | array | CDS type to include for mapping | | ['protein_coding'] |
| bedtools_intersect | | bedtools intersect options | yes | |
| bedtools_intersect.defaults | array | remove hits, sense strand, min. overlap 20% | | ['-v ', '-s ', '-f 0.2'] |
| annotate_orfs | | ORF annotation settings | yes | |
| annotate_orfs.window_size | number | size of 5’-UTR added to CDS | | 30 |
| shift_reads | | read shifting parameters | yes | |
| shift_reads.window_size | number | start codon window to determine shift | | 30 |
| shift_reads.read_length | array | size range of reads to use for shifting | | [27, 45] |
| shift_reads.end_alignment | string | end used for alignment of RiboSeq reads | | 3prime |
| shift_reads.shift_table | ['string', 'null'] | optional table with offsets per read length | | |
| shift_reads.export_bigwig | boolean | export shifted reads as bigWig file | | true |
| shift_reads.export_ofst | boolean | export shifted reads as ofst file | | false |
| shift_reads.skip_shifting | boolean | skip read shifting entirely | | false |
| shift_reads.skip_length_filter | boolean | skip filtering reads by length | | false |
| multiqc | | MultiQC reporting parameters | yes | |
| multiqc.fastqc_stage | string | | | |
| multiqc.config | string | path to multiqc config | | config/multiqc_config.yml |
| report | | report rendering parameters | yes | |
| report.export_figures | boolean | export figures as .svg and .png | | true |
| report.export_dir | string | sub-directory for figure export | | figures/ |
| report.figure_width | number | standard figure width in px | | 875 |
| report.figure_height | number | standard figure height in px | | 500 |
| report.figure_resolution | number | standard figure resolution in dpi | | 125 |
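The bedtools_intersect defaults (-v, -s, -f 0.2) decide which reads survive the feature filter. The following sketch illustrates what these flags mean in plain Python. It is not the workflow's code, which calls bedtools itself. It assumes the reads are bedtools file A and the excluded features (by default the rRNA and tRNA biotypes from extract_features) are file B. The flags act as follows:

- -v keeps reads with no qualifying overlap.
- -s counts only overlaps on the same strand.
- -f 0.2 requires the overlap to cover at least 20% of the read.

Edge cases such as rounding may differ from bedtools, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based start (BED convention)
    end: int     # exclusive end
    strand: str  # "+" or "-"


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def filter_reads(reads: Sequence[Interval], features: Sequence[Interval],
                 min_fraction: float = 0.2) -> List[Interval]:
    """Keep reads without a same-strand feature covering >= min_fraction of the read
    (the meaning of `bedtools intersect -v -s -f 0.2` with reads as file A)."""
    kept = []
    for read in reads:
        length = read.end - read.start
        hit = any(
            feature.strand == read.strand
            and 0 < overlap_bp(read, feature) >= min_fraction * length
            for feature in features
        )
        if not hit:
            kept.append(read)
    return kept


if __name__ == "__main__":
    rrna = [Interval("NC_003197.2", 100, 200, "+")]  # hypothetical rRNA locus
    reads = [
        Interval("NC_003197.2", 190, 220, "+"),  # 10 of 30 nt overlap (33%): removed
        Interval("NC_003197.2", 195, 225, "+"),  # 5 of 30 nt overlap (17%): kept
        Interval("NC_003197.2", 120, 150, "-"),  # opposite strand: kept
        Interval("NC_003197.2", 300, 330, "+"),  # no overlap: kept
    ]
    for read in filter_reads(reads, rrna):
        print(read.start, read.end, read.strand)
```

Because -f is measured relative to the read, a short read only grazing an rRNA gene survives. A read with a fifth or more of its length inside the gene is discarded.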
