tucca-cellag/tucca-rna-seq

TUCCA’s RNA-Seq Workflow for Read Quantification, Differential Expression, and Pathway Enrichment Analysis

Overview

Latest release: v1.0.1, Last update: 2026-03-09

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=tucca-cellag/tucca-rna-seq

Quality control: linting: passed, formatting: failed

Topics: bioinformatics conda high-throughput reproducibility rna-seq singularity snakemake snakemake-workflow transcriptomics apptainer ideal renv genetonic pcaexplorer clusterprofiler deseq2 differential-expression salmon pathway-enrichment-analysis quality-control

Wrappers: bio/deseq2/deseqdataset bio/deseq2/wald bio/multiqc bio/reference/ensembl-annotation bio/reference/ensembl-sequence bio/salmon/decoys bio/salmon/index bio/salmon/quant bio/star/align bio/star/index

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager; installing Conda via Miniforge is recommended. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.
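
Before deploying, it can help to confirm that both tools are visible in the activated environment. The following is a minimal sketch and not part of the workflow; the helper name require_tools is hypothetical, and it relies only on the shell built-in command -v.

```shell
# Hypothetical helper: report any of the given commands missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (run inside the activated environment):
#   require_tools snakemake snakedeploy && snakemake --version
```

If a tool is reported as missing, the most likely cause is that the snakemake environment has not been activated in the current shell.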

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/tucca-cellag/tucca-rna-seq . --tag v1.0.1

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files that you will modify in the next step to adapt the workflow to your needs.
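
As a quick sanity check after deployment, the presence of the two folders can be verified from the shell. This is a sketch under the assumption that the folders are named exactly workflow and config, as described above; the helper name check_deployment is hypothetical.

```shell
# Hypothetical helper: confirm that a project directory contains the two
# folders Snakedeploy is described as creating (workflow and config).
# Usage: check_deployment [project_dir]   (defaults to the current directory)
check_deployment() {
    project_dir="${1:-.}"
    status=0
    for sub in workflow config; do
        if [ ! -d "$project_dir/$sub" ]; then
            echo "missing folder: $project_dir/$sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_deployment without arguments inside the project working directory checks the current directory.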

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.
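
Before starting a full run, a dry run shows which jobs Snakemake would execute without running them. The sketch below only assembles the command lines shown in this step and optionally appends Snakemake's --dry-run flag; the helper name snakemake_cmd and its mode names are hypothetical.

```shell
# Hypothetical helper: print the snakemake command for a deployment mode.
#   mode "conda"     -> --sdm conda
#   mode "apptainer" -> --sdm conda apptainer
# Pass "dry" as the second argument to append --dry-run.
snakemake_cmd() {
    case "$1" in
        conda)     sdm="conda" ;;
        apptainer) sdm="conda apptainer" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    cmd="snakemake --cores all --sdm $sdm"
    if [ "${2:-}" = "dry" ]; then
        cmd="$cmd --dry-run"
    fi
    echo "$cmd"
}
```

For example, snakemake_cmd conda dry prints snakemake --cores all --sdm conda --dry-run, which can be copied into the terminal and run from the project working directory.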

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Disclaimer (Workflow Under-Construction)

THIS REPO IS STILL UNDER CONSTRUCTION AND DOES NOT REPRESENT A COMPLETED WORKFLOW

In the meantime, feel free to contact the current maintainer with any questions.

To configure the workflow, please refer to the official documentation for tucca-rna-seq.

Workflow parameters

The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.

| Parameter | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| samples | string | | yes | config/samples.tsv |
| units | string | | yes | config/units.tsv |
| ref_assembly | | | yes | |
| . source | string | source must be one of RefSeq, Ensembl, GENCODE | yes | |
| . accession | string | | | |
| . name | string | | yes | |
| . release | string | | | |
| . species | string | Scientific name with underscore (e.g., Mus_musculus) | yes | |
| . custom_files | | | | |
| . . custom_genome_fasta | string | | | |
| . . custom_genome_gtf | string | | | |
| . . custom_transcriptome_fasta | string | | | |
| api_keys | | | yes | {} |
| . ncbi | string | | | |
| diffexp | | | yes | {} |
| . tximeta | | | yes | {} |
| . . factors | array | | yes | |
| . . extra | string | Extra params for tximeta | | |
| . deseq2 | | | yes | {} |
| . . analyses | array | | yes | |
| . . transform | | | yes | {} |
| . . . method | string | | | rlog |
| . . . extra | string | | | |
| enrichment | | | yes | {} |
| . padj_cutoff | number | Adjusted p-value cutoff to define significant genes for ORA. | yes | 0.05 |
| . targets | array | List of target gene symbols to search for in enriched pathways. Not yet implemented. TODO | | [] |
| . clusterprofiler | | | yes | {} |
| . . gsea | | | | {} |
| . . . gseGO | | | yes | {} |
| . . . . extra | string | | | |
| . . . gseKEGG | | | yes | {} |
| . . . . extra | string | | | |
| . . ora | | | | {} |
| . . . enrichGO | | | yes | {} |
| . . . . extra | string | | | |
| . . . enrichKEGG | | | yes | {} |
| . . . . extra | string | | | |
| . . kegg_module | | | | {} |
| . . . enabled | boolean | | | false |
| . . . enrichMKEGG | | | yes | {} |
| . . . . extra | string | | | |
| . . . gseMKEGG | | | yes | {} |
| . . . . extra | string | | | |
| . . wikipathways | | | | {} |
| . . . enabled | boolean | | | false |
| . . . enrichWP | | | yes | {} |
| . . . . extra | string | | | |
| . . . gseWP | | | yes | {} |
| . . . . extra | string | | | |
| . msigdb | | MSigDB (Molecular Signatures Database) configuration | yes | |
| . . enabled | boolean | | | true |
| . . collections | array | List of MSigDB collections to analyze (H, C1, C2, C3, C4, C5, C6, C7, C8) | yes | ['H'] |
| . . custom_gmt_files | array | List of paths to custom GMT files for ORA and GSEA | yes | [] |
| . . ora | | | yes | {} |
| . . . extra | string | | | |
| . . gsea | | | yes | {} |
| . . . extra | string | | | |
| . spia | | SPIA (Signaling Pathway Impact Analysis) configuration | yes | {} |
| . . enabled | boolean | | | false |
| . . extra | string | | | beta = NULL, verbose = TRUE, plots = FALSE |
| . harmonizome | | Harmonizome database configuration for tissue-specific gene sets | yes | {} |
| . . enabled | boolean | | | false |
| . . datasets | array | List of Harmonizome datasets and gene sets to analyze (see https://maayanlab.cloud/Harmonizome/) | | |
| . . ora | | | yes | {} |
| . . . extra | string | | | |
| . . gsea | | | yes | {} |
| . . . extra | string | | | |
| . annotationforge | | | yes | {} |
| . . version | string | | | 0.1.0 |
| . . author | string | | | firstname.lastname@institution.edu |
| . . extra | string | | | useSynonyms = TRUE |
| params | | | yes | {} |
| . fastqc | | | | {} |
| . . memory | integer | | | 1024 |
| . . extra | string | | | |
| . star_index | | | | {} |
| . . sjdbOverhang | integer | | | 149 |
| . . extra | string | | | |
| . star | | | yes | {} |
| . . extra | string | | | --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMattributes Standard --outFilterMultimapNmax 1 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignIntronMin 1 --alignIntronMax 2500 |
| . qualimap_rnaseq | | | yes | {} |
| . . enabled | boolean | | | true |
| . . counting_alg | string | | | proportional |
| . . sequencing_protocol | string | | | non-strand-specific |
| . . extra | string | | | --paired --java-mem-size=8G |
| . salmon_index | | | yes | {} |
| . . extra | string | | | -k 31 |
| . salmon_quant | | | yes | {} |
| . . libtype | string | | | A |
| . . extra | string | | | --seqBias --posBias --writeUnmappedNames |
| . multiqc | | | yes | {} |
| . . extra | string | Leave out --force if you don’t want to automatically overwrite existing multiqc results on a re-run | | --verbose --force |
| . sra_tools | | | yes | {} |
| . . vdb_config_ra_path | string | | | /repository/user/main/remote_access=true |
| . . subsample | | Configuration for subsampling SRA data for testing purposes | | |
| . . . enabled | boolean | Whether to use subsampling instead of full download | | false |
| . . . min_spot_id | integer | Minimum spot ID for SRA subsampling | yes | 1 |
| . . . max_spot_id | integer | Maximum spot ID for SRA subsampling | yes | 100000 |
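
To illustrate how the dotted parameter names above map onto nested YAML keys, the following sketch writes a small, deliberately incomplete config fragment to a scratch file. It is not a valid complete config/config.yml. Keys and defaults are copied from the parameter table, PLACEHOLDER_NAME is a hypothetical stand-in for a real assembly name, and the helper name write_example_config is also hypothetical. Consult the official tucca-rna-seq documentation for the authoritative configuration.

```shell
# Hypothetical helper: write an illustrative, incomplete config fragment to
# the file given as the first argument. Values come from the parameter table,
# except PLACEHOLDER_NAME, which is a made-up placeholder.
write_example_config() {
    cat > "$1" <<'EOF'
samples: config/samples.tsv
units: config/units.tsv
ref_assembly:
  source: Ensembl          # must be one of RefSeq, Ensembl, GENCODE
  name: PLACEHOLDER_NAME   # hypothetical placeholder, replace with a real name
  species: Mus_musculus    # scientific name with underscore
enrichment:
  padj_cutoff: 0.05
  msigdb:
    enabled: true
    collections: ['H']
    custom_gmt_files: []
params:
  salmon_index:
    extra: "-k 31"
  salmon_quant:
    libtype: A
    extra: "--seqBias --posBias --writeUnmappedNames"
EOF
}
```

Writing to a scratch path such as /tmp/example-config.yml avoids overwriting the deployed config/config.yml; each leading ". " in the table corresponds to one level of indentation in the YAML.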

Linting and formatting

Linting results
All tests passed!
Formatting results
[DEBUG] In file "/tmp/tmp1_mhsazo/tucca-cellag-tucca-rna-seq-697b13d/workflow/rules/common.smk":  Formatted content is different from original
[DEBUG] In file "/tmp/tmp1_mhsazo/tucca-cellag-tucca-rna-seq-697b13d/workflow/Snakefile":  Formatted content is different from original
[INFO] 2 file(s) would be changed 😬
[INFO] 14 file(s) would be left unchanged 🎉

snakefmt version: 0.11.4
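
The formatting log names the files that snakefmt would change. As a sketch, those paths can be pulled out of a saved log with sed; the helper name files_needing_format is hypothetical, and the pattern assumes the message wording shown in the log above.

```shell
# Hypothetical helper: read a snakefmt debug log on stdin and print the paths
# of files reported as "Formatted content is different from original".
files_needing_format() {
    sed -n 's/.*In file "\([^"]*\)":[[:space:]]*Formatted content is different from original.*/\1/p'
}
```

In a local clone of the repository, running snakefmt --check or snakefmt --diff on workflow/Snakefile and workflow/rules/common.smk reports the same differences without modifying the files, and running snakefmt on them without those flags rewrites them in place.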