tucca-cellag/tucca-rna-seq

TUCCA’s RNA-Seq Workflow for Read Quantification, Differential Expression, and Pathway Enrichment Analysis

Overview

Latest release: v1.0.1, Last update: 2026-03-09

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=tucca-cellag/tucca-rna-seq

Quality control: linting passed, formatting failed

Topics: bioinformatics conda high-throughput reproducibility rna-seq singularity snakemake snakemake-workflow transcriptomics apptainer ideal renv genetonic pcaexplorer clusterprofiler deseq2 differential-expression salmon pathway-enrichment-analysis quality-control

Wrappers: bio/deseq2/deseqdataset bio/deseq2/wald bio/multiqc bio/reference/ensembl-annotation bio/reference/ensembl-sequence bio/salmon/decoys bio/salmon/index bio/salmon/quant bio/star/align bio/star/index

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.
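
Before deploying, it can help to confirm that both tools are visible in the activated environment. The following is a minimal sketch; the function name `require_tools` is hypothetical and not part of Snakemake or Snakedeploy.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if all were found, 1 if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (run inside the activated environment):
#   require_tools snakemake snakedeploy && snakemake --version
```

If either tool is reported as missing, the most likely cause is that the `snakemake` environment has not been activated in the current shell.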

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/tucca-cellag/tucca-rna-seq . --tag v1.0.1

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains the configuration files, which you then adapt to your needs as described in the Configuration section below.
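
A quick way to confirm that the deployment produced the expected layout is sketched below. It checks only what this page describes: a workflow folder holding the main Snakefile and a config folder. The function name `check_deployment` is hypothetical.

```shell
# Hypothetical helper: confirm that a directory looks like a deployed workflow.
# Argument: project directory (defaults to the current directory).
check_deployment() {
    dir="${1:-.}"
    status=0
    [ -f "$dir/workflow/Snakefile" ] || { echo "missing: workflow/Snakefile"; status=1; }
    [ -d "$dir/config" ] || { echo "missing: config/"; status=1; }
    [ "$status" -eq 0 ] && echo "deployment layout looks complete"
    return "$status"
}

# Usage, from inside the project working directory:
#   check_deployment .
```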

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.
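
Before committing compute time, a dry run lists the jobs Snakemake would execute without running them. `--dry-run` (short `-n`) is a standard Snakemake flag. The wrapper below is only a sketch that assembles one of the two commands above; the function name `workflow_cmd` and its `conda`/`apptainer`/`dry` arguments are hypothetical conveniences.

```shell
# Hypothetical wrapper: print the snakemake command for a deployment mode.
#   $1: "conda" (default) or "apptainer" (conda combined with apptainer)
#   $2: "dry" to append --dry-run
workflow_cmd() {
    mode="${1:-conda}"
    case "$mode" in
        conda)     sdm="conda" ;;
        apptainer) sdm="conda apptainer" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    cmd="snakemake --cores all --sdm $sdm"
    [ "${2:-}" = "dry" ] && cmd="$cmd --dry-run"
    echo "$cmd"
}

# Usage: inspect the planned jobs first, then run for real.
#   sh -c "$(workflow_cmd conda dry)"
#   sh -c "$(workflow_cmd conda)"
```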

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Disclaimer (Workflow Under Construction)

THIS REPO IS STILL UNDER CONSTRUCTION AND DOES NOT REPRESENT A COMPLETED WORKFLOW

In the meantime, feel free to contact the current maintainer with any questions.

To configure the workflow, please refer to the official documentation for tucca-rna-seq.

Workflow parameters

The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.

Parameter	Type	Description	Required	Default
samples	string		yes	config/samples.tsv
units	string		yes	config/units.tsv
ref_assembly			yes
. source	string	source must be one of RefSeq, Ensembl, GENCODE	yes
. accession	string
. name	string		yes
. release	string
. species	string	Scientific name with underscore (e.g., Mus_musculus)	yes
. custom_files
. . custom_genome_fasta	string
. . custom_genome_gtf	string
. . custom_transcriptome_fasta	string
api_keys			yes	{}
. ncbi	string
diffexp			yes	{}
. tximeta			yes	{}
. . factors	array		yes
. . extra	string	Extra params for tximeta
. deseq2			yes	{}
. . analyses	array		yes
. . transform			yes	{}
. . . method	string			rlog
. . . extra	string
enrichment			yes	{}
. padj_cutoff	number	Adjusted p-value cutoff to define significant genes for ORA.	yes	0.05
. targets	array	List of target gene symbols to search for in enriched pathways. Not yet implemented.		[]
. clusterprofiler			yes	{}
. . gsea				{}
. . . gseGO			yes	{}
. . . . extra	string
. . . gseKEGG			yes	{}
. . . . extra	string
. . ora				{}
. . . enrichGO			yes	{}
. . . . extra	string
. . . enrichKEGG			yes	{}
. . . . extra	string
. . kegg_module				{}
. . . enabled	boolean			false
. . . enrichMKEGG			yes	{}
. . . . extra	string
. . . gseMKEGG			yes	{}
. . . . extra	string
. . wikipathways				{}
. . . enabled	boolean			false
. . . enrichWP			yes	{}
. . . . extra	string
. . . gseWP			yes	{}
. . . . extra	string
. msigdb		MSigDB (Molecular Signatures Database) configuration	yes
. . enabled	boolean			true
. . collections	array	List of MSigDB collections to analyze (H, C1, C2, C3, C4, C5, C6, C7, C8)	yes	['H']
. . custom_gmt_files	array	List of paths to custom GMT files for ORA and GSEA	yes	[]
. . ora			yes	{}
. . . extra	string
. . gsea			yes	{}
. . . extra	string
. spia		SPIA (Signaling Pathway Impact Analysis) configuration	yes	{}
. . enabled	boolean			false
. . extra	string			beta = NULL, verbose = TRUE, plots = FALSE
. harmonizome		Harmonizome database configuration for tissue-specific gene sets	yes	{}
. . enabled	boolean			false
. . datasets	array	List of Harmonizome datasets and gene sets to analyze (see https://maayanlab.cloud/Harmonizome/)
. . ora			yes	{}
. . . extra	string
. . gsea			yes	{}
. . . extra	string
. annotationforge			yes	{}
. . version	string			0.1.0
. . author	string			firstname.lastname@institution.edu
. . extra	string			useSynonyms = TRUE
params			yes	{}
. fastqc				{}
. . memory	integer			1024
. . extra	string
. star_index				{}
. . sjdbOverhang	integer			149
. . extra	string
. star			yes	{}
. . extra	string			--outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMattributes Standard --outFilterMultimapNmax 1 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignIntronMin 1 --alignIntronMax 2500
. qualimap_rnaseq			yes	{}
. . enabled	boolean			true
. . counting_alg	string			proportional
. . sequencing_protocol	string			non-strand-specific
. . extra	string			--paired --java-mem-size=8G
. salmon_index			yes	{}
. . extra	string			-k 31
. salmon_quant			yes	{}
. . libtype	string			A
. . extra	string			--seqBias --posBias --writeUnmappedNames
. multiqc			yes	{}
. . extra	string	Leave out --force if you don’t want to automatically overwrite existing multiqc results on a re-run		--verbose --force
. sra_tools			yes	{}
. . vdb_config_ra_path	string			/repository/user/main/remote_access=true
. . subsample		Configuration for subsampling SRA data for testing purposes
. . . enabled	boolean	Whether to use subsampling instead of full download		false
. . . min_spot_id	integer	Minimum spot ID for SRA subsampling	yes	1
. . . max_spot_id	integer	Maximum spot ID for SRA subsampling	yes	100000
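
To make the nesting in the table easier to read, the sketch below writes a partial YAML fragment showing how the dotted parameter levels map onto nested keys. It is not a complete or validated configuration: required entries whose values the table does not show (for example ref_assembly.name and the diffexp block) are left out. The output filename `config.sketch.yaml` is hypothetical, and the `source` and `species` values are just the examples mentioned in the table. Tool flags are written with ASCII double hyphens, as the underlying command-line tools expect.

```shell
# Write a PARTIAL configuration sketch (not a complete, valid config).
# The output filename is hypothetical; all values are defaults or examples
# taken from the parameter table.
out="${CONFIG_SKETCH:-config.sketch.yaml}"
cat > "$out" <<'EOF'
samples: config/samples.tsv
units: config/units.tsv
ref_assembly:
  source: Ensembl          # one of RefSeq, Ensembl, GENCODE
  species: Mus_musculus    # scientific name with underscore
enrichment:
  padj_cutoff: 0.05
  msigdb:
    enabled: true
    collections: ["H"]
    custom_gmt_files: []
params:
  star_index:
    sjdbOverhang: 149
  salmon_index:
    extra: "-k 31"
  salmon_quant:
    libtype: A
    extra: "--seqBias --posBias --writeUnmappedNames"
  multiqc:
    extra: "--verbose --force"
EOF
echo "wrote $out"
```

Each leading ". " in the Parameter column corresponds to one level of YAML indentation, so `. . collections` under `. msigdb` under `enrichment` becomes `enrichment.msigdb.collections`.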

Linting and formatting

Formatting results

[DEBUG] In file "/tmp/tmp1_mhsazo/tucca-cellag-tucca-rna-seq-697b13d/workflow/rules/common.smk":  Formatted content is different from original
[DEBUG] In file "/tmp/tmp1_mhsazo/tucca-cellag-tucca-rna-seq-697b13d/workflow/Snakefile":  Formatted content is different from original
[INFO] 2 file(s) would be changed 😬
[INFO] 14 file(s) would be left unchanged 🎉

snakefmt version: 0.11.4
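
The files flagged in a log like the one above can be extracted with standard text tools. This is a minimal sketch, assuming the log has been saved to a file or is piped in on standard input; the function name `flagged_files` is hypothetical and not part of snakefmt.

```shell
# Hypothetical helper: list files that snakefmt reported as needing changes.
# Reads a log on stdin and matches lines of the form
#   [DEBUG] In file "<path>":  Formatted content is different from original
flagged_files() {
    sed -n 's/^\[DEBUG] In file "\(.*\)":  *Formatted content is different from original$/\1/p'
}

# Usage:
#   flagged_files < snakefmt.log
# To reproduce or fix the result locally (requires snakefmt to be installed):
#   snakefmt --check workflow/   # report only
#   snakefmt workflow/           # rewrite files in place
```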