baerlachlan/smk-rnaseq-counts

Snakemake workflow for estimating read counts from RNA-seq data

Overview

Latest release: v1.2.7, Last update: 2025-06-19

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, installing Miniforge is recommended. More details can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

All following steps assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/baerlachlan/smk-rnaseq-counts . --tag v1.2.7

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions in the Configuration section below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting results, together with parameters and code, in the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Workflow config

The workflow requires configuration by modification of config/config.yaml. Follow the explanations provided as comments in the file.

Sample & unit config

The configuration of samples and units is specified in tab-separated value (.tsv) files. Each .tsv file requires specific columns (see below); extra columns may be present but will not be used.

samples.tsv

The default path for the sample sheet is config/samples.tsv. This may be changed via configuration in config/config.yaml.

samples.tsv requires only one column, named sample, which contains the desired names of the samples. Each sample name must be unique and correspond to one physical sample. Biological and technical replicates should be specified as separate samples.

units.tsv

The default path for the unit sheet is config/units.tsv. This may be changed via configuration in config/config.yaml.

units.tsv requires four columns, named sample, unit, fq1 and fq2. Each row of the units sheet corresponds to a single sequencing unit. Therefore, for each sample specified in samples.tsv, one or more sequencing units should be present. unit values must be unique within each sample. A common example of an experiment with multiple sequencing units is a sample split across several runs/lanes.

For each unit, the paths to its FASTQ files must be specified in the fq1 and fq2 columns. Both columns must exist, but fq2 may be left empty for single-end sequencing experiments. Whether fq2 is filled in determines whether the workflow runs its single-end or paired-end rules.
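The rules for the unit sheet can be illustrated with a short Python sketch. It is not taken from the workflow's own code: the function name load_units, the layout field, and the example FASTQ paths are all hypothetical, and the checks simply restate the requirements described above (four required columns, unit values unique within a sample, empty fq2 meaning single-end).

```python
import csv
import io


def load_units(units_tsv, sample_names):
    """Parse a units sheet and label each unit as single- or paired-end.

    Hypothetical helper for illustration; not part of the workflow.
    """
    reader = csv.DictReader(units_tsv, delimiter="\t")
    required = {"sample", "unit", "fq1", "fq2"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"units sheet is missing columns: {sorted(missing)}")

    seen = set()
    units = []
    for row in reader:
        sample, unit = row["sample"], row["unit"]
        # Every unit must belong to a sample listed in the sample sheet.
        if sample not in sample_names:
            raise ValueError(f"unknown sample: {sample!r}")
        # Unit values must be unique within each sample.
        if (sample, unit) in seen:
            raise ValueError(f"duplicate unit {unit!r} for sample {sample!r}")
        seen.add((sample, unit))
        if not (row["fq1"] or "").strip():
            raise ValueError(f"fq1 is empty for {sample!r}/{unit!r}")
        # An empty fq2 marks the unit as single-end.
        layout = "paired" if (row["fq2"] or "").strip() else "single"
        units.append({**row, "layout": layout})
    return units


# Example sheet: sample A is paired-end and split across two lanes,
# sample B is single-end (fq2 left empty). Paths are made up.
example = io.StringIO(
    "sample\tunit\tfq1\tfq2\n"
    "A\tlane1\tdata/A_L1_R1.fastq.gz\tdata/A_L1_R2.fastq.gz\n"
    "A\tlane2\tdata/A_L2_R1.fastq.gz\tdata/A_L2_R2.fastq.gz\n"
    "B\tlane1\tdata/B_L1.fastq.gz\t\n"
)
for u in load_units(example, {"A", "B"}):
    print(u["sample"], u["unit"], u["layout"])
```

Extra columns in the sheet pass through untouched, matching the note above that they are allowed but unused.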

Linting and formatting

Linting results

Using workflow specific profile workflow/profiles/default for setting default command line arguments.
Lints for rule genome_get (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule transcriptome_get (line 13, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule annotation_get (line 25, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule star_index (line 37, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule salmon_decoy (line 50, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule salmon_index (line 61, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/refs.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule fastqc_raw (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/fastqc.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule fastqc_trim (line 13, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/fastqc.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule fastqc_align (line 25, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/fastqc.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule trim_se (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/trim.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule trim_pe (line 14, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/trim.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule merge (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/merge.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

Lints for rule align (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/align.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule align_index (line 15, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/align.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule deduplicate (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/deduplicate.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule deduplicate_index (line 18, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/deduplicate.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule featureCounts_s0 (line 2, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/featureCounts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule featureCounts_s1 (line 20, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/featureCounts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule featureCounts_s2 (line 38, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/featureCounts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule salmon_quant (line 1, /tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/salmon.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
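Because the lint output repeats the same message for most rules, a condensed per-rule view can be easier to read. The following Python sketch is a hypothetical helper, not part of Snakemake or of this workflow. It assumes the text layout shown above: a "Lints for rule NAME (line N, PATH):" header followed by "* message:" items, optionally prefixed by a display line number.

```python
import re
from collections import defaultdict

# Header such as: "Lints for rule merge (line 1, /path/to/merge.smk):"
RULE_HEADER = re.compile(r"Lints for rule (\S+) \(line (\d+), (.+)\):")
# Item such as: "    * No log directive defined:" (an optional leading
# display line number is tolerated).
LINT_ITEM = re.compile(r"^\s*\d*\s*\* (.+?):\s*$")


def summarize_lints(text):
    """Map each rule name to the list of lint messages reported for it."""
    findings = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = RULE_HEADER.search(line)
        if header:
            current = header.group(1)
            continue
        item = LINT_ITEM.match(line)
        if item and current is not None:
            findings[current].append(item.group(1))
    return dict(findings)


# Shortened excerpt in the same layout as the output above;
# the path is a placeholder.
excerpt = """\
Lints for rule merge (line 1, /tmp/example/workflow/rules/merge.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal.
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented.
"""
for rule, messages in summarize_lints(excerpt).items():
    print(rule, messages)
```

Applied to the full output, this would show that every listed rule lacks a log directive and that merge additionally lacks a conda environment or container.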

Formatting results

[DEBUG]
[DEBUG]
[DEBUG] In file "/tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/deduplicate.smk":  Formatted content is different from original
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG]
[DEBUG] In file "/tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/common.smk":  Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmph4ttsjyw/baerlachlan-smk-rnaseq-counts-72c4bc8/workflow/rules/align.smk":  Formatted content is different from original
[DEBUG]
[INFO] 3 file(s) would be changed 😬
[INFO] 7 file(s) would be left unchanged 🎉

snakefmt version: 0.11.0