epigen/dea_limma

A Snakemake workflow and MrBiomics module for performing and visualizing differential (expression) analyses (DEA) on NGS data powered by the R package limma.

Overview

Topics: atac-seq bioinformatics biomedical-data-science chip-seq differential-expression-analysis limma limma-trend limma-voom rna-seq snakemake visualization volcano-plot workflow scrna-seq statistics r

Latest release: v2.0.1, Last update: 2025-03-06

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/epigen/dea_limma . --tag v2.0.1

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

You need one configuration file and one annotation file to run the complete workflow. Additionally, you can provide a feature annotation file in the project configuration (e.g., for plotting gene symbols instead of Ensembl IDs). If in doubt, read the comments in the config and/or try the default values.

  • project configuration (config/config.yaml): configures the analyses to be performed and is different for every project/dataset.
  • annotation (annotation): CSV file with one row per analysis and consisting of 10 mandatory columns
    • name: name of the dataset/analysis (tip: keep it short, but descriptive, distinctive and unique)
    • data: Absolute path to the input data as a CSV file in feature-by-sample format (e.g., an RNA count matrix) that has already been quality controlled (e.g., bad samples removed) and filtered for relevant features (e.g., only expressed genes). The first column has to contain the features and the first row the sample names.
    • metadata: Absolute path to the metadata as a CSV file for the required analysis (i.e., every variable in the formula and the optional blocking variable needs a corresponding column). The first column has to be the sample name. The metadata file has to be R compatible (e.g., column names should not start with a number or contain colons).
    • formula: A string that will be converted to a formula in R (e.g., ~ treatment + batch).
    • block_var: Flag to indicate which variable (present in the metadata as a column) should be used for the blocking feature (see README > Features) or whether blocking should be skipped (0).
    • comparisons: Variable names contained in the formula (and metadata) whose coefficients you are interested in, separated by '|' (e.g., treatment|batch). Results of all derived groups (e.g., treatmentLPS) containing one of the comparisons will be returned.
    • calcNormFactors_method: Flag to indicate whether the edgeR::calcNormFactors function should be used, specifying its parameter "method" (e.g., none or TMM), or should be skipped because the input data is already log-normalized (0).
    • voom: Flag to indicate if voom function should be used (1) or not (0). Note: Should be 0 if data is already log-normalized and/or limma-trend will be used.
    • eBayes: Flag to indicate if the eBayes function should be used (1) or not (0). Note: Skipping eBayes (0) leads to the use of the ordinary t-statistic with topTable and is not recommended by the limma author Gordon Smyth. The B-statistics (log-odds) are still determined using eBayes, on the assumption that they will not be used downstream. Make sure you know what you are doing.
    • limma_trend: Flag to indicate if limma-trend should be used (1) (i.e., sets the limma::eBayes parameter trend=TRUE) or not (0). Note: Make sure to activate the required eBayes function (=1) and deactivate voom (=0) if you use limma-trend. Using voom and limma-trend together makes no sense, but is not forbidden by the workflow.
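
The column descriptions above can be illustrated with a minimal Python sketch that writes a one-row annotation file and checks that the formula variables have matching metadata columns. This is not part of the workflow: the row values, the paths, and the helper names render_annotation and missing_metadata_columns are hypothetical, and the formula parsing is a rough heuristic that only suits simple additive formulas.

```python
import csv
import io
import re

# The ten mandatory annotation columns, in the order listed above.
COLUMNS = [
    "name", "data", "metadata", "formula", "block_var",
    "comparisons", "calcNormFactors_method", "voom", "eBayes", "limma_trend",
]

# Hypothetical row for a limma-voom analysis of raw counts.
# Paths and variable names are placeholders, not files shipped with the workflow.
example_row = {
    "name": "MyRNA_treatment",
    "data": "/path/to/counts.csv",
    "metadata": "/path/to/metadata.csv",
    "formula": "~ treatment + batch",
    "block_var": "0",
    "comparisons": "treatment|batch",
    "calcNormFactors_method": "TMM",
    "voom": "1",
    "eBayes": "1",
    "limma_trend": "0",
}


def render_annotation(rows):
    """Serialize annotation rows to CSV text with the mandatory header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_metadata_columns(row, metadata_columns):
    """List formula/blocking variables that have no metadata column.

    Heuristic: every identifier-like token in the formula is treated as a
    variable name, which is only adequate for simple additive formulas.
    """
    needed = set(re.findall(r"[A-Za-z_.][A-Za-z0-9_.]*", row["formula"]))
    if row["block_var"] != "0":
        needed.add(row["block_var"])
    return sorted(needed - set(metadata_columns))


annotation_csv = render_annotation([example_row])
print(annotation_csv)

# A metadata table lacking a "batch" column would be flagged here.
print(missing_metadata_columns(example_row, ["sample_name", "treatment"]))
```

Writing the resulting text to a file and pointing the annotation entry of the project configuration at it would be the next step. The check is only a convenience before starting a run; the workflow itself remains the authority on what it accepts.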

Set workflow-specific resources or command line interface (CLI) arguments in the workflow profile workflow/profiles/default/config.yaml, which supersedes global Snakemake profiles.

Common configuration scenarios

  • standard limma-voom workflow with raw counts as input data (see "Differential expression: voom" in the limma user guide)
    • calcNormFactors_method: none (or another normalization method, e.g., TMM, to be considered by voom)
    • voom: 1
    • eBayes: 1
    • limma_trend: 0
  • standard limma-trend workflow with raw counts as input data (see "Differential expression: limma-trend" in the limma user guide)
    • calcNormFactors_method: none (or another normalization method, e.g., TMM)
    • voom: 0
    • eBayes: 1
    • limma_trend: 1
  • limma-trend workflow for log-normalized input data
    • calcNormFactors_method: 0 (thereby skipping any normalization or voom related actions)
    • voom: 0
    • eBayes: 1
    • limma_trend: 1
  • for a more in-depth understanding, check out the commented code in limma.R
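
The flag combinations in these scenarios, together with the notes on voom, eBayes and limma_trend in the column descriptions, can be expressed as a small consistency check. The following is a minimal sketch, not code from the workflow: the function name check_flags and the warning texts are hypothetical, and the rules only restate what this README says.

```python
def check_flags(row):
    """Return warnings for questionable flag combinations in one annotation row."""
    norm = str(row["calcNormFactors_method"])
    voom = int(row["voom"])
    ebayes = int(row["eBayes"])
    trend = int(row["limma_trend"])

    warnings = []
    if trend == 1 and ebayes == 0:
        warnings.append("limma_trend=1 requires eBayes=1")
    if trend == 1 and voom == 1:
        warnings.append("voom=1 together with limma_trend=1 makes no sense")
    if norm == "0" and voom == 1:
        warnings.append("voom=1 although calcNormFactors_method=0 marks the data as log-normalized")
    if ebayes == 0:
        warnings.append("eBayes=0 is not recommended")
    return warnings


# The three scenarios listed above, reduced to the four relevant columns.
scenarios = {
    "limma-voom, raw counts": {
        "calcNormFactors_method": "none", "voom": 1, "eBayes": 1, "limma_trend": 0,
    },
    "limma-trend, raw counts": {
        "calcNormFactors_method": "none", "voom": 0, "eBayes": 1, "limma_trend": 1,
    },
    "limma-trend, log-normalized": {
        "calcNormFactors_method": "0", "voom": 0, "eBayes": 1, "limma_trend": 1,
    },
}

for label, flags in scenarios.items():
    print(label, check_flags(flags))

# A combination the workflow does not forbid but the README advises against.
print(check_flags({"calcNormFactors_method": "TMM", "voom": 1, "eBayes": 1, "limma_trend": 1}))
```

Since the workflow does not forbid these combinations, a check of this kind could be run over each annotation row before starting a run.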

Linting and formatting

Linting results

Using workflow specific profile workflow/profiles/default for setting default command line arguments.
FileNotFoundError in file /tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/Snakefile, line 24:
[Errno 2] No such file or directory: '/path/to/DataSet_dea_limma_annotation.csv'
  File "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/Snakefile", line 24, in <module>
  File "/home/runner/micromamba/envs/snakemake-workflow-catalog/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
  File "/home/runner/micromamba/envs/snakemake-workflow-catalog/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 620, in _read
  File "/home/runner/micromamba/envs/snakemake-workflow-catalog/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
  File "/home/runner/micromamba/envs/snakemake-workflow-catalog/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
  File "/home/runner/micromamba/envs/snakemake-workflow-catalog/lib/python3.12/site-packages/pandas/io/common.py", line 873, in get_handle

Formatting results

[DEBUG] 
[DEBUG] In file "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/rules/envs_export.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/Snakefile":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/rules/dea.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/rules/visualize.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpcna700u9/epigen-dea_limma-f0c40b1/workflow/rules/common.smk":  Formatted content is different from original
[INFO] 5 file(s) would be changed 😬

snakefmt version: 0.10.2