snakemake-workflows/cellranger-multi

A Snakemake workflow for preprocessing single-cell RNA-seq (scRNA-seq) data with cellranger multi (Cell Ranger licensing requires a manual download of the software).

Overview

Latest release: v1.0.0, Last update: 2025-10-22

Linting: passed, Formatting: passed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/snakemake-workflows/cellranger-multi . --tag v1.0.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The software deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow using apptainer/singularity, use

snakemake --cores all --sdm apptainer

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspecting results, together with parameters and code, in the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Workflow overview

This workflow is a best-practice Snakemake workflow for systematically running cellranger multi on one or more samples. See the 10X documentation on choosing a pipeline to check whether this is the preprocessing you need. If your assay setup calls for cellranger count, have a look at the standardised workflow for cellranger count instead.

The workflow is built using snakemake and consists of the following steps:

  1. Link the input FASTQ files to new file names that follow the cellranger naming requirements.

  2. Create a per-sample cellranger multi config CSV sheet.

  3. Run cellranger multi, parallelizing over biological samples.

  4. Create a snakemake report with the Web Summary of each biological sample.
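Step 3 runs one cellranger multi job per biological sample. The shell sketch below only illustrates that grouping by printing one command per unique sample; in the workflow itself, Snakemake schedules these jobs. It assumes a tab-separated sample sheet with a header row and the sample column first, and the per-sample CSV path config_csvs/<sample>.csv and the function name are hypothetical. The --id and --csv options are the standard cellranger multi arguments.

```shell
# Sketch: print one `cellranger multi` command per unique biological sample.
# Assumptions (hypothetical, not the workflow's actual layout):
#   - the sample sheet is tab-separated with a header row,
#   - column 1 is `sample`,
#   - per-sample multi config CSVs live at config_csvs/<sample>.csv.
print_multi_commands() {
    sheet="$1"
    # Skip the header, take the sample column, and keep each sample once,
    # in order of first appearance.
    awk -F '\t' 'NR > 1 && !seen[$1]++ { print $1 }' "$sheet" |
    while IFS= read -r sample; do
        # --id names the output directory; --csv points to the multi config CSV.
        printf 'cellranger multi --id=%s --csv=config_csvs/%s.csv\n' "$sample" "$sample"
    done
}
```

With the example sample sheet further below, this prints two commands, one for sample1 (Gene Expression and VDJ-T together) and one for sample2 (both lanes together).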

Running the workflow

cellranger download

As a prerequisite for running the workflow, you need to download the *.tar.gz file with the Cell Ranger executable from the Cell Ranger download center: https://www.10xgenomics.com/support/software/cell-ranger/downloads

Afterwards, set the environment variable CELLRANGER_TARBALL to the full path of this tarball, for example:

export CELLRANGER_TARBALL=/path/to/cellranger-x.y.z.tar.gz

To make this a permanently set environment variable for your user on the respective system, add the (adapted) line from above to your ~/.bashrc file and make sure this file is always loaded.
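A minimal sketch of making the setting permanent without adding duplicate lines when it is run more than once. The function name persist_cellranger_tarball is hypothetical, and any tarball path you pass must be your actual download location.

```shell
# Sketch: persist CELLRANGER_TARBALL in a shell startup file, exactly once.
# persist_cellranger_tarball is a hypothetical helper, not part of the workflow.
persist_cellranger_tarball() {
    tarball="$1"
    rcfile="${2:-$HOME/.bashrc}"   # defaults to ~/.bashrc, as described above
    # Refuse to record a path that does not point to an existing file.
    if [ ! -f "$tarball" ]; then
        echo "not a file: $tarball" >&2
        return 1
    fi
    line="export CELLRANGER_TARBALL=\"$tarball\""
    # Append only if this exact line is not already present
    # (a missing rcfile makes grep fail, so the line is appended and the file created).
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

For example, persist_cellranger_tarball /path/to/cellranger-x.y.z.tar.gz appends the line to ~/.bashrc; pass a second argument to target a different startup file.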

With this environment variable set, the workflow will automatically install cellranger into a conda environment that is then used for all cellranger steps. Once your analysis has created this conda environment, cellranger stays at the version specified at that time.

Should you ever want to update the cellranger version for an analysis, you will have to update the CELLRANGER_TARBALL environment variable and delete the conda environment to ensure that it gets re-generated. The conda environments are stored in the hidden .snakemake/conda/ folder. You can usually identify the exact conda environment used by a rule from the .snakemake/logs/ files or the respective cluster system log files. Search for the execution of the respective rule (cellranger_multi) and then look for the line Activating conda environment: right below it. You can then delete the respective file and directory under .snakemake/conda/ and rerun the workflow.
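The log search described above can be scripted. This sketch assumes that each job in a log starts with a line like rule cellranger_multi: (or localrule ...:) and that the environment path follows Activating conda environment: later in the same job block; the exact wording may differ between Snakemake versions, so check one of your logs first. The function name is hypothetical, and the function only prints candidate paths without deleting anything.

```shell
# Sketch: list conda environment paths activated for the cellranger_multi rule.
# Assumption: log wording as described above; verify against your own logs.
find_cellranger_envs() {
    awk '
        # Any job header ("rule x:" or "localrule x:") starts a new block;
        # remember whether that block belongs to cellranger_multi.
        /^(local)?rule [A-Za-z0-9_]+:/ { in_multi = ($0 ~ /^(local)?rule cellranger_multi:/) }
        # Inside such a block, print the activated environment path once.
        in_multi && /Activating conda environment: / {
            sub(/.*Activating conda environment: /, "")
            print
            in_multi = 0
        }
    ' "$@" | sort -u
}
```

Pass the log files you want to search as arguments, for example the files under .snakemake/logs/ mentioned above or your cluster system's job logs.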

Input data

The sample sheet configures all the possible columns for the [libraries] section of the multi config CSV file:

sample   feature_types    read1                                                      read2                                                      lane_number
sample1  Gene Expression  ../data/sample1_gex/sample1_gex.bwa.L001.read1.fastq.gz    ../data/sample1_gex/sample1_gex.bwa.L001.read2.fastq.gz    1
sample1  VDJ-T            ../data/sample1_vdjt/sample1_vdjt.bwa.L003.read1.fastq.gz  ../data/sample1_vdjt/sample1_vdjt.bwa.L003.read2.fastq.gz  1
sample2  Gene Expression  ../data/sample2_gex/sample2_gex.bwa.L001.read1.fastq.gz    ../data/sample2_gex/sample2_gex.bwa.L001.read2.fastq.gz    1
sample2  Gene Expression  ../data/sample2_gex/sample2_gex.bwa.L002.read1.fastq.gz    ../data/sample2_gex/sample2_gex.bwa.L002.read2.fastq.gz    2

For more details on these columns, refer to the 10X documentation for the [libraries] section of the multi config CSV file. We also provide specific subsection links wherever available.

These are required columns:

  • sample is an arbitrary name representing one biological sample. Use the same name across all lanes and all assays for that sample, so that all sequencing data generated from that biological sample are grouped together.

  • feature_types can be any of the values listed in the cellranger multi documentation on multi config CSVs.

  • read1 and read2 require file names with paths relative to the main workflow directory (the directory where you run the snakemake command). From these (and the optional lane_number column), the raw read data files are linked into the folder and file name structure that cellranger expects, and the fastq_id and fastqs columns of the multi config CSV file are set up accordingly.
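To illustrate the renaming, cellranger expects bcl2fastq-style FASTQ names such as <fastq_id>_S1_L001_R1_001.fastq.gz. The sketch below derives such a name from one sample sheet row and creates a symbolic link to the original file. The fastq_id scheme (sample plus feature type, with spaces replaced by underscores), the default lane of 1, and both function names are assumptions for illustration, not necessarily what the workflow does.

```shell
# Sketch: build a cellranger-style FASTQ name for one sample sheet row.
# Pattern: <fastq_id>_S1_L<3-digit lane>_R<read>_001.fastq.gz
# The fastq_id scheme and the default lane are hypothetical.
link_name() {
    sample="$1"; feature_type="$2"; lane="${3:-1}"; read="$4"
    # Hypothetical fastq_id: "sample1" + "Gene Expression" -> sample1_Gene_Expression
    fastq_id=$(printf '%s_%s' "$sample" "$feature_type" | tr ' ' '_')
    printf '%s_S1_L%03d_R%s_001.fastq.gz\n' "$fastq_id" "$lane" "$read"
}

# Sketch: symlink a raw FASTQ file under its new name.
link_fastq() {
    src="$1"; dest_dir="$2"; name="$3"
    mkdir -p "$dest_dir"
    # Use an absolute target so the link stays valid wherever it is placed.
    ln -sf "$(cd "$(dirname "$src")" && pwd)/$(basename "$src")" "$dest_dir/$name"
}
```

For example, link_name sample2 "Gene Expression" 2 1 prints sample2_Gene_Expression_S1_L002_R1_001.fastq.gz.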

These are optional columns:

  • lane_number is only necessary if a single sample is sequenced across multiple lanes. Lanes are usually numbered starting from 1, with fewer than ten lanes in total. As each row specifies one pair of fastq files, and there is one pair of files per lane, each row's lane_number column contains only a single lane number. For the lanes column in the final multi config CSV file, multiple lane numbers are combined into the format 1|2|3, etc.

  • physical_library_id is usually auto-detected, so just omit it if in doubt.

  • subsample_rate is not usually needed.

  • chemistry defaults to auto and is only applicable to Flex assays. If you think this applies to your setup, see the chemistry options in the 10X documentation.
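The lane handling described above can be sketched with awk: rows are grouped and their lane_number values are joined with | in order of appearance. This assumes a tab-separated sheet with the column order of the example sample sheet above (sample, feature_types, read1, read2, lane_number); grouping by the combination of sample and feature type is an assumption about how rows of the [libraries] section are formed, and the function name is hypothetical.

```shell
# Sketch: combine per-row lane numbers into the "1|2|3" lanes format.
# Assumed columns: sample, feature_types, read1, read2, lane_number (tab-separated).
aggregate_lanes() {
    awk -F '\t' '
        NR == 1 { next }                       # skip the header row
        {
            key = $1 FS $2                     # group by sample + feature type
            if (!(key in lanes)) { order[++n] = key; lanes[key] = $5 }
            else { lanes[key] = lanes[key] "|" $5 }
        }
        # Print groups in order of first appearance: sample, feature type, lanes.
        END { for (i = 1; i <= n; i++) print order[i] FS lanes[order[i]] }
    ' "$1"
}
```

On the example sample sheet, sample2's two Gene Expression rows collapse into a single entry with lanes 1|2.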

Global analysis-level configuration

All global configuration settings for the whole analysis are specified in the config/config.yaml file. This file is extensively commented to explain how to set each option. You can delete any options you don’t need, or set them to an empty string (""). The only required sections are those for the feature types present in the feature_types column of the sample sheet, and within a required section, the only required entry is usually the reference: path or file specification.
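Because the required sections follow from the feature_types column, a quick way to see which ones you need is to list the distinct feature types in the sample sheet. A minimal sketch, assuming a tab-separated sheet with a header row; the function name is hypothetical, and how each feature type maps to a section name in config/config.yaml is documented in that file's comments and is not assumed here.

```shell
# Sketch: print each distinct value of the feature_types column once.
# The column position is looked up from the header row.
required_feature_types() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "feature_types") col = i; next }
        col && !seen[$col]++ { print $col }
        END { if (!col) exit 1 }              # fail if the column is missing
    ' "$1"
}
```

On the example sample sheet, this prints Gene Expression and VDJ-T, so sections for those two feature types are needed.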

Linting and formatting

Linting results

All tests passed!

Formatting results

All tests passed!