snakemake-workflows/cellranger-multi
A Snakemake workflow for preprocessing single-cell RNA-seq (scRNA-seq) data with cellranger multi (Cell Ranger licensing requires a manual download of the software).
Overview
Latest release: v1.0.0, Last update: 2025-10-22
Linting: passed, Formatting: passed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we assume that you are inside that directory. Then run
snakedeploy deploy-workflow https://github.com/snakemake-workflows/cellranger-multi . --tag v1.0.0
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which you will modify in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
To run the workflow using a combination of conda and apptainer/singularity for software deployment, use
snakemake --cores all --sdm conda apptainer
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options, such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspecting the results, together with parameters and code, in the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Workflow overview
This workflow is a best-practice Snakemake workflow for systematically running cellranger multi on one or more samples.
To check whether this is the preprocessing you need, see the 10X documentation on choosing a pipeline.
If your assay setup suggests cellranger count, have a look at the standardised workflow for cellranger count instead.
The workflow is built using snakemake and consists of the following steps:
1. Link the input files to new file names that follow the cellranger naming requirements.
2. Create a per-sample cellranger multi config CSV sheet.
3. Run cellranger multi, parallelizing over biological samples.
4. Create a Snakemake report with a Web Summary per biological sample.
Running the workflow
cellranger download
As a prerequisite for running the workflow, you need to download the *.tar.gz file containing the Cell Ranger executable from the Cell Ranger Download Center:
https://www.10xgenomics.com/support/software/cell-ranger/downloads
Afterwards, set the environment variable CELLRANGER_TARBALL to the full path of this downloaded file, for example:
export CELLRANGER_TARBALL=/path/to/cellranger-x.y.z.tar.gz
To make this a permanently set environment variable for your user on the respective system, add the (adapted) line from above to your ~/.bashrc file and make sure this file is always loaded.
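Before launching a long run, it can help to confirm that CELLRANGER_TARBALL is set and points at an existing tarball. The following is a minimal Python sketch of such a pre-flight check. The function name check_cellranger_tarball is hypothetical and not part of the workflow. The check looks only at the path and file name, not at the archive contents.

```python
import os
from pathlib import Path


def check_cellranger_tarball(env=None):
    """Return CELLRANGER_TARBALL as a Path, raising RuntimeError if it looks unusable.

    Hypothetical pre-flight helper; it is not part of the workflow and only
    checks the path and file name, not the archive contents.
    """
    env = os.environ if env is None else env
    value = env.get("CELLRANGER_TARBALL", "")
    if not value:
        raise RuntimeError("CELLRANGER_TARBALL is not set")
    path = Path(value)
    if not path.is_absolute():
        # The instructions ask for the full path to the downloaded file.
        raise RuntimeError(f"CELLRANGER_TARBALL should be an absolute path: {value}")
    if not path.name.endswith(".tar.gz"):
        raise RuntimeError(f"expected a *.tar.gz file: {value}")
    if not path.is_file():
        raise RuntimeError(f"file not found: {value}")
    return path
```

Calling check_cellranger_tarball() without arguments reads the current environment. You could, for example, call it from a small helper script before invoking snakemake.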
With this environment variable set, the workflow will automatically install cellranger into a conda environment that is then used for all cellranger steps.
Once your analysis has created this conda environment, cellranger stays at the version that was specified at that time.
Should you ever want to update the cellranger version for an analysis, you have to update the CELLRANGER_TARBALL environment variable and delete the conda environment, so that it gets regenerated.
The conda environments are stored in the hidden .snakemake/conda/ folder.
You can usually identify the exact conda environment used by a rule from the .snakemake/log/ files or the respective cluster system log files.
Search for the execution of the respective rule (cellranger_multi) and then look for Activating conda environment: right below.
You can then delete the respective file and directory under .snakemake/conda/ and rerun the workflow.
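For long log files, the lookup described above can be scripted. Below is a minimal Python sketch that scans log text for jobs of the cellranger_multi rule and collects the path from the next Activating conda environment: line. It assumes a log layout in which a header line such as rule cellranger_multi: precedes that job's activation message. The exact format may differ between Snakemake versions and executors, so treat the result as a hint and check it before deleting anything. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed log layout (may vary between Snakemake versions): a job header such as
# "rule cellranger_multi:" is followed, a few lines later, by
# "Activating conda environment: <path>".
RULE_RE = re.compile(r"^(?:local)?rule (\w+):")
ENV_RE = re.compile(r"Activating conda environment:\s*(\S+)")


def conda_envs_for_rule(log_text, rule="cellranger_multi"):
    """Return the conda environment paths activated for jobs of `rule`, in order."""
    envs = []
    current_rule = None
    for line in log_text.splitlines():
        line = line.strip()
        header = RULE_RE.match(line)
        if header:
            current_rule = header.group(1)
            continue
        activation = ENV_RE.search(line)
        if activation:
            if current_rule == rule and activation.group(1) not in envs:
                envs.append(activation.group(1))
            # Attribute only the first activation after a header to that job.
            current_rule = None
    return envs


def conda_envs_in_logs(log_dir=".snakemake/log", rule="cellranger_multi"):
    """Scan all *.log files in `log_dir` (sorted by file name) for `rule`."""
    envs = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        for env in conda_envs_for_rule(log_file.read_text(errors="replace"), rule):
            if env not in envs:
                envs.append(env)
    return envs
```

conda_envs_for_rule accepts any text, so you can also pass it the contents of a cluster system log file.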
Input data
The sample sheet supports all possible columns of the [libraries] section of the multi config CSV file, for example:
| sample | feature_types | read1 | read2 | lane_number |
|---|---|---|---|---|
| sample1 | Gene Expression | ../data/sample1_gex/sample1_gex.bwa.L001.read1.fastq.gz | ../data/sample1_gex/sample1_gex.bwa.L001.read2.fastq.gz | 1 |
| sample1 | VDJ-T | ../data/sample1_vdjt/sample1_vdjt.bwa.L003.read1.fastq.gz | ../data/sample1_vdjt/sample1_vdjt.bwa.L003.read2.fastq.gz | 1 |
| sample2 | Gene Expression | ../data/sample2_gex/sample2_gex.bwa.L001.read1.fastq.gz | ../data/sample2_gex/sample2_gex.bwa.L001.read2.fastq.gz | 1 |
| sample2 | Gene Expression | ../data/sample2_gex/sample2_gex.bwa.L002.read1.fastq.gz | ../data/sample2_gex/sample2_gex.bwa.L002.read2.fastq.gz | 2 |
For more details on these columns, refer to the 10X documentation for the [libraries] section of the multi config CSV file.
We also provide specific subsection links wherever available.
These are required columns:
sample is an arbitrary name assigned to represent one biological sample. Use the same name across all lanes and all assays performed for that sample, so that all sequencing data generated from that biological sample is grouped together.
feature_types can be any of the values listed in the cellranger multi documentation on multi config CSVs.
read1 and read2 require file names with paths relative to the main workflow directory (the directory where you run the snakemake command). From these (and the optional lane_number column), the raw read data files are linked into the folder and file name structure that cellranger expects, and the fastq_id and fastqs columns of the multi config CSV file are set up accordingly.
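Cell Ranger expects FASTQ files that follow the bcl2fastq naming convention, [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz. The following minimal Python sketch illustrates the linking step by mapping one sample sheet row to link names in that scheme. The fastq_id scheme (sample name plus a slug of the feature type), the results/fastqs folder and the helper names are all hypothetical. The workflow's actual fastq_id values and folder layout may differ.

```python
import re
from pathlib import Path


def fastq_id(sample, feature_type):
    """Hypothetical fastq_id: sample name plus a slug of the feature type."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", feature_type).strip("_").lower()
    return f"{sample}_{slug}"


def cellranger_fastq_name(fq_id, lane, read):
    """Build a bcl2fastq-style name, e.g. sample1_gene_expression_S1_L001_R1_001.fastq.gz."""
    if read not in ("R1", "R2"):
        raise ValueError(f"unexpected read type: {read}")
    return f"{fq_id}_S1_L{int(lane):03d}_{read}_001.fastq.gz"


def link_plan(row, fastq_root="results/fastqs"):
    """Return (source, link) path pairs for one sample sheet row given as a dict."""
    fq_id = fastq_id(row["sample"], row["feature_types"])
    lane = row.get("lane_number") or 1  # optional column; assume lane 1 if empty
    target_dir = Path(fastq_root) / fq_id
    return [
        (Path(row[column]), target_dir / cellranger_fastq_name(fq_id, lane, read))
        for column, read in (("read1", "R1"), ("read2", "R2"))
    ]
```

Each (source, link) pair could then be materialized with Path.symlink_to. The fastqs column of the multi config CSV would point at target_dir.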
These are optional columns:
lane_number is only necessary if a single sample is sequenced across multiple lanes. Usually, lanes are numbered starting from 1, and there are only up to a single-digit number of lanes. Because each row specifies one pair of FASTQ files, and there is one pair of files per lane, the lane_number column of a row contains exactly one lane number. For the lanes column in the final multi config CSV file, multiple lane numbers are combined into the format 1|2|3, etc.
physical_library_id is usually auto-detected, so just omit it if in doubt.
subsample_rate is not usually needed.
chemistry is auto by default and only applicable to Flex assays. If you think this applies to your setup, see the chemistry options in the 10X documentation.
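The lane handling described above amounts to two steps: group the rows that share a sample and feature type, then join their lane numbers with |. The following minimal Python sketch builds one sample's [libraries] block that way from a sample sheet. It makes several assumptions for illustration: a tab-separated sheet, the fastq_id naming, the results/fastqs directory and the empty lanes value when no lane_number is given. None of these is necessarily what the workflow does. The fastq_id, fastqs, lanes and feature_types columns are those of the multi config CSV [libraries] section.

```python
import csv
import io
import re


def libraries_section(sample_sheet_tsv, sample, fastq_root="results/fastqs"):
    """Return the [libraries] block of a multi config CSV for one sample.

    Rows sharing the same sample and feature_types are treated as one library,
    and their lane_number values are joined as 1|2|3 for the lanes column.
    """
    lanes_by_feature_type = {}  # dicts keep insertion order (Python 3.7+)
    for row in csv.DictReader(io.StringIO(sample_sheet_tsv), delimiter="\t"):
        if row["sample"] != sample:
            continue
        lanes = lanes_by_feature_type.setdefault(row["feature_types"], [])
        lane = (row.get("lane_number") or "").strip()
        if lane and lane not in lanes:
            lanes.append(lane)

    out = io.StringIO()
    out.write("[libraries]\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fastq_id", "fastqs", "lanes", "feature_types"])
    for feature_type, lanes in lanes_by_feature_type.items():
        # Hypothetical fastq_id scheme: sample name plus a slug of the feature type.
        slug = re.sub(r"[^A-Za-z0-9]+", "_", feature_type).strip("_").lower()
        fq_id = f"{sample}_{slug}"
        # Assumption: left empty when the optional lane_number column is not filled in.
        lane_field = "|".join(sorted(lanes, key=int))
        writer.writerow([fq_id, f"{fastq_root}/{fq_id}", lane_field, feature_type])
    return out.getvalue()
```

With the example sample sheet, sample2 yields a single Gene Expression library with lanes 1|2. sample1 yields separate Gene Expression and VDJ-T libraries. In a complete multi config CSV, this block sits alongside sections such as [gene-expression] or [vdj] that carry the reference paths.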
Global analysis-level configuration
All global configuration settings for the whole analysis are specified in the config/config.yaml file.
This file is extensively commented and explains how to set each option.
You can delete any options you don’t need, or set them to an empty string ("").
The only required sections are those for the feature types present in the feature_types column of the sample sheet.
Within a required section, the only required entry is usually the reference: path or file specification.
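A missing reference entry is easier to catch before a long cellranger run starts. One way is to compare the feature types in the sample sheet against the configuration. The following minimal Python sketch assumes that the parsed config.yaml is available as a nested dict, for example from yaml.safe_load. It also assumes that sections are keyed like the multi config CSV sections: gene-expression for Gene Expression, vdj for the VDJ types, and feature for Antibody Capture and CRISPR Guide Capture. These key names are assumptions, so adapt SECTION_FOR_FEATURE_TYPE to the keys actually used in config/config.yaml.

```python
# Hypothetical mapping from sample sheet feature_types to config section keys,
# modelled on the section names of the cellranger multi config CSV.
SECTION_FOR_FEATURE_TYPE = {
    "Gene Expression": "gene-expression",
    "VDJ": "vdj",
    "VDJ-T": "vdj",
    "VDJ-B": "vdj",
    "Antibody Capture": "feature",
    "CRISPR Guide Capture": "feature",
}


def missing_references(config, feature_types):
    """List problems for feature types whose config section lacks a reference.

    Empty strings count as missing, matching the advice to blank out unused options.
    """
    problems = []
    for feature_type in sorted(set(feature_types)):
        section = SECTION_FOR_FEATURE_TYPE.get(feature_type)
        if section is None:
            problems.append(f"{feature_type}: no known config section")
            continue
        entries = config.get(section) or {}
        if not entries.get("reference"):
            problems.append(f"{section}: missing reference for {feature_type}")
    return problems
```

An empty return value means every feature type in the sample sheet has a section with a non-empty reference entry.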
Linting and formatting
Linting results
All tests passed!
Formatting results
All tests passed!