niekwit/crispr-screens

Snakemake workflow for CRISPR-Cas9 screen analysis

Overview

Topics: bioinformatics-pipeline crispr-screen-analysis snakemake-workflow

Latest release: v0.8.1, Last update: 2025-01-31

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, installing Miniforge is recommended. More details regarding Mamba can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

All following steps assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/niekwit/crispr-screens . --tag v0.8.1

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
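
A quick sanity check after deployment can be sketched in Python. This is a minimal illustration, not part of the workflow: the function name missing_deployment_paths is hypothetical, and the two paths it checks are the ones named in this guide (the Snakefile in the workflow folder and config/config.yml).

```python
from pathlib import Path


def missing_deployment_paths(project_dir):
    """Return the expected deployment paths that are absent from project_dir."""
    root = Path(project_dir)
    # Paths named in this guide; anything else Snakedeploy creates is not checked.
    expected = [
        root / "workflow" / "Snakefile",
        root / "config" / "config.yml",
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


# Check the current directory, which is assumed to be the project working directory.
missing = missing_deployment_paths(".")
if missing:
    print("Missing after deployment:", ", ".join(missing))
else:
    print("Deployment looks complete.")
```

An empty result means both files are present and configuration can start.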

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using apptainer/singularity, use

snakemake --cores all --sdm apptainer

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
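
The three commands above differ only in the values passed to --sdm. For scripted runs, that can be captured in a small helper. The following is a sketch under assumptions: snakemake_command and ALLOWED_METHODS are hypothetical names, and the helper deliberately accepts only the two deployment methods covered in this guide.

```python
ALLOWED_METHODS = ("conda", "apptainer")  # only the methods described in this guide


def snakemake_command(methods, cores="all"):
    """Return the snakemake argument list for the given deployment methods."""
    if not methods:
        raise ValueError("at least one deployment method is required")
    unknown = [m for m in methods if m not in ALLOWED_METHODS]
    if unknown:
        raise ValueError(f"unsupported deployment method(s): {unknown}")
    # Mirrors the commands shown above: snakemake --cores all --sdm <methods...>
    return ["snakemake", "--cores", str(cores), "--sdm", *methods]


print(" ".join(snakemake_command(["conda", "apptainer"])))
# prints: snakemake --cores all --sdm conda apptainer
```

The returned list can be handed to subprocess.run from inside the project working directory.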

For further options, such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting the results, together with parameters and code, in the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

CONFIGURATION

config.yml

Use config.yml to provide information about your experiment:

lib_info

Information about the sgRNA library must be provided under lib_info. If your library has a fixed sgRNA length, provide it with sg_length. The reads in the raw data will then be trimmed from the 3' end until the read length equals sg_length.

Some libraries have variable sgRNA length. In this case provide a vector sequence under vector, so that this sequence is removed, instead of trimming to a fixed length.

Finally, with some sequencing strategies, the first base sequenced can be the same for all sgRNA sequences. In this case the base calling can be of very poor quality. By setting left_trim to 1, one can remove the first 5' base for all reads in order to improve the quality of the read. This step will be performed before 3' end trimming or removal of the vector sequence.
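
The order of the trimming steps can be illustrated on a single read. This is only a sketch of the logic described above, not the workflow's actual trimming code: the function trim_read is hypothetical, and the vector sequence in the example is made up for illustration.

```python
def trim_read(read, sg_length=None, vector=None, left_trim=0):
    """Trim one read in the order described above (illustrative only)."""
    # Step 1: remove leading bases from the 5' end (left_trim).
    read = read[left_trim:]
    if sg_length is not None:
        # Fixed-length library: trim the 3' end down to sg_length bases.
        return read[:sg_length]
    if vector is not None:
        # Variable-length library: cut at the first occurrence of the vector.
        position = read.find(vector)
        return read[:position] if position != -1 else read
    return read


# Example read: one poor-quality leading base, a 20 nt sgRNA, then vector sequence.
example_vector = "GTTTTAGAGC"  # illustrative value, not taken from the workflow
read = "G" + "ACGTACGTACGTACGTACGT" + example_vector

print(trim_read(read, sg_length=20, left_trim=1))
print(trim_read(read, vector=example_vector, left_trim=1))
```

Both calls recover the same 20 nt sgRNA here; with a variable-length library only the vector-based call would give correct results.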

csv

Place a FASTA file containing unique sgRNA names and sequences inside the resources folder. It will be used to build an index for alignment with HISAT2.

The workflow can also be run without a FASTA file, as long as a CSV file (also in the resources folder) is provided that contains the unique sgRNA sequences and the corresponding gene names in separate columns. Set name_column to the column number of the gene names and sequence_column to the column number of the sgRNA sequences.
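
How a CSV file could be turned into a FASTA file with unique sgRNA names is sketched below. Several details are assumptions rather than facts about the workflow: the function csv_to_fasta is hypothetical, column numbers are treated as 1-based, the CSV is assumed to have a header row, and the gene_sgN naming scheme is invented for illustration.

```python
import csv
import io


def csv_to_fasta(csv_text, name_column, sequence_column, has_header=True):
    """Convert CSV text into FASTA text with one unique name per sgRNA."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    per_gene_count = {}
    records = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        # Column numbers are assumed to be 1-based.
        gene = row[name_column - 1].strip()
        sequence = row[sequence_column - 1].strip().upper()
        # Number the sgRNAs per gene so every FASTA name is unique.
        per_gene_count[gene] = per_gene_count.get(gene, 0) + 1
        records.append(f">{gene}_sg{per_gene_count[gene]}\n{sequence}")
    return "\n".join(records) + "\n"


example = "gene,sequence\nTP53,acgtacgt\nTP53,ttttcccc\nKRAS,ggggaaaa\n"
print(csv_to_fasta(example, name_column=1, sequence_column=2), end="")
```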

mismatch

The number of mismatches allowed during sequence alignment can be set here. A maximum of 2 mismatches can be set.
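
What a mismatch threshold means can be shown with a plain position-by-position comparison. This is a conceptual sketch only: HISAT2 performs the real alignment with its own scoring, and the functions count_mismatches and is_accepted are hypothetical.

```python
def count_mismatches(read, reference):
    """Count positions at which two equal-length sequences differ."""
    if len(read) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(read, reference))


def is_accepted(read, reference, mismatch):
    """Return True if the read is within the allowed number of mismatches."""
    if not 0 <= mismatch <= 2:
        # The configuration described above allows at most 2 mismatches.
        raise ValueError("mismatch must be 0, 1 or 2")
    return count_mismatches(read, reference) <= mismatch


print(is_accepted("ACGT", "ACGA", mismatch=1))  # one differing position
print(is_accepted("ACGT", "AGGA", mismatch=1))  # two differing positions
```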

stats

With skip, one can skip the statistical analyses with MAGeCK and/or BAGEL2.

Any extra arguments to be passed to MAGeCK can be defined with extra_mageck_arguments.

Normally, MAGeCK builds the statistical model using all sgRNAs in the library. However, this is not always optimal when many sgRNAs change, for example when using smaller libraries. In this case, the user can provide a file listing the genes that should be used to build the model instead, with each gene name on a new line.
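
The gene file is plain text with one gene name per line. A minimal sketch for producing its contents from a Python list follows; the function format_gene_list and the gene names in the example are illustrative assumptions, not part of the workflow.

```python
def format_gene_list(genes):
    """Return file contents with one unique, non-empty gene name per line."""
    unique = []
    for gene in genes:
        name = gene.strip()
        # Drop blank entries and duplicates while keeping the original order.
        if name and name not in unique:
            unique.append(name)
    return "".join(f"{name}\n" for name in unique)


# Illustrative gene names only; choose genes appropriate for your screen.
print(format_gene_list(["TP53", "KRAS", "TP53", " "]), end="")
```

The returned string can be written to a file whose path is then given in config.yml.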

resources

Under resources, the computational requirements can be set. The CPU count is used both locally and on any HPC/cloud platform, while the time setting is only relevant on HPC/cloud platforms.

Linting and formatting

Linting results

Workflow version: v0.8.0
Wrapper version: v5.2.1
AssertionError in file /tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/scripts/general_functions.smk, line 90:
No fastq files (.fastq.gz) found in reads directory
  File "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/Snakefile", line 31, in <module>
  File "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/scripts/general_functions.smk", line 90, in sample_names
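
The traceback above shows an assertion raised while the Snakefile is parsed: the sample_names function finds no .fastq.gz files, which is expected when linting runs on a checkout without input data. A check of this kind might look like the following sketch. It is a reconstruction based only on the error message, not the actual code in general_functions.smk, and the default directory name reads is taken from that message.

```python
from pathlib import Path


def sample_names(reads_dir="reads"):
    """Derive sample names from the .fastq.gz files in reads_dir."""
    fastq_files = sorted(Path(reads_dir).glob("*.fastq.gz"))
    # Fails at parse time when no input data is present, as in the lint run.
    assert fastq_files, "No fastq files (.fastq.gz) found in reads directory"
    suffix = ".fastq.gz"
    return [f.name[: -len(suffix)] for f in fastq_files]
```

Placing at least one .fastq.gz file in the reads directory before running the workflow avoids this error.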

Formatting results

[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/Snakefile":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/mageck.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/bagel2.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/drugz.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/trim.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/qc.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/count.smk":  Formatted content is different from original
[INFO] 7 file(s) would be changed 😬

snakefmt version: 0.10.2