niekwit/crispr-screens
Snakemake workflow for CRISPR-Cas9 screen analysis
Overview
Topics: bioinformatics-pipeline crispr-screen-analysis snakemake-workflow
Latest release: v0.8.1, Last update: 2025-01-31
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/niekwit/crispr-screens . --tag v0.8.1
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
To run the workflow using a combination of conda and apptainer/singularity for software deployment, use
snakemake --cores all --sdm conda apptainer
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Use config.yml to provide information about your experiment:
Under lib_info, information about the sgRNA library must be provided. If your library has a fixed sgRNA length, provide it with sg_length. The reads in the raw data will then be trimmed from the 3' end until the read length is equal to sg_length.
Some libraries have variable sgRNA length. In this case provide a vector sequence under vector, so that this sequence is removed, instead of trimming to a fixed length.
Finally, with some sequencing strategies, the first base sequenced can be the same for all sgRNA sequences. In this case the base calling can be of very poor quality. By setting left_trim to 1, one can remove the first 5' base of all reads to improve read quality. This step is performed before 3' end trimming or removal of the vector sequence.
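The order of the trimming steps described above can be sketched in Python. This is only an illustration, not the workflow's own implementation: the function name trim_read is hypothetical, the vector is assumed to lie 3' of the sgRNA and to match exactly, and everything from the start of the vector onward is assumed to be discarded.

```python
def trim_read(read, sg_length=None, vector=None, left_trim=0):
    """Illustrative sketch of the trimming order (hypothetical helper)."""
    # Step 1: remove the requested number of bases from the 5' end.
    read = read[left_trim:]
    if sg_length is not None:
        # Fixed-length library: trim from the 3' end down to sg_length bases.
        return read[:sg_length]
    if vector:
        # Variable-length library (assumption): cut the read where the
        # vector sequence starts; leave it unchanged if no exact match.
        pos = read.find(vector)
        return read[:pos] if pos != -1 else read
    return read


# Example read: one poor-quality leading base, a 20 nt sgRNA, then vector.
sgrna = "ACGT" * 5
read = "G" + sgrna + "GTTTTAGAGC"
print(trim_read(read, sg_length=20, left_trim=1) == sgrna)
print(trim_read(read, vector="GTTTTAGAGC", left_trim=1) == sgrna)
```

Both calls recover the same sgRNA here; the two modes differ only for libraries in which the sgRNA length varies between guides.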
Inside the resources folder, a fasta file should be placed that contains unique sgRNA names and sequences, which will be used to build an index for alignment using HISAT2.
This Snakemake workflow can be run without a fasta file, as long as a CSV file (also in the resources folder) is provided that contains the unique sgRNA sequences and corresponding gene names in separate columns. The column number of the gene names must be set under name_column, and the column number of the sgRNA sequences under sequence_column.
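To illustrate how a CSV file with gene names and sequences relates to the fasta file used for indexing, here is a minimal Python sketch. It is not the workflow's own code: the function csv_to_fasta and the sgRNA naming scheme (gene_sgN) are hypothetical, and it assumes that column numbers are 1-based and that the CSV file has a header row.

```python
import csv
import io


def csv_to_fasta(csv_text, name_column, sequence_column, has_header=True):
    """Convert sgRNA CSV text to FASTA text (hypothetical helper).

    Column numbers are assumed to be 1-based.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    counts = {}
    records = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        gene = row[name_column - 1]
        seq = row[sequence_column - 1].upper()
        # Number the sgRNAs per gene so that every FASTA header is unique.
        counts[gene] = counts.get(gene, 0) + 1
        records.append(f">{gene}_sg{counts[gene]}\n{seq}")
    return "\n".join(records) + "\n"


example = "gene,sequence\nTP53,ACGTACGTACGTACGTACGT\nKRAS,GGGTACGTACGTACGTACCC\n"
print(csv_to_fasta(example, name_column=1, sequence_column=2))
```

Numbering the guides per gene is one simple way to satisfy the requirement that sgRNA names are unique even though several guides share a gene name.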
The number of mismatches allowed during sequence alignment can also be set in config.yml, up to a maximum of 2.
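As a rough illustration of what a mismatch limit means, the sketch below counts position-wise differences between a trimmed read and an sgRNA sequence. It is a simplification under stated assumptions: the function names are hypothetical, indels are ignored, and it does not reproduce how HISAT2 actually scores alignments.

```python
def count_mismatches(read, sgrna):
    """Count positions at which two equal-length sequences differ."""
    if len(read) != len(sgrna):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(read, sgrna))


def is_match(read, sgrna, max_mismatches=0):
    """Return True if the read matches within the mismatch limit (0 to 2)."""
    if not 0 <= max_mismatches <= 2:
        raise ValueError("max_mismatches must be between 0 and 2")
    return count_mismatches(read, sgrna) <= max_mismatches


print(is_match("ACGTACGTAC", "ACGTACGTAC"))                    # identical
print(is_match("ACGTACGTAC", "ACGAACGTAC", max_mismatches=1))  # one difference
```

Allowing more mismatches recovers more reads with sequencing errors, at the cost of a higher risk of assigning a read to the wrong, closely related sgRNA.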
With skip, one can skip statistical analyses with MAGeCK and/or BAGEL2.
Any extra arguments to be passed to MAGeCK can be defined with extra_mageck_arguments.
Normally MAGeCK builds the statistical model using all sgRNAs in the library. However, this is not optimal when many sgRNAs change, for example when using smaller libraries. In this case, the user can provide a file listing the genes that should be used to build the model instead, with one gene name per line.
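A gene list file in the format described above (one gene name per line) could be prepared with a small Python helper such as the following sketch. The function names, the file name, and the gene names are hypothetical placeholders, not part of the workflow.

```python
from pathlib import Path


def write_gene_list(genes, path):
    """Write unique, non-empty gene names to a file, one per line."""
    seen = set()
    unique = []
    for gene in genes:
        gene = gene.strip()
        if gene and gene not in seen:
            seen.add(gene)
            unique.append(gene)
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


def read_gene_list(path):
    """Read a gene list file back into a list, ignoring blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Hypothetical gene names; replace with genes suitable for your screen.
write_gene_list(["GENE_A", "GENE_B", "GENE_A"], "control_genes.txt")
print(read_gene_list("control_genes.txt"))
```

Removing duplicates and blank lines up front avoids passing an ambiguous list to downstream tools.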
Under resources the computational requirements can be set. The CPU count will be used locally and on any HPC/cloud platform, while the time will only be relevant for the latter.
Linting and formatting
Linting results
Workflow version: v0.8.0
Wrapper version: v5.2.1
AssertionError in file /tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/scripts/general_functions.smk, line 90:
No fastq files (.fastq.gz) found in reads directory
File "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/Snakefile", line 31, in <module>
File "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/scripts/general_functions.smk", line 90, in sample_names
Formatting results
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/Snakefile": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/mageck.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/bagel2.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/drugz.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/trim.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/qc.smk": Formatted content is different from original
[DEBUG]
[DEBUG] In file "/tmp/tmp4_vfeax0/niekwit-crispr-screens-e546432/workflow/rules/count.smk": Formatted content is different from original
[INFO] 7 file(s) would be changed 😬
snakefmt version: 0.10.2