WestGermanGenomeCenter/circrna_detection

circs_snake : a snakemake-based circRNA detection workflow

Overview

Topics: circular rna rna-seq rna-seq-pipeline rnaseq-pipeline

Latest release: None, Last update: 2021-09-30

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

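snakedeploy deploy-workflow https://github.com/WestGermanGenomeCenter/circrna_detection . --tag None

Because this workflow has no tagged release (see "Latest release" above), --tag None will likely not resolve to a valid tag; in that case, replace --tag None with --branch followed by the name of the branch you want to deploy.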

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using apptainer/singularity, use

snakemake --cores all --sdm apptainer

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

user manual for circs_snake

circs_snake is a multi-pipeline workflow for detecting circRNAs in RNA-seq data.

This README is meant to help you understand what circs_snake does, so that you can use it or adapt it to your needs and environment. For a first rough overview, let's look at a DAG of this pipeline with two input samples.

[Figure: DAG of circs_snake with two input samples]

Here you can see that (starting from the top) we have four major "starting points":

  1. the parental pipeline flow (starting with rule r01), which performs the preparation, voting and normalization steps
  2. find_circ (starting with rule fc_b; rule fc_a unpacks .fastq.gz input files if the data is given in that format)
  3. DCC (starting with rule dcc_b; rule dcc_a unpacks .fastq.gz input files if the data is given in that format)
  4. CIRCexplorer1 (starting with rule cx_b; rule cx_a unpacks .fastq.gz input files if the data is given in that format)

Each of the three detection pipelines is run twice here, once per input sample. The exception is the parental pipeline, which runs only once per dataset. Another visualization of the same flow, which makes this a little clearer, is below:

[Figure: alternative view of the same workflow]

Here you can see what happens with the data: first, all three pipelines (find_circ, DCC, CIRCexplorer1) are run on each sample, resulting in one file per sample and pipeline. An example output file at this stage looks like this:

[Figure: example per-sample output file]

These files are summarized in steps r06a, r06b and r06c, which produce one .mat1 file per pipeline. The columns in this file are: circRNA coordinates, strand, sample name, detected quantity, quality, quality, RefSeq annotation. In steps r07a, r07b and r07c, annotation is added and the data is summarized into one .mat2 file per pipeline. These pipeline-specific .mat2 files are then voted (circRNA coordinates are overlapped, and only circRNAs found by all three pipelines, i.e. 3/3 overlaps, are kept) and finally normalized, resulting in three normalized and voted circRNA data files as the main output of this pipeline. An example output file is given in example_output_norm_voted_dcc_hg19.csv.
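The summarize-and-vote idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the workflow's own scripts: the record layout (coordinates, strand, sample, count) is a simplified subset of the .mat1 columns, the toy values in the demo are made up, voting is simplified to exact matching of coordinate strings (the workflow's own overlap logic may be more permissive), and normalization is not shown.

```python
from collections import defaultdict

# Hypothetical simplified record: (coordinates, strand, sample, count).
# The real .mat1 files carry more columns (quality, RefSeq annotation, ...).
Record = tuple[str, str, str, int]
Matrix = dict[tuple[str, str], dict[str, int]]


def summarize(records: list[Record]) -> Matrix:
    """Pivot per-sample records into circRNA -> {sample: count}."""
    matrix: Matrix = defaultdict(dict)
    for coords, strand, sample, count in records:
        matrix[(coords, strand)][sample] = count
    return dict(matrix)


def vote(per_pipeline: dict[str, Matrix]) -> set[tuple[str, str]]:
    """Keep only circRNAs reported by every pipeline (3/3 for three pipelines)."""
    key_sets = [set(matrix) for matrix in per_pipeline.values()]
    return set.intersection(*key_sets) if key_sets else set()


if __name__ == "__main__":
    # Made-up toy input, only to show the data flow.
    find_circ = summarize([("chr1:100-500", "+", "SRR3184300", 12),
                           ("chr2:10-90", "-", "SRR3184285", 3)])
    dcc = summarize([("chr1:100-500", "+", "SRR3184300", 9)])
    circexplorer1 = summarize([("chr1:100-500", "+", "SRR3184285", 7),
                               ("chr2:10-90", "-", "SRR3184285", 2)])
    voted = vote({"find_circ": find_circ, "dcc": dcc, "circexplorer1": circexplorer1})
    print(sorted(voted))  # [('chr1:100-500', '+')]
```

Here chr2:10-90 is dropped because DCC did not report it, which mirrors the "only 3/3 overlaps" rule.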

before you can run this

Before you will be able to run this workflow, you need to have:

  • Snakemake installed
  • the find_circ scripts from the official website (http://circbase.org/cgi-bin/downloads.cgi, "Custom scripts for finding circRNAs"; unpack them and edit find_circ_conf.yaml accordingly)
  • DCC and CIRCexplorer1 installed or downloaded (edit the config.yaml files accordingly)
  • reference genome indices built for STAR and Bowtie2, as well as the reference genome in .fa and .gtf format (other annotation data is in the data/ directory; edit the config.yaml files accordingly)
  • all other software dependencies should be handled by Snakemake; see the env.yaml files
  • the config.yaml files reflect my specific deployment, so yours will differ. You only need to change the directories for each of the required files and folders, and you can adjust pipeline-specific parameters to your liking as well. Example hg19 and hg38 config.yaml files are included to ease adaptation.

And that's it! An example of how to execute the pipeline is given in howtostart.sh, an example cluster configuration in cluster_config.yaml, and an example sample sheet in samples.tsv.
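Before a first run, it can help to confirm that the external tools are reachable. The sketch below is a hypothetical pre-flight check, not part of circs_snake: the executable names in REQUIRED_TOOLS (for example CIRCexplorer.py) are assumptions and will not apply if you downloaded DCC or CIRCexplorer1 and point to them via config.yaml rather than installing them on your PATH. The find_circ scripts are referenced through find_circ_conf.yaml, so they are not checked here.

```python
import shutil

# Hypothetical executable names: adjust them to how STAR, Bowtie2, DCC and
# CIRCexplorer1 are actually installed on your system.
REQUIRED_TOOLS = ["snakemake", "STAR", "bowtie2", "DCC", "CIRCexplorer.py"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```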

the samplesheet and expected files

Given this as samples.tsv:

samples
"SRR3184300"
"SRR3184285"

the workflow expects:

SRR3184300_1.fastq and SRR3184300_2.fastq, plus SRR3184285_1.fastq and SRR3184285_2.fastq, in the root directory of this workflow, path/to/circs_snake/ (put the .fastq files there). The lane identifiers can be changed in config.yaml:

lane_ident1:
 "_1"
lane_ident2:
 "_2"
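The naming rule above (sample name + lane identifier + .fastq) can be expressed as a small Python helper. This is a hypothetical sketch for checking your input directory, not code from the workflow; it assumes samples.tsv has a "samples" header line followed by one quoted sample name per line, as in the example.

```python
import csv
import io


def expected_fastqs(samples_tsv: str, lane_ident1: str = "_1", lane_ident2: str = "_2") -> list[str]:
    """Return the paired .fastq file names expected for the given samples.tsv text."""
    reader = csv.reader(io.StringIO(samples_tsv), delimiter="\t")
    rows = [row for row in reader if row]  # ignore blank lines
    files = []
    for row in rows[1:]:  # skip the "samples" header line
        sample = row[0]  # the csv module strips the surrounding double quotes
        files.append(f"{sample}{lane_ident1}.fastq")
        files.append(f"{sample}{lane_ident2}.fastq")
    return files


if __name__ == "__main__":
    sheet = 'samples\n"SRR3184300"\n"SRR3184285"\n'
    print(expected_fastqs(sheet))
    # ['SRR3184300_1.fastq', 'SRR3184300_2.fastq', 'SRR3184285_1.fastq', 'SRR3184285_2.fastq']
```

Passing different values for lane_ident1 and lane_ident2 mimics changing them in config.yaml.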

The workflow itself creates the needed .tsv file from the paired .fastq files in its root directory: in the parental Snakefile, rule r02 lists the .fastq files, and rule r03 builds the sample sheet from that list. You can also create the file yourself; see scripts/snake_infile_creator.pl.
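The sample-sheet creation described above can be approximated in Python. This is a hypothetical sketch, not a port of scripts/snake_infile_creator.pl or of rules r02/r03: it assumes uncompressed .fastq files, the default lane identifiers "_1"/"_2", and the quoted one-column layout of the example samples.tsv. Samples whose mate file is missing are skipped.

```python
from pathlib import Path


def write_sample_sheet(fastq_dir: str, out_tsv: str,
                       lane_ident1: str = "_1", lane_ident2: str = "_2") -> list[str]:
    """Derive sample names from paired .fastq files and write a samples.tsv sheet."""
    directory = Path(fastq_dir)
    suffix1 = f"{lane_ident1}.fastq"
    samples = []
    for path in sorted(directory.glob(f"*{suffix1}")):
        sample = path.name[: -len(suffix1)]
        # Only accept samples whose mate file (lane_ident2) is also present.
        if (directory / f"{sample}{lane_ident2}.fastq").exists():
            samples.append(sample)
    lines = ["samples"] + [f'"{sample}"' for sample in samples]
    Path(out_tsv).write_text("\n".join(lines) + "\n")
    return samples


if __name__ == "__main__":
    print(write_sample_sheet(".", "samples.tsv"))
```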

how to start a typical circs_snake run:

  • copy, paste or move paired-end, trimmed and QC'ed .fastq files into circs_snake/.
  • check that the lane identifier is correct in all config.yaml files (change it if needed)
  • run snakemake (for more options, see howtorun.sh)

further reading

For documentation on each single step, please refer to the original pipeline documentation: https://gitlab.com/daaaaande/circs/-/blob/master/README.md

Linting and formatting

Linting results

  Lints for snakefile /tmp/tmpojt17qj5/Snakefile:
    * Absolute path "/cx_out/"+config[" in line 12
    * Absolute path "/"+" in line 12
    * Absolute path "/dc_out/"+config[" in line 13
    * Absolute path "/"+" in line 13
    * Absolute path "/fc_out/"+config[" in line 14
    * Absolute path "/"+" in line 14
    * Absolute path "/"+config[" in line 20
    * Absolute path "/"+config[" in line 21
    * Absolute path "/"+config[" in line 22
    * Absolute path "/reads_per_sample_" in line 25
    * Absolute path "/cx_out/"+config[" in line 29
    * Absolute path "/"+" in line 29
    * Absolute path "/dc_out/"+config[" in line 30
    * Absolute path "/"+" in line 30
    * Absolute path "/fc_out/"+config[" in line 31
    * Absolute path "/"+" in line 31
    * Absolute path "/cx_out/"+config[" in line 39
    * Absolute path "/all_" in line 39
    * Absolute path "/dc_out/"+config[" in line 40
    * Absolute path "/all_" in line 40
    * Absolute path "/fc_out/"+config[" in line 41
    * Absolute path "/all_" in line 41
    * Absolute path "/"+config[" in line 49
    * Absolute path "/"+config[" in line 50
    * Absolute path "/"+config[" in line 51
    * Absolute path "/cx_out/"+config[" in line 59
    * Absolute path "/all_" in line 59
    * Absolute path "/cx_out/"+config[" in line 70
    * Absolute path "/all_" in line 70
  Each lint above carries the same message:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
  ... (truncated)
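Most of these lints come from string literals that begin with "/" (such as "/cx_out/") being concatenated with config values. One possible way to avoid them, sketched below under assumptions, is to build relative paths with os.path.join so that no literal starts with a slash. The config keys dataset and reference and the function cx_output are hypothetical; the actual keys used in the Snakefile are not visible in the lint output.

```python
import os

# Hypothetical config values; the real key names in the Snakefile may differ.
config = {"dataset": "run1", "reference": "hg19"}


def cx_output(config: dict, sample: str) -> str:
    """Build a relative output path without string literals that start with '/'."""
    # Instead of: config["dataset"] + "/cx_out/" + config["reference"] + "/" + sample
    return os.path.join(config["dataset"], "cx_out", config["reference"], sample)


if __name__ == "__main__":
    print(cx_output(config, "SRR3184300"))  # run1/cx_out/hg19/SRR3184300 on Linux/macOS
```

Because every component is relative, the resulting path resolves against the working directory, which is what the lint message recommends.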

Formatting results

[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.4.3