WestGermanGenomeCenter/circrna_detection
circs_snake: a Snakemake-based circRNA detection workflow
Overview
Topics: circular rna rna-seq rna-seq-pipeline rnaseq-pipeline
Latest release: None, Last update: 2021-09-30
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
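snakedeploy deploy-workflow https://github.com/WestGermanGenomeCenter/circrna_detection . --tag None
This repository has no tagged release (latest release: None), so --tag None does not refer to an existing tag; pass --branch with the name of the branch you want to deploy instead.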
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
To run the workflow using a combination of conda and apptainer/singularity for software deployment, use
snakemake --cores all --sdm conda apptainer
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspecting results, together with parameters and code, in the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
user's manual for circs_snake
circs_snake is a multi-pipeline circRNA detection workflow for RNA-seq data.
This readme is meant to help you, the user, understand what circs_snake does, so that you can use it or adapt it to your needs and environment. For a first rough overview, let's look at a DAG of this pipeline with two input samples.
Here you can see that (starting from the top) we have four major "starting points":
- the parental pipeline flow (starting with rule r01): does the vote, normalization and preparation steps
- find_circ (starting with fc_b, fc_a is a rule unpacking .fastq.gz files if this is the given format)
- DCC (starting with dcc_b, dcc_a is a rule unpacking .fastq.gz files if this is the given format)
- CIRCexplorer1 (starting with cx_b, cx_a is a rule unpacking .fastq.gz files if this is the given format)
Each of the pipelines is run twice here, since there are two input samples in this example. The exception is the parental pipeline, which runs only once per dataset. Another visualization of the same flow makes this a little clearer:
Here you can see what happens with the data: first, all three pipelines (find_circ, DCC, CIRCexplorer1) are run on each sample, resulting in one file per sample per pipeline. An example output file at this stage looks like this:
These files are summarized in steps r06a, r06b and r06c, resulting in one .mat1 file per pipeline. The columns in this file are: circRNA coordinates, strand, sample name, detected quantity, quality, quality, RefSeq annotation. Annotation is then added and the data are summarized into one .mat2 file per pipeline (r07a, r07b, r07c). These pipeline-specific .mat2 files are then voted on: circRNA coordinates are overlapped across the three pipelines and only circRNAs found by all three (3/3 overlap) are kept. Finally, the data are normalized, resulting in three normalized and voted circRNA data files (one per pipeline) as the main output of this pipeline. An example output file is given in example_output_norm_voted_dcc_hg19.csv.
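The 3/3 vote can be pictured as a set intersection across the pipelines. The following is a minimal sketch under simplifying assumptions, not the workflow's own code: each circRNA is keyed by exact (chromosome, start, end, strand) coordinates, each pipeline's table is a plain dict of read counts, and the function name vote_all_agree is hypothetical. The real overlap criterion and the .mat2 file format may differ, and normalization is not shown.

```python
from typing import Dict, Tuple

# Hypothetical key: exact coordinates plus strand. The workflow's actual
# coordinate overlap may be looser than exact equality.
CircKey = Tuple[str, int, int, str]


def vote_all_agree(calls: Dict[str, Dict[CircKey, float]]) -> Dict[str, Dict[CircKey, float]]:
    """Keep only circRNAs reported by every pipeline.

    `calls` maps a pipeline name (e.g. "find_circ", "DCC", "CIRCexplorer1")
    to {circRNA key: count}. With three pipelines this is the 3/3 vote.
    Returns one filtered table per pipeline, so each pipeline keeps its
    own counts for the shared circRNAs.
    """
    if not calls:
        return {}
    # circRNA keys present in every pipeline's table
    shared = set.intersection(*(set(table) for table in calls.values()))
    return {
        pipeline: {key: count for key, count in table.items() if key in shared}
        for pipeline, table in calls.items()
    }
```

For example, with toy input where only ("chr1", 100, 500, "+") is reported by all three pipelines, each of the three returned tables contains just that circRNA with its pipeline-specific count, which mirrors the "three voted files" described above.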
before you can run this
Before you can run this workflow, you need:
- Snakemake installed
- the find_circ scripts from the official website (http://circbase.org/cgi-bin/downloads.cgi, "Custom scripts for finding circRNAs"; unpack them and edit find_circ_conf.yaml accordingly)
- DCC and CIRCexplorer1 installed (install or download them, then edit the config.yaml files accordingly)
- a reference genome index built for STAR and Bowtie2, as well as the reference genome in .fa and .gtf format (other annotation data is in the data/ dir; edit the config.yaml files accordingly)
- all other software dependencies should be handled by Snakemake; see the env.yaml files
- the config.yaml files are set up for my specific deployment, so yours will differ. You only need to change the directories for each of the required files/folders, and you can also adjust pipeline-specific parameters to your liking. Example hg19 and hg38 config.yaml files are attached to ease your adaptation.
And that's it! An example of how to execute the pipeline is given in howtostart.sh, a cluster config example in cluster_config.yaml, and an example samplesheet in samples.tsv.
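Before the first run, a quick preflight check can catch missing tools or mistyped paths in the config.yaml files. The sketch below is a hypothetical helper, not part of circs_snake: the function name and the example tool names and paths are placeholders (STAR and bowtie2 are the usual executable names, but your installation may differ). Dependencies that Snakemake installs from the env.yaml files do not need to be checked this way.

```python
import os
import shutil
from typing import Iterable, List, Tuple


def missing_prerequisites(tools: Iterable[str], paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (tools not found, paths not found).

    A tool counts as present if shutil.which() can resolve it on PATH;
    a path counts as present if it exists on disk.
    """
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    missing_paths = [path for path in paths if not os.path.exists(path)]
    return missing_tools, missing_paths


if __name__ == "__main__":
    # Hypothetical values: replace them with the executables and the
    # genome/index paths referenced in your own config.yaml files.
    tools, paths = missing_prerequisites(
        tools=["STAR", "bowtie2"],
        paths=["path/to/genome.fa", "path/to/annotation.gtf"],
    )
    for name in tools:
        print(f"not on PATH: {name}")
    for name in paths:
        print(f"missing file or directory: {name}")
```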
the samplesheet and expected files
Given this as samples.tsv:
samples
"SRR3184300"
"SRR3184285"
The workflow expects SRR3184300_1.fastq and SRR3184300_2.fastq plus SRR3184285_1.fastq and SRR3184285_2.fastq in the root directory of this workflow, path/to/circs_snake/ (put the .fastq files here). The lane identifier can be changed in config.yaml:
lane_ident1: "_1"
lane_ident2: "_2"
The workflow itself creates the needed .tsv file from the paired input .fastq files in its root directory. You can also create it yourself; see scripts/snake_infile_creator.pl. In the parental Snakefile, rule r03 creates it from a .fastq file list produced earlier by rule r02.
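The relationship between samples.tsv, the lane identifiers, and the .fastq files can be sketched in Python, the language Snakefiles are written in. This is a minimal, hypothetical stand-in for scripts/snake_infile_creator.pl and rules r02/r03, not a port of them: the function names are illustrative, and it assumes uncompressed files named <sample><lane_ident>.fastq, so .fastq.gz inputs are not covered. Inside a Snakefile, the sample-to-filename expansion is more commonly written with Snakemake's expand() helper.

```python
import csv
import io
import os
from typing import List


def samples_from_dir(fastq_dir: str, lane_ident1: str = "_1", lane_ident2: str = "_2") -> List[str]:
    """Return sorted sample IDs that have both mate files in fastq_dir."""
    suffix1 = f"{lane_ident1}.fastq"
    names = set(os.listdir(fastq_dir))
    return sorted(
        name[: -len(suffix1)]
        for name in names
        if name.endswith(suffix1)
        # keep the sample only if its second mate file exists too
        and f"{name[: -len(suffix1)]}{lane_ident2}.fastq" in names
    )


def format_samplesheet(samples: List[str]) -> str:
    """Render samples.tsv: a "samples" header, then one quoted ID per line."""
    return "samples\n" + "".join(f'"{sample}"\n' for sample in samples)


def expected_fastqs(samplesheet_text: str, lane_ident1: str = "_1", lane_ident2: str = "_2") -> List[str]:
    """List the .fastq files that a samples.tsv asks for."""
    reader = csv.reader(io.StringIO(samplesheet_text), delimiter="\t")
    rows = [row for row in reader if row]  # drop blank lines
    # rows[0] is the "samples" header; csv strips the surrounding quotes
    return [
        f"{row[0]}{ident}.fastq"
        for row in rows[1:]
        for ident in (lane_ident1, lane_ident2)
    ]
```

With the example samplesheet from this section, expected_fastqs returns the four file names listed above; passing different lane identifiers (for example "_R1" and "_R2") changes only the suffixes.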
how to start a typical circs_snake run:
- copy/paste/move paired-end, trimmed and QC'ed .fastq files into circs_snake/.
- check whether the lane identifier is correct in all config.yaml files (change it if needed)
- run snakemake (for more options, see howtorun.sh)
further reading
For documentation on each single step, please refer to the original pipeline documentation: https://gitlab.com/daaaaande/circs/-/blob/master/README.md
Linting and formatting
Linting results
Lints for snakefile /tmp/tmpojt17qj5/Snakefile:
 * Absolute path "/cx_out/"+config[" in line 12
 * Absolute path "/"+" in line 12
 * Absolute path "/dc_out/"+config[" in line 13
 * Absolute path "/"+" in line 13
 * Absolute path "/fc_out/"+config[" in line 14
 * Absolute path "/"+" in line 14
 * Absolute path "/"+config[" in line 20
 * Absolute path "/"+config[" in line 21
 * Absolute path "/"+config[" in line 22
 * Absolute path "/reads_per_sample_" in line 25
 * Absolute path "/cx_out/"+config[" in line 29
 * Absolute path "/"+" in line 29
 * Absolute path "/dc_out/"+config[" in line 30
 * Absolute path "/"+" in line 30
 * Absolute path "/fc_out/"+config[" in line 31
 * Absolute path "/"+" in line 31
 * Absolute path "/cx_out/"+config[" in line 39
 * Absolute path "/all_" in line 39
 * Absolute path "/dc_out/"+config[" in line 40
 * Absolute path "/all_" in line 40
 * Absolute path "/fc_out/"+config[" in line 41
 * Absolute path "/all_" in line 41
 * Absolute path "/"+config[" in line 49
 * Absolute path "/"+config[" in line 50
 * Absolute path "/"+config[" in line 51
 * Absolute path "/cx_out/"+config[" in line 59
 * Absolute path "/all_" in line 59
 * Absolute path "/cx_out/"+config[" in line 70
 * Absolute path "/all_" in line 70
Each of these entries carries the same message:
 Do not define absolute paths inside of the workflow, since this renders
 your workflow irreproducible on other machines. Use path relative to the
 working directory instead, or make the path configurable via a config
 file.
 Also see:
 https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
... (truncated)
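The flagged literals such as "/cx_out/" appear to be concatenated after a config value, so the final paths may well be relative; the linter seems to react to string literals that start with "/". A minimal sketch of a lint-friendly alternative builds paths with os.path.join, so no literal begins with a slash. The config key "dataset", the helper name output_path, and the directory and file names are hypothetical placeholders, since the log only shows fragments of the real expressions.

```python
import os


def output_path(config: dict, pipeline_dir: str, filename: str) -> str:
    """Join path components without any literal leading "/".

    config["dataset"] is a hypothetical key; substitute whichever config
    value the real Snakefile concatenates at these positions.
    """
    # os.path.join inserts the separators itself, and the result stays
    # relative to the working directory as long as every part is relative.
    return os.path.join(pipeline_dir, config["dataset"], filename)
```

In a rule, such a call would go into the input: or output: directive; an f-string without a leading slash would serve the same purpose.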
Formatting results
[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.4.3