snakemake-workflows/species-quantification
Overview
Latest release: v1.0.0, Last update: 2021-12-20
Linting: passed, Formatting: passed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in its documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/snakemake-workflows/species-quantification . --tag v1.0.0
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
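As a quick sanity check after deployment, the layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of the workflow: the function name missing_deployment_paths is hypothetical, and it only assumes the workflow and config folders plus the main Snakefile inside workflow that Step 4 refers to.

```python
from pathlib import Path


def missing_deployment_paths(project_dir):
    """Return the expected deployment paths that are absent under project_dir."""
    root = Path(project_dir)
    # Paths named in the deployment instructions (relative to the project directory).
    expected = [
        root / "workflow",
        root / "config",
        root / "workflow" / "Snakefile",
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_deployment_paths(".")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("deployment looks complete")
```

Run it from inside the project working directory; an empty result means all three paths exist.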
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
General settings
This workflow has to be configured before it is run. It quantifies the presence of species (e.g. bacteria, viruses, archaea) in the given fastq samples, using kraken2 and bracken in combination. One use case is the analysis of the microbiome of a tumor sample.
Optionally, it can also benchmark mixture samples that are generated within the workflow, for Illumina short reads and ONT (Oxford Nanopore Technologies) long reads at the same time. Benchmarking starts by simulating short and long reads of the desired bacterial species, mimicking a biological environment in which these species are present in a human sample. It produces scatter plots of the quantified presence of species in the short-read and long-read mixture samples, respectively. Set benchmarking to True to perform benchmarking.
Sample sheet - Abundance quantification
This sheet has to be defined if the only purpose is abundance quantification of samples.
Samples should be added to config/samples.tsv. All the columns sample_name, fq1 and fq2 should be defined.
Two settings depend on the type of samples.
If samples are paired-end:
- fq1 and fq2 should be defined accordingly.
- paired should be set to True in config/config.yaml.
If samples are not paired-end:
- Only the fq1 column should be defined, with the single-end fastq file.
- paired should be set to False in config/config.yaml.
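The rules above can be checked before starting a run. The following is a minimal sketch, assuming the sheet is tab-separated with a header row containing sample_name, fq1 and fq2; the function name check_sample_sheet and the example file names are hypothetical and not part of the workflow.

```python
import csv
import io


def check_sample_sheet(tsv_text, paired):
    """Return a list of problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    required = ["sample_name", "fq1", "fq2"]
    missing_cols = [c for c in required if c not in (reader.fieldnames or [])]
    if missing_cols:
        return ["missing columns: " + ", ".join(missing_cols)]

    problems = []
    for row in reader:
        # Short rows yield None for absent fields, so normalise to "".
        name = (row["sample_name"] or "").strip()
        has_fq1 = bool((row["fq1"] or "").strip())
        has_fq2 = bool((row["fq2"] or "").strip())
        if not has_fq1:
            problems.append(f"{name}: fq1 is empty")
        if paired and not has_fq2:
            problems.append(f"{name}: fq2 is empty but paired is True")
        if not paired and has_fq2:
            problems.append(f"{name}: fq2 is set but paired is False")
    return problems


if __name__ == "__main__":
    # Hypothetical paired-end sheet with one sample.
    sheet = "sample_name\tfq1\tfq2\ntumor1\ttumor1_R1.fastq\ttumor1_R2.fastq\n"
    print(check_sample_sheet(sheet, paired=True))
```

To check a real sheet, read the contents of config/samples.tsv into a string and pass the value of paired from config/config.yaml.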
Bacteria sheet - Benchmarking (Optional)
Benchmarking is optional. If it is desired, the configuration below should be made before running the workflow.
The bacteria sheet has to be defined if the purpose is to benchmark short reads (Illumina) and long reads (ONT).
After making sure that benchmarking is set to True, bacteria should be added to config/bacteria.yaml. All the columns should be defined.
- For the bacteria column, any name that identifies the bacterium can be chosen; the only requirement is that it must not contain whitespace.
- For the fasta column, define the relative path to the reference fasta sequence of the bacterial species that should be present in the mixture samples.
- For the bacterium_name column, define the exact name of the bacterial species.
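The column rules above can be expressed as a small check. This sketch deliberately works on rows that have already been parsed into dictionaries, because the on-disk layout of the bacteria sheet is not described here; the function name check_bacteria_rows and the example values are hypothetical.

```python
def check_bacteria_rows(rows):
    """Return a list of problems for rows with bacteria, fasta and bacterium_name."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # Every column has to be defined.
        for column in ("bacteria", "fasta", "bacterium_name"):
            if not str(row.get(column) or "").strip():
                problems.append(f"row {i}: {column} is empty")
        # The bacteria label must not contain whitespace.
        label = str(row.get("bacteria") or "")
        if any(ch.isspace() for ch in label):
            problems.append(f"row {i}: bacteria label {label!r} contains whitespace")
    return problems


if __name__ == "__main__":
    # Hypothetical example row.
    rows = [
        {
            "bacteria": "ecoli",
            "fasta": "resources/ecoli.fasta",
            "bacterium_name": "Escherichia coli",
        }
    ]
    print(check_bacteria_rows(rows))
```

Note that whitespace is only rejected in the bacteria label; the bacterium_name column is expected to hold the full species name, which normally contains a space.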
Additional settings for the benchmarking
The following settings can be made in config/config.yaml.
- number_of_samples and p define the desired number of mixture samples and the fractions (by number of reads) of bacterial species to add to the mixture samples, respectively.
- short_read_len and n_reads_per_seq can be configured for the ART simulator.
- long_nreads can be configured for the NanoSim simulator.
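To make "fraction by number of reads" concrete, the arithmetic can be worked through in a few lines. This is only an illustration of one plausible reading of p, namely bacterial reads divided by all reads in the mixture; how the workflow itself computes its mixtures is not described in this section, and the function name bacterial_reads_for_fraction and the read counts are hypothetical.

```python
def bacterial_reads_for_fraction(background_reads, p):
    """Number of bacterial reads b such that b / (b + background_reads) equals p.

    Assumption: p is a fraction of the total read count of the mixture.
    Solving p = b / (b + h) for b gives b = p * h / (1 - p).
    """
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return round(p * background_reads / (1 - p))


if __name__ == "__main__":
    # With 900 background (e.g. human) reads, a 10% bacterial fraction
    # needs 100 bacterial reads: 100 / (100 + 900) = 0.1.
    print(bacterial_reads_for_fraction(900, 0.1))
```

Under this reading, a fraction of 0.5 means as many bacterial reads as background reads, and small fractions correspond to a few bacterial reads in a large human background.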