ncherric/Iliad

ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications

Overview

Topics: automatic genetics genomics-data snakemake workflow bioinformatics fastq genomics modularization cram idat vcf

Latest release: v1.0.0, Last update: 2024-01-21

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/ncherric/Iliad . --tag v1.0.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

To run the workflow using apptainer/singularity, use

snakemake --cores all --sdm apptainer

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

TL;DR setup

General Input and Output

Input: FASTQ data or FTP links to FASTQ data
Output: quality-controlled VCF for each chromosome

Please make sure that your conda environment for Iliad is activated - conda activate iliadEnv or mamba activate iliadEnv

Modify the configuration file workdirPath parameter to the appropriate path leading up to and including /Iliad and a final forward slash e.g. /Path/To/Iliad/. The configuration file is found in config/config.yaml.

    #####################################
    #####################################
    #####################################
    #  #  # USER INPUT VARIABLES  #  #  #

    #####################################
    #####################################
    #####################################

    # You must insert your /PATH/TO/Iliad/
    # use 'pwd' command to find your current working directory when you are inside of Iliad directory
    # e.g. /path/to/Iliad/ <---- must include forward slash at the end of working directory path

    # must include forward slash, '/', at the end of working directory path
    workdirPath: NEED PATH HERE
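The path rules above are easy to get wrong, so a quick check before running the workflow may help. The following is a minimal sketch, not part of Iliad: the function name `check_workdir_path` is hypothetical, and it assumes the path is absolute (as printed by `pwd`) and that the deployed directory is literally named `Iliad`.

```python
from pathlib import PurePosixPath


def check_workdir_path(value: str) -> list[str]:
    """Return a list of problems found in a workdirPath value (empty if none)."""
    problems = []
    # Assumption: the path is absolute, as printed by `pwd`.
    if not value.startswith("/"):
        problems.append("path should be absolute (start with '/')")
    # Stated in the config comments: a trailing forward slash is required.
    if not value.endswith("/"):
        problems.append("path must end with a forward slash")
    # Assumption: the final path component is the directory named 'Iliad'.
    if PurePosixPath(value).name != "Iliad":
        problems.append("path should lead up to and include the Iliad directory")
    return problems


print(check_workdir_path("/Path/To/Iliad/"))  # a well-formed value
print(check_workdir_path("/Path/To/Iliad"))   # trailing slash missing
```

If you cloned or deployed the workflow into a directory with a different name, drop the last check.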

Other parameters are pre-set and can be changed to suit your project. These include:

Homo sapiens GRCh38 release 104 reference genome

    ref:
      species: homo_sapiens
      release: 104
      build: GRCh38

Use an Excel sheet or CSV file with no header and the following two columns/fields:

    Sample   Unique sample identifier
    URL   raw sequence data download FTP link

Example: UserSampleTable.xlsx or UserSampleTable.csv are found in the /Iliad/config/ directory

KPGP-00127 ftp://ftp.kobic.re.kr/pub/KPGP/2020_release_candidate/WGS_SR/KPGP-00127/KPGP-00127_L1_R1.fq.gz
KPGP-00127 ftp://ftp.kobic.re.kr/pub/KPGP/2020_release_candidate/WGS_SR/KPGP-00127/KPGP-00127_L1_R2.fq.gz

This exact template already exists in /Iliad/config/UserSampleTable.xlsx and /Iliad/config/UserSampleTable.csv. (The Excel Viewer extension for VS Code is handy for editing the .xlsx file if you prefer spreadsheets.) If you already have the sequence files and are not downloading open-source data, you can place your data directly into the Iliad/data/fastq/ directory.
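To sanity-check a sample table before starting a long download, something like the sketch below may be useful. It is not Iliad's own parser: the function name `read_sample_table` is hypothetical, and it assumes the CSV variant is comma-delimited with exactly two fields per row (Sample, URL) and no header.

```python
import csv
import io
from collections import defaultdict


def read_sample_table(handle) -> dict[str, list[str]]:
    """Group download URLs by sample ID from a headerless two-column CSV."""
    urls_by_sample = defaultdict(list)
    for row in csv.reader(handle):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns (Sample, URL), got {len(row)}: {row}")
        sample, url = (cell.strip() for cell in row)
        urls_by_sample[sample].append(url)
    return dict(urls_by_sample)


# In-memory stand-in for UserSampleTable.csv, using the example rows above.
example = io.StringIO(
    "KPGP-00127,ftp://ftp.kobic.re.kr/pub/KPGP/2020_release_candidate/WGS_SR/KPGP-00127/KPGP-00127_L1_R1.fq.gz\n"
    "KPGP-00127,ftp://ftp.kobic.re.kr/pub/KPGP/2020_release_candidate/WGS_SR/KPGP-00127/KPGP-00127_L1_R2.fq.gz\n"
)
table = read_sample_table(example)
for sample, urls in table.items():
    print(sample, len(urls))
```

For a real file, pass `open("config/UserSampleTable.csv", newline="")` instead of the in-memory example.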

Whether you download data automatically via Iliad or place it manually in the Iliad/data/fastq/ directory, you must provide a separate samples.tsv file whose header line contains a single field named sample.

    sample  HEADER
    SAMPLE1 sample identifier
    SAMPLE2 sample identifier

Example: samples.tsv found in the /Iliad/config/ directory

sample
KPGP-00127
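A samples.tsv in this shape can be written by hand or generated. The sketch below is illustrative only: `write_samples_tsv` is a hypothetical helper, and it assumes each sample ID should appear once even when the sample table lists several files for it (as in the KPGP-00127 example).

```python
import io


def write_samples_tsv(sample_ids, handle) -> None:
    """Write a one-column TSV: the header 'sample', then one unique ID per line."""
    handle.write("sample\n")
    seen = set()
    for sample in sample_ids:
        if sample not in seen:  # keep the first occurrence, preserve input order
            seen.add(sample)
            handle.write(f"{sample}\n")


# Two FASTQ files for the same sample yield a single samples.tsv entry.
buffer = io.StringIO()
write_samples_tsv(["KPGP-00127", "KPGP-00127"], buffer)
print(buffer.getvalue(), end="")
```

To write the real file, replace the buffer with `open("config/samples.tsv", "w")`.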

Because this module is the main Snakefile, Snakemake detects it automatically without an explicit --snakefile flag. Run it together with any other Snakemake flags you need, such as --cores:

    $ snakemake --cores 1

If you plan to run Iliad on a local machine or self-built server without a job scheduler, the default command is:

   $ snakemake -p --use-singularity --use-conda --cores 1 --jobs 1 --default-resources mem_mb=10000 --latency-wait 120

The Iliad directory also includes a file named snakemake.sh for batch job submission. Below is an example workflow submission using the SLURM job scheduler. Read the shell variables at the top of the script and customize them to your own paths and resource needs.

   $ sbatch snakemake.sh

If you would like more in-depth information and descriptions, please continue to the next sections below. Otherwise, you have completed the TL;DR setup section.

Linting and formatting

Linting results

Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
WorkflowError in rule get_genome in file /tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/ref.smk, line 1:
Rules with a benchmark directive may not be marked as eligible for between-workflow caching at the same time. The reason is that when the result is taken from cache, there is no way to fill the benchmark file with any reasonable values. Either remove the benchmark directive or disable between-workflow caching for this rule. (rule get_genome, line 1, /tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/ref.smk)
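The linting error above is raised when a single rule declares both `cache` and `benchmark`. As a rough illustration, the text-based check below flags such rules. It is a heuristic sketch, not how Snakemake performs the check. The function name is hypothetical, and the example rule body is invented for illustration rather than copied from ref.smk.

```python
import re

# Matches rule headers such as "rule get_genome:" at the start of a line.
RULE_HEADER = re.compile(r"^rule\s+(\w+)\s*:", re.MULTILINE)


def rules_with_cache_and_benchmark(snakefile_text: str) -> list[str]:
    """Return names of rules that declare both `cache:` and `benchmark:`."""
    headers = list(RULE_HEADER.finditer(snakefile_text))
    conflicts = []
    for i, match in enumerate(headers):
        # A rule body runs until the next rule header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(snakefile_text)
        body = snakefile_text[match.end():end]
        # Collect whatever precedes the first ':' on each line (heuristic).
        directives = {
            line.strip().split(":", 1)[0] for line in body.splitlines() if ":" in line
        }
        if {"cache", "benchmark"} <= directives:
            conflicts.append(match.group(1))
    return conflicts


# Hypothetical rule text for illustration only.
example = '''
rule get_genome:
    output:
        "resources/genome.fasta"
    cache: True
    benchmark:
        "benchmarks/get_genome.txt"
    shell:
        "touch {output}"

rule other:
    output:
        "out.txt"
    shell:
        "touch {output}"
'''
print(rules_with_cache_and_benchmark(example))
```

As the error message states, the fix is to remove either the `benchmark` directive or the `cache` directive from the flagged rule.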

Formatting results

[DEBUG] 
[ERROR] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-2B_38_to_37_VCF-if37VCF-IDs.smk":  NoParametersError: L63: In params definition.
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-2B_38_to_37_VCF-if37VCF-IDs.smk":  
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/Snakefile":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-3B_38_to_37_VCF-if37VCF-cleanIDs-Fixref.smk":  NoParametersError: L117: In params definition.
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-3B_38_to_37_VCF-if37VCF-cleanIDs-Fixref.smk":  
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/idat2gtc.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-1_prepare_38_to_37_VCFs.smk":  NoParametersError: L11: In params definition.
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-1_prepare_38_to_37_VCFs.smk":  
[WARNING] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/liftoverTo37.smk":  Keyword "input" at line 28 has comments under a value.
	PEP8 recommends block comments appear before what they describe
(see https://www.python.org/dev/peps/pep-0008/#id30)
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/liftoverTo37.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/gtc2vcf.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-4A_38_to_37-Lift-to-37.smk":  NoParametersError: L25: In params definition.
[DEBUG] In file "/tmp/tmp_w1c63pr/ncherric-Iliad-136deec/workflow/rules/lift_and_merge-4A_38_to_37-Lift-to-37.smk":  

... (truncated)