NCI-CGR/TriosCompass_v2

Trios analysis workflow written in Snakemake

Overview

Latest release: 1.0.0, Last update: 2025-02-19

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/NCI-CGR/TriosCompass_v2 . --tag 1.0.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow using a combination of conda and apptainer/singularity for software deployment, use

snakemake --cores all --sdm conda apptainer

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

TriosCompass expects to use the parent folder of TriosCompass_v2 (the repo clone folder) as the working space, so as to separate the Snakemake workflow from the working space.

Three configuration files are required:

  1. Profile yaml file: workflow/profiles/<PROFILE_NAME>/config.yaml
  2. Config yaml file
  3. Sample yaml file

The config yaml file can be specified with the Snakemake command-line argument "--configfile" or in the profile yaml file, for example:

configfile: TriosCompass_v2/config/fullbam_config.yaml
snakefile: TriosCompass_v2/workflow/Snakefile

In turn, the sample yaml file is specified in the config yaml file and defines the sample input via PEPs (Portable Encapsulated Projects):

pepfile: "config/fullbam_pep.yaml"
pepschema: "../schemas/bam_schema.yaml"
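
The chain described above (profile yaml file, then config yaml file, then sample yaml file) can be traced with a small sketch. It is only an illustration: the helper `top_level_value` is hypothetical, the inline strings are copied from the examples in this section, and the tiny line matcher stands in for a real YAML parser. It is not how Snakemake itself reads these files.

```python
import re


def top_level_value(yaml_text, key):
    """Return the scalar of an unindented, single-line `key: value` entry, or None.

    Hypothetical helper for illustration only; it handles neither nesting
    nor trailing comments, so use a real YAML parser for actual files.
    """
    pattern = re.compile(rf"^{re.escape(key)}:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
    match = pattern.search(yaml_text)
    if match is None:
        return None
    # Drop optional surrounding quotes.
    return match.group(1).strip("\"'")


# Contents copied from the profile and config examples in this section.
profile_yaml = """\
configfile: TriosCompass_v2/config/fullbam_config.yaml
snakefile: TriosCompass_v2/workflow/Snakefile
"""
config_yaml = """\
pepfile: "config/fullbam_pep.yaml"
pepschema: "../schemas/bam_schema.yaml"
"""

# Step 1: the profile names the config yaml file.
config_path = top_level_value(profile_yaml, "configfile")
# Step 2: the config yaml file names the sample (PEP) file and its schema.
pep_path = top_level_value(config_yaml, "pepfile")
schema_path = top_level_value(config_yaml, "pepschema")

print(config_path, pep_path, schema_path, sep="\n")
```

Each lookup returns the path exactly as written in the file. How relative paths are resolved is left to Snakemake and is not modelled here.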
  • Example of the PEP configuration file for FASTQ input files
    • config/fastq_pep.yaml

      pep_version: 2.0.0
      sample_table: sample_fastq.csv

      # In manifest file, Sample_ID + Flowcell should be unique
      sample_modifiers:
        append:
          sample_name: sn
        derive:
          attributes: [sample_name]
          sources:
            sn: "{SAMPLE_ID}_{FLOWCELL}"

    • config/sample_fastq.csv

      SAMPLE_ID,FLOWCELL,LANE,INDEX,R1,R2
      HG002,BH2JWTDSX5,1,CGGTTGTT-GTGGTATG,data/fq/HG002_NA24385_son_80X_R1.fq.gz,data/fq/HG002_NA24385_son_80X_R2.fq.gz
      HG003,BH2JWTDSX5,1,GCGTCATT-CAGACGTT,data/fq/HG003_NA24149_father_80X_R1.fq.gz,data/fq/HG003_NA24149_father_80X_R2.fq.gz
      HG004,BH2JWTDSX5,1,CTGTTGAC-ACCTCAGT,data/fq/HG004_NA24143_mother_80X_R1.fq.gz,data/fq/HG004_NA24143_mother_80X_R2.fq.gz
      
    • workflow/schemas/fastq_schema.yaml (schema used to validate config/fastq_pep.yaml)

      description: A example schema for a pipeline.
      imports:
        - http://schema.databio.org/pep/2.0.0.yaml
        # - TriosCompass_v2/workflow/schemas/2.0.0.yaml

      properties:
        samples:
          type: array
          items:
            type: object
            properties:
              SAMPLE_ID:
                type: string
                description: sample id
              FLOWCELL:
                type: string
                description: Flowcell
              INDEX:
                type: string
                description: Library index
              LANE:
                type: string
                description: Lane number in flowcell
                enum: ["1", "2"]
              R1:
                type: string
                description: path to the R1 fastq file
              R2:
                type: string
                description: path to the R2 fastq file
            required:
              - FLOWCELL
              - SAMPLE_ID
              - INDEX
              - R1
              - R2

  • Example of the PEP configuration file for BAM input files
    • config/bam_pep.yaml

      pep_version: 2.0.0
      sample_table: sample_bam.csv

      sample_modifiers:
        append:
          sample_name: sn
        derive:
          attributes: [sample_name]
          sources:
            sn: "{SAMPLE_ID}"

    • config/sample_bam.csv

      SAMPLE_ID,BAM
      HG002,sorted_bam/HG002_NA24385_son_80X.bam
      HG003,sorted_bam/HG003_NA24149_father_80X.bam
      HG004,sorted_bam/HG004_NA24143_mother_80X.bam
      
    • workflow/schemas/bam_schema.yaml (schema used to validate config/bam_pep.yaml)

      description: A example schema for a pipeline.
      imports:
        - http://schema.databio.org/pep/2.0.0.yaml
        # - TriosCompass_v2/workflow/schemas/2.0.0.yaml

      properties:
        samples:
          type: array
          items:
            type: object
            properties:
              SAMPLE_ID:
                type: string
                description: sample id
              BAM:
                type: string
                description: path to the bam file
            required:
              - SAMPLE_ID
              - BAM
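
To make the effect of the sample_modifiers and the schemas' required fields concrete, here is a minimal sketch using only the Python standard library. It mimics the idea (fill a name template per row, check required columns, insist on unique names) and is not the PEP or Snakemake implementation. The function name `derive_sample_names` is hypothetical, and the CSV text and column lists are copied from the examples above.

```python
import csv
import io

# Sample table copied from config/sample_fastq.csv above.
SAMPLE_FASTQ_CSV = """\
SAMPLE_ID,FLOWCELL,LANE,INDEX,R1,R2
HG002,BH2JWTDSX5,1,CGGTTGTT-GTGGTATG,data/fq/HG002_NA24385_son_80X_R1.fq.gz,data/fq/HG002_NA24385_son_80X_R2.fq.gz
HG003,BH2JWTDSX5,1,GCGTCATT-CAGACGTT,data/fq/HG003_NA24149_father_80X_R1.fq.gz,data/fq/HG003_NA24149_father_80X_R2.fq.gz
HG004,BH2JWTDSX5,1,CTGTTGAC-ACCTCAGT,data/fq/HG004_NA24143_mother_80X_R1.fq.gz,data/fq/HG004_NA24143_mother_80X_R2.fq.gz
"""

# Required columns as listed in the FASTQ schema above.
REQUIRED_FASTQ = ["FLOWCELL", "SAMPLE_ID", "INDEX", "R1", "R2"]


def derive_sample_names(csv_text, template, required):
    """Return one derived sample name per row (hypothetical helper).

    Raises ValueError if a required column is empty or missing, or if two
    rows produce the same name.
    """
    names = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        missing = [col for col in required if not row.get(col)]
        if missing:
            raise ValueError(f"line {line_number}: missing required field(s) {missing}")
        # Ignore surplus cells that DictReader stores under the key None.
        fields = {key: value for key, value in row.items() if key is not None}
        names.append(template.format(**fields))
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"sample_name is not unique: {duplicates}")
    return names


fastq_names = derive_sample_names(
    SAMPLE_FASTQ_CSV, "{SAMPLE_ID}_{FLOWCELL}", REQUIRED_FASTQ
)
print(fastq_names)
# → ['HG002_BH2JWTDSX5', 'HG003_BH2JWTDSX5', 'HG004_BH2JWTDSX5']
```

For BAM input the same call would use the template "{SAMPLE_ID}" and the required columns SAMPLE_ID and BAM, so each sample name equals its SAMPLE_ID.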

Linting and formatting

Linting results

ModuleNotFoundError in file /tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/pedigree.smk, line 1:
No module named 'peds'
  File "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/pedigree.smk", line 1, in <module>

Formatting results

[DEBUG] 
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/premap.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/deepvariant.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/bam_qc.smk":  Formatted content is different from original
[DEBUG] 
<unknown>:1: SyntaxWarning: invalid escape sequence '\('
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/dnSTR.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/ref.smk":  Formatted content is different from original
[DEBUG] 
[ERROR] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/gatk_hc.smk":  NoParametersError: L82: In resources definition.
[DEBUG] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/gatk_hc.smk":  
[ERROR] In file "/tmp/tmpt1x93l58/NCI-CGR-TriosCompass_v2-abb9b4f/workflow/rules/bam_input.smk":  InvalidPython: Black error:

Cannot parse for target version Python 3.12: 1:0: else:

(Note reported line number may be incorrect, as snakefmt could not determine the true line number)


... (truncated)