AToL-Bioinformatics/genome-launcher-workflow

Overview

Latest release: none (no tagged release yet). Last update: 2026-05-01

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=AToL-Bioinformatics/genome-launcher-workflow

Quality control: linting failed, formatting passed.

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.
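A quick sanity check that both tools are on the PATH in the activated environment (this snippet prints a notice instead of failing, so it is safe to run anywhere):

```shell
# Check that snakemake and snakedeploy are available in the current
# environment; print a hint instead of erroring when they are not.
for tool in snakemake snakedeploy; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found: $(command -v "$tool")"
    else
        echo "$tool not found; activate the snakemake environment first"
    fi
done
```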

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/AToL-Bioinformatics/genome-launcher-workflow . --branch main

(This workflow has no tagged release yet, so it is deployed from a branch rather than with --tag; the branch name main is assumed here.)

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to tailor the workflow to your needs.
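A quick sanity check of the deployed layout (the file names follow the text: the Snakefile in the workflow folder, and config/config.yml from Step 3):

```shell
# Verify the two folders snakedeploy created; prints "missing" rather
# than failing, so the check is also safe to run before deployment.
for path in workflow/Snakefile config/config.yml; do
    if [ -e "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path"
    fi
done
```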

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow using apptainer/singularity, use

snakemake --cores all --sdm apptainer
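If apptainer/singularity is unavailable, conda-based deployment is the usual alternative via the same --sdm flag (a generic Snakemake option; whether this particular workflow ships conda environment definitions is not stated here). The command is assembled in a variable and printed rather than executed, so the snippet is safe outside a deployed project:

```shell
# Conda-based software deployment, the generic Snakemake alternative to
# containers. Run the printed command inside the project directory.
cmd="snakemake --cores all --sdm conda"
echo "$cmd"
```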

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report, which presents results together with the parameters and code used to produce them for inspection in the browser, using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

{{ dataset_id }}.{{ assembly_version }}

This repository contains the workflows that were used to assemble the genome {{ dataset_id }}.{{ assembly_version }} for {{ scientific_name }}.

The repo was produced automatically from boilerplate code at AToL-Bioinformatics/genome-launcher-workflow.

Overview

The assembly process has three main steps:

  1. Assembly with sanger-tol/genomeassembly

  2. Decontamination with sanger-tol/ascc

  3. Preparation of curation materials with sanger-tol/treeval, if there are Hi-C reads.

The config files for these steps are in the config directory.

The sanger-tol workflows are plumbed together by the included Snakemake workflow.
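The plumbing can be sketched as alternating Snakemake targets and pipeline runs. The target names are taken from the Steps section below; the profile path and script names are illustrative placeholders, and the commands are collected in an array and printed, not executed, so the sketch is safe to run:

```shell
# Order of operations: Snakemake targets stage data between the three
# sanger-tol pipelines. Profile path and script names are illustrative.
steps=(
    "snakemake --profile profiles/local pre_genomeassembly"  # download reads + QC
    "bash 20_genomeassembly.sh"                              # sanger-tol/genomeassembly
    "snakemake --profile profiles/local post_genomeassembly" # stage ASCC reference data
    "bash 30_ascc.sh"                                        # sanger-tol/ascc
    "snakemake --profile profiles/local post_ascc"           # convert ASCC output for TreeVal
    "bash 40_treeval.sh"                                     # sanger-tol/treeval (Hi-C only)
    "snakemake --profile profiles/local post_treeval"        # upload results to object storage
)
printf '%s\n' "${steps[@]}"
```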

Running the assembly

Setting up

Clone this repo to the HPC where it will be run.

A profile will be needed to configure the job scheduler on the HPC. Profiles for Setonix and Spartan (partial) are included. A profile for local testing is also included.

More information about the profile

The profile needs at least the following files:

Steps

  1. Download the reads and run the QC scripts using the genome-launcher-workflow target pre_genomeassembly.

  2. Run the genomeassembly workflow. See the example 20_genomeassembly.sh script.

  3. Stage the ASCC reference data using the genome-launcher-workflow target post_genomeassembly.

  4. Run the ascc workflow. See the example 30_ascc.sh script.

  5. Convert the ASCC output for TreeVal using the genome-launcher-workflow target post_ascc.

  6. If Hi-C is available, run the treeval workflow. See the example 40_treeval.sh submission script.

  7. Run the post_treeval target to upload the results to object storage. The post_* targets all upload the output of the preceding pipeline to object storage.

Worked example

Run this assembly on Setonix:

[!IMPORTANT]

The pull command requires a Personal Access Token with read access to code and metadata.

  1. Pull the repo:

    1. git init .

    2. git remote add origin https://github.com/AToL-Bioinformatics/{{ dataset_id }}.{{ assembly_version }}.git

    3. git pull origin main

  2. Set up the directory structure: bash profiles/pawsey/00_preflight.sh

  3. Run the workflow steps:

    1. sbatch profiles/pawsey/10_pre_genomeassembly.sh

    2. sbatch profiles/pawsey/20_genomeassembly.sh

    3. sbatch profiles/pawsey/25_post_genomeassembly.sh

    4. sbatch profiles/pawsey/30_ascc.sh

    5. sbatch profiles/pawsey/35_post_ascc.sh

    6. sbatch profiles/pawsey/40_treeval.sh

    7. sbatch profiles/pawsey/45_post_treeval.sh
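The seven sbatch calls above can be chained with Slurm job dependencies so that each step starts only after the previous job succeeds. This chaining is an assumption, not part of the repo; the commands are printed rather than submitted (--parsable and --dependency=afterok are standard sbatch options):

```shell
# Build and print a dependency-chained submission sequence over the
# worked-example scripts. To submit for real, run each printed command
# and capture the job id that `sbatch --parsable` returns.
scripts=(
    profiles/pawsey/10_pre_genomeassembly.sh
    profiles/pawsey/20_genomeassembly.sh
    profiles/pawsey/25_post_genomeassembly.sh
    profiles/pawsey/30_ascc.sh
    profiles/pawsey/35_post_ascc.sh
    profiles/pawsey/40_treeval.sh
    profiles/pawsey/45_post_treeval.sh
)

prev=""
for s in "${scripts[@]}"; do
    if [ -z "$prev" ]; then
        echo "sbatch --parsable $s"
    else
        echo "sbatch --parsable --dependency=afterok:$prev $s"
    fi
    prev="<jobid-of-$s>"   # placeholder; --parsable would return the real id
done
```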

Linting and formatting

Linting results
ModuleNotFoundError in file "/tmp/tmp5o38dskx/workflow/Snakefile", line 5:
No module named 'yaml_manifest'
  File "/tmp/tmp5o38dskx/workflow/Snakefile", line 5, in <module>
Formatting results
All tests passed!