AToL-Bioinformatics/genome-launcher-workflow
Overview
Latest release: None, Last update: 2026-05-01
Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=AToL-Bioinformatics/genome-launcher-workflow
Quality control: linting: failed, formatting: passed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
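Before moving on, it can help to confirm that both tools are on the PATH of the activated environment. The following is a minimal sketch; the helper name `require_tools` is hypothetical and is not part of Snakemake or Snakedeploy.

```shell
# Hypothetical helper: report any of the named commands that are not on PATH.
# Returns 0 if all commands were found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (inside the activated environment):
#   require_tools snakemake snakedeploy && echo "ready"
```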
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/AToL-Bioinformatics/genome-launcher-workflow . --tag None
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
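A quick way to confirm that the deployment produced the two folders described above might look like the sketch below. The helper name `check_deployment` is hypothetical; it only checks that the folders exist, not their contents.

```shell
# Hypothetical helper: check that a deployment directory contains the
# workflow/ and config/ folders that Snakedeploy is described as creating.
# $1: directory to check (defaults to the current directory).
check_deployment() {
    dir="${1:-.}"
    rc=0
    for sub in workflow config; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing folder: $dir/$sub" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (from the project working directory):
#   check_deployment . && echo "deployment looks complete"
```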
Step 3: Configure workflow
To configure the workflow, adapt config/config.yml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow using apptainer/singularity, use
snakemake --cores all --sdm apptainer
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
{{ dataset_id }}.{{ assembly_version }}
This repository contains the workflows that were used to assemble the genome {{ dataset_id }}.{{ assembly_version }} for {{ scientific_name }}.
The repo was produced automatically from boilerplate code at AToL-Bioinformatics/genome-launcher-workflow.
Overview
The assembly process has three main steps:
Assembly with sanger-tol/genomeassembly
Decontamination with sanger-tol/ascc
Preparation of curation materials with sanger-tol/treeval, if there are Hi-C reads.
The config files for these steps are in the config directory.
The sanger-tol workflows are plumbed together by the included Snakemake workflow.
Running the assembly
Setting up
Clone this repo to the HPC where it will be run.
A profile will be needed to configure the job scheduler on the HPC. Profiles for Setonix and Spartan (partial) are included. A profile for local testing is also included.
More information about the profile
The profile needs at least the following files:
Snakemake job config and workflow config: configure the jobs from the genome-launcher-workflow.
Nextflow config: configure the processes from the sanger-tol pipelines.
ascc.params.config: the YAML params file for ASCC (not shared with the other pipelines).
Steps
Download the reads and run the QC scripts using the genome-launcher-workflow target pre_genomeassembly.
Run the genomeassembly workflow. See the example 20_genomeassembly.sh script.
Stage the ASCC reference data using the genome-launcher-workflow target post_genomeassembly.
Run the ascc workflow. See the example 30_ascc.sh script.
Convert the ASCC output for TreeVal using the genome-launcher-workflow target post_ascc.
If Hi-C is available, run the treeval workflow. See the example 40_treeval.sh submission script.
Run the post_treeval target to upload the results to object storage.
The post_* targets all upload the output of the preceding pipeline to object storage.
Worked example
Run this assembly on Setonix:
Important: The pull command requires a Personal Access Token with read access to code and metadata.
Pull the repo:
git init .
git remote add origin https://github.com/AToL-Bioinformatics/{{ dataset_id }}.{{ assembly_version }}.git
git pull origin main
Set up the directory structure:
bash profiles/pawsey/00_preflight.sh
Run the workflow steps:
sbatch profiles/pawsey/10_pre_genomeassembly.sh
sbatch profiles/pawsey/20_genomeassembly.sh
sbatch profiles/pawsey/25_post_genomeassembly.sh
sbatch profiles/pawsey/30_ascc.sh
sbatch profiles/pawsey/35_post_ascc.sh
sbatch profiles/pawsey/40_treeval.sh
sbatch profiles/pawsey/45_post_treeval.sh
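Because each step consumes the output of the previous one, one option is to queue the scripts with Slurm job dependencies so that each starts only after the previous job finishes successfully. The sketch below is an assumption about how this could be done, not the repository's documented method: the function `submit_chain` and the `SBATCH` override variable are hypothetical, and it has not been confirmed that the example scripts are designed to be queued in advance. It relies on the standard `sbatch` options `--parsable` and `--dependency=afterok:<jobid>`.

```shell
# Sketch: submit scripts in order, each depending on the previous job.
# SBATCH can be overridden (for example with a stub for a dry run);
# it defaults to Slurm's sbatch.
SBATCH="${SBATCH:-sbatch}"

submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job: no dependency.
            jobid=$("$SBATCH" --parsable "$script") || return 1
        else
            # Later jobs: start only if the previous job exited successfully.
            jobid=$("$SBATCH" --parsable --dependency="afterok:$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        prev="${jobid%%;*}"
        echo "submitted $script as job $prev"
    done
}

# Usage:
#   submit_chain \
#       profiles/pawsey/10_pre_genomeassembly.sh \
#       profiles/pawsey/20_genomeassembly.sh \
#       profiles/pawsey/25_post_genomeassembly.sh \
#       profiles/pawsey/30_ascc.sh \
#       profiles/pawsey/35_post_ascc.sh
```

The treeval scripts are left out of the usage example because that step applies only when Hi-C reads are available.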
Linting and formatting
Linting results
ModuleNotFoundError in file "/tmp/tmp5o38dskx/workflow/Snakefile", line 5:
No module named 'yaml_manifest'
  File "/tmp/tmp5o38dskx/workflow/Snakefile", line 5, in <module>
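The error says that line 5 of the Snakefile imports a Python module named yaml_manifest that could not be found in the linting environment. As a rough diagnostic, the sketch below tests whether a module can be located by the `python3` interpreter on the PATH. The helper name `module_available` is hypothetical, and where yaml_manifest is meant to come from is not stated here.

```shell
# Hypothetical helper: succeed if the named top-level Python module can be
# located by the python3 interpreter on PATH, fail otherwise.
module_available() {
    python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Usage:
#   module_available yaml_manifest || echo "yaml_manifest is not importable here"
```

Rerunning `snakemake --lint` from the deployment directory would then show whether the error persists in a given environment.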
Formatting results
All tests passed!