nezapajek/project-tobamo

Preparation of a curated catalogue of sequences of possible new tobamoviruses by scanning a large accumulated set of data from different metagenomics data repositories.

Overview

Latest release: None, Last update: 2025-09-26

Linting: failed, Formatting: passed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

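All following steps assume that you are inside that directory. Then run the command below. The repository has no tagged release yet, which is why the generated command shows --tag None; replace None with an existing tag, or use Snakedeploy's --branch option with an existing branch name instead.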

snakedeploy deploy-workflow https://github.com/nezapajek/project-tobamo . --tag None

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions in the Configuration section below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Configuration Guide

This directory contains configuration files for the tobamovirus detection workflow.

Configuration Files

config.yaml

Main configuration file that specifies:

  • Sample file location (samples: config/samples_all.tsv)

  • Additional workflow parameters

samples: config/samples_all.tsv  # Path to sample list
# Add other configuration parameters as needed

Sample Files

Different sample files are provided for various use cases:

File               Description       Samples  Use Case
samples_debug.tsv  Debug dataset     2        Troubleshooting
samples_12.tsv     Small dataset     12       Development and debugging
samples_test.tsv   Test dataset      253      Test samples
samples_all.tsv    Complete dataset  279      Test and control samples

Sample File Format

Sample files are tab-separated text files with a single column whose header is samples:

samples
SRR1234567
ERR2345678
DRR3456789

Requirements:

  • First line must be samples (header)

  • One SRA accession per line

  • Supported prefixes: SRR, ERR, DRR

  • No empty lines or comments
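The requirements above can be checked before a run. The following is a minimal sketch, not part of the workflow: the function name validate_sample_file is hypothetical, and the accession pattern (one of the three prefixes followed by digits only) is an assumption derived from the listed prefixes.

```python
import re

# Assumed pattern: SRR, ERR, or DRR followed by one or more digits.
ACCESSION_RE = re.compile(r"^[SED]RR\d+$")


def validate_sample_file(text):
    """Return a list of problems found in the contents of a sample file."""
    problems = []
    lines = text.split("\n")
    # A single trailing newline is normal and is not counted as an empty line.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines or lines[0] != "samples":
        problems.append("line 1: header must be exactly 'samples'")
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
        elif line.startswith("#"):
            problems.append(f"line {number}: comments are not allowed")
        elif not ACCESSION_RE.match(line):
            # Also catches stray spaces or tabs around an accession.
            problems.append(f"line {number}: invalid accession {line!r}")
    return problems
```

Passing the file contents, for example the text read from config/my_samples.tsv, returns an empty list when the file meets all four requirements.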

Usage Examples

Basic Configuration

  1. Choose appropriate sample file:

    # For testing
    cp config/samples_test.tsv config/my_samples.tsv
    
    # For production
    cp config/samples_all.tsv config/my_samples.tsv
    
  2. Edit config.yaml:

    samples: config/my_samples.tsv
    

Custom Sample List

  1. Create custom sample file:

    echo "samples" > config/custom_samples.tsv
    echo "SRR1234567" >> config/custom_samples.tsv
    echo "ERR2345678" >> config/custom_samples.tsv
    
  2. Update configuration:

    samples: config/custom_samples.tsv
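For longer accession lists, the same file can be generated programmatically. This is a sketch under assumptions: the helper names build_sample_file and write_sample_file are hypothetical, and removing duplicate accessions is a design choice of this example, not a documented workflow requirement.

```python
from pathlib import Path


def build_sample_file(accessions):
    """Return sample file contents: a 'samples' header, then one accession per line."""
    # Strip surrounding whitespace and drop duplicates while keeping input order.
    unique = list(dict.fromkeys(a.strip() for a in accessions))
    return "\n".join(["samples", *unique]) + "\n"


def write_sample_file(path, accessions):
    """Write the sample file to path and return the number of accessions written."""
    text = build_sample_file(accessions)
    Path(path).write_text(text)
    # Every line except the header is one accession.
    return len(text.splitlines()) - 1
```

Calling write_sample_file("config/custom_samples.tsv", ["SRR1234567", "ERR2345678"]) from the project root would produce the same file as the echo commands above.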
    

Validation

Before running the workflow, validate your configuration:

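# Dry run: confirm that the configuration and sample file load and the job graph can be built
snakemake -n --configfile config/config.yaml

# Dry run that forces the download_sra rule, to preview which samples would be downloaded
# (a dry run executes no jobs, so it cannot confirm that accessions exist in SRA)
snakemake --sdm conda -n -R download_sra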

Advanced Configuration

For advanced users, additional parameters can be added to config.yaml:

samples: config/samples_all.tsv

# Example additional parameters
assembly:
  megahit_memory: 0.9  # Memory fraction for MEGAHIT
  spades_memory: 500   # Memory limit in GB for SPAdes

diamond:
  sensitivity: "ultra-sensitive"  # Diamond sensitivity
  evalue: 1e-5                   # E-value threshold

megan:
  min_score: 50        # Minimum bit score
  max_expected: 0.01   # Maximum expected value
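The source does not state which of these keys the workflow actually reads. As a sketch of how optional nested parameters could be consumed, the snippet below looks up a value in a dictionary shaped like Snakemake's config object and falls back to a default when the key is absent. The DEFAULTS table and the get_param helper are hypothetical and simply mirror the example values above.

```python
# Hypothetical defaults mirroring the example values shown above.
DEFAULTS = {
    "assembly": {"megahit_memory": 0.9, "spades_memory": 500},
    "diamond": {"sensitivity": "ultra-sensitive", "evalue": 1e-5},
    "megan": {"min_score": 50, "max_expected": 0.01},
}


def get_param(config, section, key):
    """Return config[section][key] if present, otherwise the default value."""
    # A section that is missing or left empty in the YAML is treated as {}.
    section_values = config.get(section) or {}
    if key in section_values:
        return section_values[key]
    return DEFAULTS[section][key]
```

With this approach a config.yaml that only sets samples still yields usable values, and any section can override a single key without repeating the others.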

Troubleshooting

Common Configuration Issues

  1. Invalid sample format:

    • Ensure header is exactly samples

    • Check for extra spaces or tabs

    • Verify SRA accession format

  2. File path issues:

    • Use relative paths from project root

    • Ensure sample files exist before running

  3. Memory configuration:

    • Adjust memory settings for your system

    • Monitor resource usage during runs

Linting and formatting

Linting results

/tmp/tmpuvzzyrsv/workflow/Snakefile:23: SyntaxWarning: invalid escape sequence '\d'

Lints for rule all (line 27, /tmp/tmpuvzzyrsv/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
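The SyntaxWarning on line 23 typically appears when a regular expression containing \d is written in a normal string literal. Line 23 of the Snakefile is not shown here, so the pattern below is hypothetical; the sketch only illustrates the usual fix, which is to use a raw string (or to double the backslash).

```python
import re

# A raw string passes the backslash through to the regex engine unchanged,
# so Python does not try to interpret "\d" as a string escape sequence.
ACCESSION_PATTERN = r"[SED]RR\d+"

# Equivalent spelling without a raw string: the backslash is escaped explicitly.
ESCAPED_PATTERN = "[SED]RR\\d+"


def is_accession(name):
    """Return True if name consists of an SRR/ERR/DRR prefix followed by digits."""
    return re.fullmatch(ACCESSION_PATTERN, name) is not None
```

Both spellings produce the same pattern text, so switching to the raw string silences the warning without changing which sample names match.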

Formatting results

All tests passed!