nezapajek/project-tobamo

Preparation of a curated catalogue of sequences of possible new tobamoviruses by scanning a large accumulated set of data from different metagenomics data repositories.

Overview

Latest release: None, Last update: 2025-11-28

Linting: failed, Formatting: passed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we assume that you are inside that directory. Then run

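snakedeploy deploy-workflow https://github.com/nezapajek/project-tobamo . --tag None

Because no release has been published yet (Latest release: None), `--tag None` does not refer to an existing tag; pass `--branch` with the name of the branch you want to deploy instead.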

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting the results, together with the parameters and code used, in the browser:

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Configuration Guide

This directory contains configuration files for the tobamovirus detection workflow.

Configuration Files

`config.yaml`

Main configuration file that specifies:

- Sample file location (`samples: config/samples_all.tsv`)
- Additional workflow parameters

samples: config/samples_all.tsv  # Path to sample list
# Add other configuration parameters as needed

Sample Files

Different sample files are provided for various use cases:

| File | Description | Number of samples | Use case |
|------|-------------|-------------------|----------|
| `samples_debug.tsv` | Debug dataset | 2 | Troubleshooting |
| `samples_12.tsv` | Small dataset | 12 | Development and debugging |
| `samples_test.tsv` | Test dataset | 253 | Test samples |
| `samples_all.tsv` | Complete dataset | 279 | Test and control samples |

Sample File Format

Sample files are tab-separated (TSV) files with a single column headed `samples`:

samples
SRR1234567
ERR2345678
DRR3456789
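A Snakefile typically loads such a sheet into a list of accessions before building its targets. The workflow's own loading code is not shown here, so the sketch below is an illustration rather than the workflow's implementation. It uses only the standard library, and the helper names `read_samples` and `load_samples` are hypothetical.

```python
import csv
import io


def read_samples(handle):
    """Return the accessions from an open one-column sample sheet.

    csv.DictReader takes the first line as the field names, so each row is a
    dict like {"samples": "SRR1234567"}. A header other than 'samples' raises
    KeyError here.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["samples"] for row in reader]


def load_samples(path):
    """Hypothetical convenience wrapper that opens the file at `path`."""
    with open(path, newline="") as handle:
        return read_samples(handle)


if __name__ == "__main__":
    example = io.StringIO("samples\nSRR1234567\nERR2345678\nDRR3456789\n")
    print(read_samples(example))  # ['SRR1234567', 'ERR2345678', 'DRR3456789']
```

Note that `DictReader` silently skips blank lines, so this loader is more lenient than the format rules. A separate format check is still useful.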

Requirements:

- The first line must be the header `samples`
- One SRA run accession per line
- Supported prefixes: SRR, ERR, DRR
- No empty lines or comments
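The requirements above can be checked before launching the workflow. The following is a minimal sketch of such a check, not a validator shipped with the workflow. It assumes that an accession is one of the listed prefixes followed only by digits (the pattern and the name `check_sample_sheet` are illustrative assumptions).

```python
import re

# Assumed accession shape: SRR/ERR/DRR followed by digits, e.g. SRR1234567.
ACCESSION = re.compile(r"(?:SRR|ERR|DRR)\d+")


def check_sample_sheet(text):
    """Return a list of problems; an empty list means the text passes."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0] != "samples":
        problems.append("line 1: header must be exactly 'samples'")
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            problems.append(f"line {number}: empty line")
        elif line.lstrip().startswith("#"):
            problems.append(f"line {number}: comments are not allowed")
        elif not ACCESSION.fullmatch(line):
            # fullmatch also rejects stray spaces, tabs, or a second column.
            problems.append(f"line {number}: not an SRR/ERR/DRR accession: {line!r}")
    return problems


if __name__ == "__main__":
    print(check_sample_sheet("samples\nSRR1234567\nERR2345678\n"))  # []
    for problem in check_sample_sheet("sample\nSRR1234567 \n\n# note\n"):
        print(problem)
```

To check a real file, pass its contents, for example `check_sample_sheet(Path("config/my_samples.tsv").read_text())` with `from pathlib import Path`.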

Usage Examples

Basic Configuration

Choose an appropriate sample file:

# For testing
cp config/samples_test.tsv config/my_samples.tsv

# For production
cp config/samples_all.tsv config/my_samples.tsv

Edit config/config.yaml:
```
samples: config/my_samples.tsv
```

Custom Sample List

Create a custom sample file:

echo "samples" > config/custom_samples.tsv
echo "SRR1234567" >> config/custom_samples.tsv
echo "ERR2345678" >> config/custom_samples.tsv
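When the accession list comes from another script, the same file can be written from Python. This is a sketch using a hypothetical helper, `write_sample_sheet`, which is not part of the workflow. It writes the `samples` header followed by one accession per line. It also strips surrounding whitespace and drops blank entries so that the result keeps to the format rules.

```python
from pathlib import Path


def write_sample_sheet(path, accessions):
    """Write the 'samples' header and one accession per line to `path`.

    Returns the number of accessions written. Blank entries are dropped and
    surrounding whitespace is removed, since the sheet must not contain
    empty lines or stray spaces.
    """
    cleaned = [acc.strip() for acc in accessions if acc.strip()]
    Path(path).write_text("samples\n" + "".join(f"{acc}\n" for acc in cleaned))
    return len(cleaned)
```

For example, `write_sample_sheet("config/custom_samples.tsv", ["SRR1234567", "ERR2345678"])` produces the same file as the `echo` commands.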

Update the configuration:
```
samples: config/custom_samples.tsv
```

Validation

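Before running the workflow, check your configuration with a dry run:

# Dry run: parse the configuration and sample file and list the jobs that would run
snakemake -n --configfile config/config.yaml

# Dry run listing the download_sra jobs (nothing is downloaded and SRA is not queried)
snakemake --sdm conda -n -R download_sra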

Advanced Configuration

For advanced users, additional parameters can be added to config.yaml:

samples: config/samples_all.tsv

# Example additional parameters
assembly:
  megahit_memory: 0.9  # Memory fraction for MEGAHIT
  spades_memory: 500   # Memory limit in GB for SPAdes

diamond:
  sensitivity: "ultra-sensitive"  # DIAMOND sensitivity mode
  evalue: 1.0e-5                 # E-value threshold (decimal point so YAML reads it as a number)

megan:
  min_score: 50        # Minimum bit score
  max_expected: 0.01   # Maximum expected value
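Snakemake exposes the parsed config.yaml to the Snakefile as a dict named `config`. Optional sections like the ones above are therefore best read with a fallback default. The helper below, `get_param`, is hypothetical, and the defaults shown are illustrative rather than the workflow's actual values. The `cast` argument handles a real YAML pitfall. PyYAML follows YAML 1.1, which reads a value such as `1e-5` written without a decimal point as the string "1e-5". Calling `float()` accepts both forms.

```python
def get_param(config, section, key, default, cast=None):
    """Return config[section][key], or `default` if either level is missing.

    `cast` (e.g. float or int) normalizes values that the YAML loader may
    have returned as strings.
    """
    # `or {}` also covers a section that is present but empty (None in YAML).
    value = (config.get(section) or {}).get(key, default)
    return cast(value) if cast is not None else value


# Stand-in for the `config` dict Snakemake builds from config.yaml.
config = {
    "samples": "config/samples_all.tsv",
    "diamond": {"sensitivity": "ultra-sensitive", "evalue": "1e-5"},
}

evalue = get_param(config, "diamond", "evalue", 1e-3, cast=float)
min_score = get_param(config, "megan", "min_score", 50, cast=int)
print(evalue, min_score)  # 1e-05 50
```

Keeping the defaults in one place means that a config.yaml containing only the `samples` key still yields usable parameter values.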

Troubleshooting

Common Configuration Issues

Invalid sample format:
- Ensure the header is exactly `samples`
- Check for extra spaces or tabs
- Verify the SRA accession format
File path issues:
- Use paths relative to the project working directory
- Ensure sample files exist before running
Memory configuration:
- Adjust memory settings for your system
- Monitor resource usage during runs

Linting and formatting

Linting results

/tmp/tmpyqubzmq2/workflow/Snakefile:23: SyntaxWarning: invalid escape sequence '\d'
  
Lints for rule all (line 27, /tmp/tmpyqubzmq2/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
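The `SyntaxWarning: invalid escape sequence '\d'` reported for Snakefile line 23 comes from a backslash sequence inside an ordinary (non-raw) Python string literal, which is usually a regular expression. The Snakefile itself is not reproduced here, so the pattern below is an illustrative accession regex, not the actual contents of line 23. Prefixing the literal with `r` passes the backslash to the regex engine explicitly and removes the warning.

```python
import re

# In an ordinary literal, "\d" is an unrecognized escape: Python keeps the
# backslash but, since Python 3.12, emits a SyntaxWarning at compile time.
# A raw string (r"...") expresses the same pattern without the warning.
ACCESSION_PATTERN = r"(?:SRR|ERR|DRR)\d+"

print(bool(re.fullmatch(ACCESSION_PATTERN, "SRR1234567")))  # True
print(bool(re.fullmatch(ACCESSION_PATTERN, "SRR12ab")))     # False
```

If the warning comes from a wildcard constraint, the same `r` prefix applies to the string given in `wildcard_constraints`.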

Formatting results

All tests passed!