ylab-hi/ScanNeo2
Snakemake-based computational workflow for neoantigen prediction from diverse sources
Overview
Latest release: v0.3.14, Last update: 2026-05-25
Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=ylab-hi/ScanNeo2
Quality control: linting: passed formatting: failed
Topics: epitope gene-fusion indels neoantigens peptide snakemake snakemake-workflow splicing exitron neoepitope immunotherapy vaccine
Wrappers: bio/arriba bio/bcftools/concat bio/bcftools/index bio/bowtie2/align bio/bowtie2/build bio/bwa/index bio/fastp bio/fastqc bio/gatk/applybqsr bio/gatk/applyvqsr bio/gatk/baserecalibrator bio/gatk/filtermutectcalls bio/gatk/haplotypecaller bio/gatk/mutect bio/gatk/selectvariants bio/gatk/variantrecalibrator bio/picard/createsequencedictionary bio/samtools/faidx bio/samtools/index bio/samtools/merge bio/star/align bio/star/index bio/tabix/index bio/vep/annotate
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/ylab-hi/ScanNeo2 . --tag v0.3.14
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
Configuring ScanNeo2
ScanNeo2 is configured through a single YAML file, config/config.yaml. The
default file ships with sensible values for every section; you typically only
need to edit the data: section to point at your inputs.
Run the workflow with the default configuration:
snakemake --cores all --software-deployment-method conda
…or with a custom configuration file at any path:
snakemake --cores all --software-deployment-method conda \
--configfile /path/to/my-config.yaml
Paths in the config file are resolved relative to the directory from which
you invoke snakemake.
The sections below mirror the structure of config/config.yaml. The schema is
defined in workflow/schemas/config.schema.yaml;
its parameter table is rendered automatically by the Snakemake Workflow Catalog.
reference — reference genome & annotation
| Key | Type | Default | Description |
|---|---|---|---|
| release | int | | Ensembl release used to download the reference genome and annotation. |
| nonchr | bool | | Include non-chromosomal / scaffold contigs in the reference. |
threads, mapq, basequal — global limits
| Key | Type | Default | Description |
|---|---|---|---|
| threads | int | | Upper bound on threads any single rule may use. Effective threads per rule are |
| mapq | int | | Minimum read mapping quality (MAPQ, Phred-scaled) used during filtering. |
| basequal | int | | Minimum base-call quality (Phred-scaled) used during filtering. |
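Both thresholds are Phred-scaled, so each value corresponds to an error probability of 10^(-Q/10). The short sketch below makes that conversion explicit; the function name and the example values are illustrative only and are not the workflow's defaults.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Illustrative thresholds only (not ScanNeo2 defaults).
for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

A higher threshold is therefore stricter: raising a cutoff by 10 lowers the tolerated error probability tenfold.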
data — input samples
The data section is the only block that must be edited.
| Key | Type | Description |
|---|---|---|
| name | str | Run name. Results are written to |
| dnaseq | mapping | DNA-seq inputs, one entry per group. Key = group name; value = one path (single-end) or two space-separated paths (paired-end). Accepted extensions: |
| rnaseq | mapping | RNA-seq inputs, same format as |
| normal | str | Group name of the matched normal/control sample (used for somatic calling). Leave empty if no matched normal exists. |
| custom.variants | path | Optional path to a user-supplied VCF to prioritize directly (bypassing variant calling). |
| custom.hlatyping.MHC-I | path | Optional path to a file listing MHC class I alleles (used when |
| custom.hlatyping.MHC-II | path | Optional path to a file listing MHC class II alleles (used when |
Example data: block:
data:
  name: my_run
  dnaseq:
    tumor: /path/to/dna_tumor_R1.fq.gz /path/to/dna_tumor_R2.fq.gz
    normal: /path/to/dna_normal.bam
  rnaseq:
    tumor: /path/to/rna_tumor_R1.fq.gz /path/to/rna_tumor_R2.fq.gz
  normal: normal
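To illustrate how the per-group values in the example are meant to be read, here is a minimal sketch that splits each entry on whitespace into one or two file paths. It is not the workflow's own parser: `split_read_paths` is a hypothetical helper, and the dictionary simply mirrors the example block as it would look after YAML loading.

```python
def split_read_paths(value: str) -> list:
    """Split a group entry into its file paths (one file, or two space-separated files)."""
    paths = value.split()
    if len(paths) not in (1, 2):
        raise ValueError(f"expected one or two paths, got {len(paths)}: {value!r}")
    return paths


# The example data: block, written as the dictionary a YAML loader would produce.
data = {
    "name": "my_run",
    "dnaseq": {
        "tumor": "/path/to/dna_tumor_R1.fq.gz /path/to/dna_tumor_R2.fq.gz",
        "normal": "/path/to/dna_normal.bam",
    },
    "rnaseq": {
        "tumor": "/path/to/rna_tumor_R1.fq.gz /path/to/rna_tumor_R2.fq.gz",
    },
    "normal": "normal",
}

for assay in ("dnaseq", "rnaseq"):
    # An assay section may be empty (null), so fall back to an empty mapping.
    for group, value in (data[assay] or {}).items():
        paths = split_read_paths(value)
        print(f"{assay}/{group}: {len(paths)} file(s)")
```

Note that the `normal` key at the end names a group (here the `normal` entry under `dnaseq`) rather than a file.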
preproc — fastq pre-processing
Applied only to FASTQ inputs (BAM inputs are not re-trimmed).
| Key | Type | Default | Description |
|---|---|---|---|
| activate | bool | | Whether to run pre-processing. |
| minlen | int | | Discard reads shorter than this (bp) after trimming. |
| slidingwindow.activate | bool | | Enable sliding-window quality trimming. |
| slidingwindow.wsize | int | | Sliding-window size (bp). |
| slidingwindow.wqual | int | | Mean base quality required within the window (Phred-scaled). |
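The window parameters can be read as a scan along each read. The sketch below is a simplified illustration and not the trimming tool's actual algorithm: it assumes the scan runs from the 5' end, cuts the read at the first window whose mean quality falls below the required value, and then applies the minimum-length filter. The function names and the example values are hypothetical.

```python
from typing import Sequence


def trimmed_length(quals: Sequence[int], wsize: int, wqual: float) -> int:
    """Return how many leading bases survive a simple sliding-window scan."""
    if not quals:
        return 0
    # Reads shorter than the window are evaluated as a single, shorter window.
    last_start = max(len(quals) - wsize, 0)
    for start in range(last_start + 1):
        window = quals[start:start + wsize]
        if sum(window) / len(window) < wqual:
            return start  # cut at the start of the first failing window
    return len(quals)


def keep_read(quals: Sequence[int], wsize: int, wqual: float, minlen: int) -> bool:
    """Apply the minimum-length filter after trimming."""
    return trimmed_length(quals, wsize, wqual) >= minlen


# Illustrative read: ten high-quality bases followed by six low-quality bases.
example = [38] * 10 + [12] * 6
print(trimmed_length(example, wsize=4, wqual=20))
print(keep_read(example, wsize=4, wqual=20, minlen=8))
```

A larger window smooths over isolated low-quality bases, while a higher quality requirement trims more aggressively.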
align — STAR chimeric alignment
Parameters passed to STAR for RNA-seq gene-fusion detection.
| Key | Type | Default | Description |
|---|---|---|---|
| chimSegmentMin | int | | Minimum length of each chimeric segment (0 disables chimeric detection). |
| chimScoreMin | int | | Minimum total chimeric-alignment score. |
| chimJunctionOverhangMin | int | | Minimum overhang length for a chimeric junction. |
| chimScoreDropMax | int | | Maximum allowed drop of chimeric score below read length. |
| chimScoreSeparation | int | | Minimum score separation between best and next-best chimeric alignment. |
altsplicing — alternative splicing events
| Key | Type | Default | Description |
|---|---|---|---|
| activate | bool | | Include alternative-splicing events. |
| confidence | int | | Confidence level (1–3) for filtering input alignments. |
| iterations | int | | Number of intron-edge addition iterations (sensitivity vs runtime). |
| edgelimit | int | | Maximum edges in the splice graph. |
exitronsplicing — exitron splicing events
| Key | Type | Default | Description |
|---|---|---|---|
| activate | bool | | Include exitron-splicing events. |
| ao | int | | Allele observation count. |
| pso | float | | Percent spliced-out. |
| strand | int | | Library strand specificity (0=unstranded, 1=forward, 2=reverse). |
genefusion — Arriba gene-fusion calling
| Key | Type | Default | Description |
|---|---|---|---|
| activate | bool | | Include gene-fusion events. |
| maxevalue | float | | Maximum E-value. |
| suppreads | int | | Minimum supporting reads (fusions below this are discarded). |
| maxsuppreads | int | | Maximum supporting reads. |
| maxidentity | float | | Genes with identity above this fraction are treated as homologs and discarded. |
| hpolymerlen | int | | Remove breakpoints adjacent to homopolymers of this length. |
| readthroughdist | int | | Distance (bp) below which adjacent breakpoints are considered read-through. |
| minanchorlen | int | | Discard fusions whose segments are shorter than this. |
| splicedevents | int | | Fusions between genes require at least this many spliced breakpoints. |
| maxkmer | float | | Remove reads where a repetitive 3-mer makes up more than this fraction. |
| fraglen | int | | Mean fragment length. |
| maxmismatch | float | | Maximum mismatch fraction. |
indel — small variant calling
| Key | Type | Default | Description |
|---|---|---|---|
| activate | bool | | Include indels & SNVs. |
| type | str | | |
| mode | str | | Call from |
| strategy | str | | Posterior-probability threshold strategy: |
| fscorebeta | float | | Relative weight of recall to precision (used with |
| fdr | float | | False-discovery rate target (used with |
| sliplen | int | | Minimum reference bases to suspect a slippage event. |
| sliprate | float | | Suspected slippage frequency. |
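The beta parameter is described as the relative weight of recall to precision, which matches the standard F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below shows that textbook formula only; whether the underlying variant filter computes its threshold in exactly this way is an assumption, and the numbers are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Same precision/recall pair scored under three different weightings.
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F = {f_beta(precision=0.8, recall=0.4, beta=beta):.3f}")
```

With precision higher than recall, as in this example, increasing beta lowers the score because the weaker recall counts for more.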
quantification — expression quantification
| Key | Type | Default | Description |
|---|---|---|---|
| mode | str | | Quantify from |
hlatyping — HLA typing
| Key | Type | Default | Description |
|---|---|---|---|
| class | str | | HLA class to type: |
| MHC-I_mode | str | | Source(s) for MHC-I typing: |
| MHC-II_mode | str | | Source(s) for MHC-II typing. |
| freqdata | path | | HLA-HD frequency-data directory (only required when class II is enabled). |
| split | path | | HLA-HD gene-split file. |
| dict | path | | HLA-HD dictionary directory. |
HLA-HD must be installed and on PATH if class II is enabled — see the
top-level README for installation notes.
prioritization — epitope prediction
| Key | Type | Default | Description |
|---|---|---|---|
| class | str | | Epitope MHC class to predict: |
| lengths.MHC-I | str | | Comma-separated MHC-I epitope lengths (aa). |
| lengths.MHC-II | str | | Comma-separated MHC-II epitope lengths (aa). |
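As a rough illustration of what a comma-separated length list implies, the sketch below parses such a string and enumerates every peptide window of each length. The helper names, the length string, and the peptide are hypothetical; the workflow's actual prioritization is performed by its own tooling and involves far more than window enumeration.

```python
def parse_lengths(spec: str) -> list:
    """Parse a comma-separated list of epitope lengths, e.g. '8,9,10'."""
    return [int(token) for token in spec.split(",") if token.strip()]


def candidate_epitopes(peptide: str, lengths: list) -> list:
    """Enumerate all contiguous windows of each requested length."""
    return [
        peptide[start:start + k]
        for k in lengths
        for start in range(len(peptide) - k + 1)
    ]


lengths = parse_lengths("8,9,10,11")      # illustrative value, not a default
peptide = "ACDEFGHIKLMN"                  # made-up 12-residue sequence
candidates = candidate_epitopes(peptide, lengths)
print(len(candidates), "candidate windows")
```

Each additional length adds another set of windows, so longer lists increase the number of candidates to score.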
For tutorials and worked examples, see the ScanNeo2 wiki.
Workflow parameters
The following table is automatically parsed from the workflow’s config.schema.y(a)ml file.
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| reference | | | yes | |
| . release | integer | Ensembl release used to download the reference genome and annotation | yes | |
| . nonchr | boolean | whether to include non-chromosomal/scaffold contigs | yes | |
| threads | integer | maximum number of threads any single rule may use | yes | |
| mapq | integer | minimum read mapping quality (MAPQ) | yes | |
| basequal | integer | minimum base-call quality (Phred-scaled) | yes | |
| data | | | yes | |
| . name | string | run name; results are written to results/ | yes | |
| . dnaseq | ['object', 'null'] | DNA-seq inputs, one entry per group (group name -> read path(s)) | yes | |
| . rnaseq | ['object', 'null'] | RNA-seq inputs, one entry per group (group name -> read path(s)) | yes | |
| . normal | ['string', 'null'] | group name(s) of the matched normal/control sample | yes | |
| . custom | | | yes | |
| . . variants | ['string', 'null'] | optional path to a user-supplied VCF to prioritize directly | yes | |
| . . hlatyping | | | yes | |
| . . . MHC-I | ['string', 'null'] | | yes | |
| . . . MHC-II | ['string', 'null'] | | yes | |
| preproc | | | yes | |
| . activate | boolean | | yes | |
| . minlen | integer | discard reads shorter than this (bp) after trimming | yes | |
| . slidingwindow | | | yes | |
| . . activate | boolean | | yes | |
| . . wsize | integer | | yes | |
| . . wqual | integer | | yes | |
| align | | | yes | |
| . chimSegmentMin | integer | | yes | |
| . chimScoreMin | integer | | yes | |
| . chimJunctionOverhangMin | integer | | yes | |
| . chimScoreDropMax | integer | | yes | |
| . chimScoreSeparation | integer | | yes | |
| altsplicing | | | yes | |
| . activate | boolean | | yes | |
| . confidence | integer | | yes | |
| . iterations | integer | | yes | |
| . edgelimit | integer | | yes | |
| exitronsplicing | | | yes | |
| . activate | boolean | | yes | |
| . ao | integer | | yes | |
| . pso | number | | yes | |
| . strand | integer | | yes | |
| genefusion | | | yes | |
| . activate | boolean | | yes | |
| . maxevalue | number | | yes | |
| . suppreads | integer | | yes | |
| . maxsuppreads | integer | | yes | |
| . maxidentity | number | | yes | |
| . hpolymerlen | integer | | yes | |
| . readthroughdist | integer | | yes | |
| . minanchorlen | integer | | yes | |
| . splicedevents | integer | | yes | |
| . maxkmer | number | | yes | |
| . fraglen | integer | | yes | |
| . maxmismatch | number | | yes | |
| indel | | | yes | |
| . activate | boolean | | yes | |
| . type | string | | yes | |
| . mode | string | | yes | |
| . strategy | string | | yes | |
| . fscorebeta | number | | yes | |
| . fdr | number | | yes | |
| . sliplen | integer | | yes | |
| . sliprate | number | | yes | |
| quantification | | | yes | |
| . mode | string | | yes | |
| hlatyping | | | yes | |
| . class | string | | yes | |
| . MHC-I_mode | string | DNA, RNA, or custom (comma-separated combinations allowed) | yes | |
| . MHC-II_mode | string | | yes | |
| . freqdata | string | | yes | |
| . split | string | | yes | |
| . dict | string | | yes | |
| prioritization | | | yes | |
| . class | string | | yes | |
| . lengths | | | yes | |
| . . MHC-I | string | comma-separated epitope lengths | yes | |
| . . MHC-II | string | | yes | |
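Because every top-level section in the table is marked as required, a quick pre-flight check for missing sections can catch an incomplete configuration before a run. The sketch below uses only the standard library and a hypothetical helper, `missing_sections`; it checks top-level keys only and is not a substitute for validation against the workflow's schema file.

```python
# Top-level sections listed as required in the parameter table.
REQUIRED_SECTIONS = (
    "reference", "threads", "mapq", "basequal", "data", "preproc",
    "align", "altsplicing", "exitronsplicing", "genefusion", "indel",
    "quantification", "hlatyping", "prioritization",
)


def missing_sections(config: dict) -> list:
    """Return the required top-level sections absent from a loaded config."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


# A deliberately incomplete configuration, as it might look after YAML loading.
partial_config = {"threads": 8, "data": {"name": "my_run"}}
print("missing:", ", ".join(missing_sections(partial_config)))
```

An empty result means all top-level sections are present; nested keys and value types still need to be checked against the schema.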
Linting and formatting
Linting results
All tests passed!
Formatting results
[DEBUG] In file "/tmp/tmp8ow8f99y/ylab-hi-ScanNeo2-8728c91/workflow/rules/indel.smk": Formatted content is different from original
[DEBUG] In file "/tmp/tmp8ow8f99y/ylab-hi-ScanNeo2-8728c91/workflow/rules/align.smk": Formatted content is different from original
[INFO] 2 file(s) would be changed 😬
[INFO] 15 file(s) would be left unchanged 🎉

snakefmt version: 0.11.5