milesroberts-123/slim-sweep-cnn
A workflow to simulate selective sweeps in SLiM and turn the results into images
Overview
Topics: evolutionary-biology machine-learning population-genetics
Latest release: None, Last update: 2025-06-16
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via
conda activate snakemake
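To verify the installation, both tools should now print their versions (the exact numbers will vary on your system; if either command is missing, re-check that the environment is activated):
snakemake --version
snakedeploy --version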
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside that directory. Then run
snakedeploy deploy-workflow https://github.com/milesroberts-123/slim-sweep-cnn . --tag None
Snakedeploy will create two folders, `workflow` and `config`. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt `config/config.yaml` to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the `--software-deployment-method` (short `--sdm`) argument.
To run the workflow using a combination of `conda` and `apptainer`/`singularity` for software deployment, use
snakemake --cores all --sdm conda apptainer
To run the workflow with automatic deployment of all required software via `conda`/`mamba`, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main `Snakefile` in the `workflow` subfolder and execute the workflow module that was defined by the deployment in step 2.
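Before launching a full run, a dry run is a quick sanity check: it resolves the job graph and prints the planned jobs without executing anything, using Snakemake's standard `--dry-run` flag:
snakemake --cores all --sdm conda --dry-run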
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting the results together with the parameters and code in your browser, using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s `config/README.md`.
1. Configure workflow with config/config.yaml
The parameters that are held constant across all simulations in a workflow are in `config/config.yaml`. These are:
| Parameter | Description | Default |
|---|---|---|
| K | Number of simulations to run | 5000 |
| nidv | Number of individual genomes to sample from each simulation | 128 |
| nloc | Number of loci to sample from each simulation | 128 |
| distMethod | Method for measuring genetic distance between loci | "manhattan" |
| clustMethod | Method used to cluster genomes based on genetic distance | "complete" |
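Putting the defaults above together, a `config/config.yaml` could look like the following sketch (key names are taken from the table above; consult the shipped config file for the authoritative set of options):
K: 5000
nidv: 128
nloc: 128
distMethod: "manhattan"
clustMethod: "complete"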
2. Generate table of simulation parameters config/parameters.tsv
The parameters that vary across simulations are in `config/parameters.tsv`. Each row of this file represents a different simulation, and each simulation gets a unique number as an ID.
There’s a simple example R script in `resources/s00_make_param_table.R` that generates a parameters table, but you don’t need to use that script (a minimal sketch is also shown after the table below). However you choose to generate a parameters table, it needs to have the following columns:
| Parameter | Description |
|---|---|
| ID | Number from 1:K, used as a unique ID for each simulation |
| Q | Scaling factor |
| N | Ancestral population size, used for burn-in |
| sweepS | Selection coefficient for the sweep mutation |
| h | Dominance coefficient of the sweep mutation |
| sigma | Selfing rate |
| mu | Mutation rate |
| R | Recombination rate |
| tau | Time when the population is sampled (cycles post-burn-in when the simulation ends) |
| kappa | Time when the sweep is introduced (the simulation restarts here if the sweep fails) |
| f0 | Threshold frequency to convert the sweep from neutral -> beneficial (for soft sweeps) |
| f1 | Threshold frequency to convert the sweep from beneficial -> neutral (for partial sweeps) |
| n | Number of sweep mutations to introduce (recurrent mutation) |
| lambda | Average waiting time between sweep mutations (Poisson distribution) |
| ncf | Proportion of crossover events that are gene conversions |
| cl | Length of gene conversion crossover events |
| fsimple | Fraction of crossover events that are simple |
| B | Proportion of non-sweep mutations that are beneficial |
| U | Proportion of non-sweep mutations that are deleterious |
| M | Proportion of non-sweep mutations that are neutral |
| hU | Dominance coefficient for deleterious non-sweep mutations |
| hB | Dominance coefficient for beneficial non-sweep mutations |
| bBar | Average selection coefficient for beneficial non-sweep mutations |
| uBar | Average selection coefficient for deleterious non-sweep mutations |
| alpha | Shape parameter for the distribution of fitness effects of deleterious non-sweep mutations |
| r | Logistic growth rate |
| K | Logistic carrying capacity |
| custom_demography | Whether to use the custom demography in config/demography.csv or a logistic model |
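As a point of reference, here is a minimal R sketch (not the bundled `resources/s00_make_param_table.R`) that writes a table with all of the columns above; every value below is an arbitrary placeholder encoding a hard sweep under constant demography, so substitute values appropriate for your study:
# Minimal sketch: nsim identical hard-sweep rows (f0 = 0, f1 = 1, n = 1).
# All parameter values are arbitrary placeholders.
nsim <- 5000
params <- data.frame(
  ID = 1:nsim,
  Q = 10, N = 10000, sweepS = 0.1, h = 0.5, sigma = 0,
  mu = 1e-8, R = 1e-8, tau = 200, kappa = 1,
  f0 = 0, f1 = 1, n = 1, lambda = 1,
  ncf = 0, cl = 100, fsimple = 1,
  B = 0, U = 0, M = 1, hU = 0.5, hB = 0.5,
  bBar = 0.01, uBar = 0.01, alpha = 0.3,
  r = 0, K = 10000, custom_demography = 0
)
write.table(params, "config/parameters.tsv", sep = "\t",
            row.names = FALSE, quote = FALSE)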
Depending on your parameter choices, you can simulate many different sweep types. Here is a table summarizing which parameter values produce which sweep types:
| Sweep type | f0 | f1 | n |
|---|---|---|---|
| hard | 0 | 1 | 1 |
| soft | >0 | 1 | 1 |
| partial | 0 | <1 | 1 |
| recurrent | 0 | 1 | >1 |
| soft + partial | >0 | <1 | 1 |
| soft + recurrent | >0 | 1 | >1 |
| partial + recurrent | 0 | <1 | >1 |
| soft + partial + recurrent | >0 | <1 | >1 |
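For instance, the hypothetical rows below (only the three relevant columns are shown; a real config/parameters.tsv must contain every column from the previous section, and 0.05 is an arbitrary threshold chosen for illustration) would encode a hard, a soft, and a recurrent sweep:
ID	f0	f1	n
1	0	1	1
2	0.05	1	1
3	0	1	5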
If your demography follows the logistic growth model, then you can simulate a wide range of demographies:
| Demography | Description | r | K |
|---|---|---|---|
| constant | Population size does not change | 0 | N |
| growth | Population size increases until K | 0 < r < 2 | N < K |
| decay | Population size decreases until K | 0 < r < 2 | N > K |
| cycle | Population size cycles between two values | 2 < r < sqrt(6) | anything |
| chaotic | Population size changes chaotically* | sqrt(6) < r < 3 | anything |
Note that for the chaotic demography, because our population sizes are discrete, a population that randomly returns to a size it had at a previous simulation tick will simply cycle from that point on. So in many cases a chaotic demography will effectively be an arbitrarily long cycle.
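To build intuition for these regimes, the following R sketch assumes the common discrete logistic update N(t+1) = N(t) + r*N(t)*(1 - N(t)/K); check the workflow's SLiM script for the exact update it uses:
# Assumed logistic update (a sketch, not the workflow's code);
# population sizes are rounded to stay discrete.
logistic_demography <- function(N0, r, K, ticks) {
  N <- numeric(ticks)
  N[1] <- N0
  for (t in 1:(ticks - 1)) {
    N[t + 1] <- round(N[t] + r * N[t] * (1 - N[t] / K))
  }
  N
}
logistic_demography(1000, 0.0, 5000, 10)  # constant: r = 0
logistic_demography(1000, 0.5, 5000, 10)  # growth: N < K, 0 < r < 2
logistic_demography(8000, 0.5, 5000, 10)  # decay: N > K, 0 < r < 2
logistic_demography(1000, 2.3, 5000, 20)  # cycle: 2 < r < sqrt(6)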
3. (Optional) Specify a custom demographic pattern with config/demography.csv
For each simulation in `config/parameters.tsv` you need to define a switch called `custom_demography`. If `custom_demography != 1`, then SLiM will look for the r and K values in `config/parameters.tsv` and use a logistic growth/death model for the population. If `custom_demography == 1`, then SLiM will instead read `config/demography.csv`, which specifies a custom demographic pattern. It is a headerless CSV file with two columns:
| Population size | Time point |
|---|---|
| Column of population sizes | Column of time points, starting with 1 as the generation after burn-in, at which the population size changes |
For example, a file like the following:
1000,10
2000,15
3000,20
means that 10 generations after burn-in the population size will change to 1000 (the burn-in population size is defined by N), at 15 generations post-burn-in it will change to 2000, and at 20 generations post-burn-in it will change to 3000.
Linting and formatting
Linting results
Using workflow specific profile workflow/profiles/default for setting default command line arguments.
usage: snakemake [-h] [--dry-run] [--profile PROFILE]
                 [--workflow-profile WORKFLOW_PROFILE] [--cache [RULE ...]]
                 [--snakefile FILE] [--cores N] [--jobs N] [--local-cores N]
                 [--resources NAME=INT [NAME=INT ...]]
                 [--set-threads RULE=THREADS [RULE=THREADS ...]]
                 [--max-threads MAX_THREADS]
                 [--set-resources RULE:RESOURCE=VALUE [RULE:RESOURCE=VALUE ...]]
                 [--set-scatter NAME=SCATTERITEMS [NAME=SCATTERITEMS ...]]
                 [--set-resource-scopes RESOURCE=[global|local]
                 [RESOURCE=[global|local] ...]]
                 [--default-resources [NAME=INT ...]]
                 [--preemptible-rules [PREEMPTIBLE_RULES ...]]
                 [--preemptible-retries PREEMPTIBLE_RETRIES]
                 [--configfile FILE [FILE ...]] [--config [KEY=VALUE ...]]
                 [--replace-workflow-config] [--envvars VARNAME [VARNAME ...]]
                 [--directory DIR] [--touch] [--keep-going]
                 [--rerun-triggers {code,input,mtime,params,software-env} [{code,input,mtime,params,software-env} ...]]
                 [--force] [--executor {local,dryrun,touch}] [--forceall]
                 [--forcerun [TARGET ...]]
                 [--consider-ancient RULE=INPUTITEMS [RULE=INPUTITEMS ...]]
                 [--prioritize TARGET [TARGET ...]]
                 [--batch RULE=BATCH/BATCHES] [--until TARGET [TARGET ...]]
                 [--omit-from TARGET [TARGET ...]] [--rerun-incomplete]
                 [--shadow-prefix DIR]
                 [--strict-dag-evaluation {cyclic-graph,functions,periodic-wildcards} [{cyclic-graph,functions,periodic-wildcards} ...]]
                 [--scheduler [{ilp,greedy}]]
                 [--scheduler-ilp-solver {COIN_CMD}]
                 [--conda-base-path CONDA_BASE_PATH] [--no-subworkflows]
                 [--precommand PRECOMMAND] [--groups GROUPS [GROUPS ...]]
                 [--group-components GROUP_COMPONENTS [GROUP_COMPONENTS ...]]
                 [--report [FILE]] [--report-after-run]
                 [--report-stylesheet CSSFILE] [--reporter PLUGIN]
                 [--draft-notebook TARGET] [--edit-notebook TARGET]
                 [--notebook-listen IP:PORT] [--lint [{text,json}]]
                 [--generate-unit-tests [TESTPATH]] [--containerize]
                 [--export-cwl FILE] [--list-rules] [--list-target-rules]
                 [--dag [{dot,mermaid-js}]] [--rulegraph [{dot,mermaid-js}]]
                 [--filegraph] [--d3dag] [--summary] [--detailed-summary]
                 [--archive FILE] [--cleanup-metadata FILE [FILE ...]]
                 [--cleanup-shadow] [--skip-script-cleanup] [--unlock]
                 [--list-changes {input,code,params}] [--list-input-changes]
                 [--list-params-changes] [--list-untracked]
                 [--delete-all-output | --delete-temp-output]
                 [--keep-incomplete] [--drop-metadata] [--version]
                 [--printshellcmds] [--debug-dag] [--nocolor]
                 [--quiet [{all,host,progress,reason,rules} ...]]
                 [--print-compilation] [--verbose] [--force-use-threads]
                 [--allow-ambiguity] [--nolock] [--ignore-incomplete]
                 [--max-inventory-time SECONDS] [--trust-io-cache]
                 [--max-checksum-file-size SIZE] [--latency-wait SECONDS]
                 [--wait-for-free-local-storage WAIT_FOR_FREE_LOCAL_STORAGE]
                 [--wait-for-files [FILE ...]] [--wait-for-files-file FILE]
                 [--queue-input-wait-time SECONDS] [--notemp] [--all-temp]
                 [--unneeded-temp-files FILE [FILE ...]]
                 [--keep-storage-local-copies] [--not-retrieve-storage]
                 [--target-files-omit-workdir-adjustment]
                 [--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]
                 [--max-jobs-per-timespan MAX_JOBS_PER_TIMESPAN]
                 [--max-jobs-per-second MAX_JOBS_PER_SECOND]
                 [--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]
                 [--seconds-between-status-checks SECONDS_BETWEEN_STATUS_CHECKS]
                 [--retries RETRIES] [--wrapper-prefix WRAPPER_PREFIX]
                 [--default-storage-provider DEFAULT_STORAGE_PROVIDER]
                 [--default-storage-prefix DEFAULT_STORAGE_PREFIX]
                 [--local-storage-prefix LOCAL_STORAGE_PREFIX]
                 [--remote-job-local-storage-prefix REMOTE_JOB_LOCAL_STORAGE_PREFIX]
                 [--shared-fs-usage {input-output,persistence,software-deployment,source-cache,sources,storage-local-copies,none} [{input-output,persistence,software-deployment,source-cache,sources,storage-local-copies,none} ...]]
                 [--scheduler-greediness SCHEDULER_GREEDINESS]
                 [--scheduler-subsample SCHEDULER_SUBSAMPLE] [--no-hooks]
                 [--debug] [--runtime-profile FILE]
                 [--local-groupid LOCAL_GROUPID] [--attempt ATTEMPT]
                 [--show-failed-logs] [--logger {} [{} ...]]
                 [--job-deploy-sources] [--benchmark-extended]
                 [--container-image IMAGE] [--immediate-submit]
                 [--jobscript SCRIPT] [--jobname NAME] [--flux]
                 [--software-deployment-method {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]]
                 [--container-cleanup-images] [--use-conda]
                 [--conda-not-block-search-path-envvars] [--list-conda-envs]
                 [--conda-prefix DIR] [--conda-cleanup-envs]
                 [--conda-cleanup-pkgs [{tarballs,cache}]]
                 [--conda-create-envs-only] [--conda-frontend {conda,mamba}]
                 [--use-apptainer] [--apptainer-prefix DIR]
                 [--apptainer-args ARGS] [--use-envmodules]
                 [--scheduler-solver-path SCHEDULER_SOLVER_PATH]
                 [--deploy-sources QUERY CHECKSUM]
                 [--target-jobs TARGET_JOBS [TARGET_JOBS ...]]
                 [--mode {default,remote,subprocess}]
                 [--report-html-path VALUE]
                 [--report-html-stylesheet-path VALUE]
                 [targets ...]
snakemake: error: argument --executor/-e: invalid choice: 'slurm' (choose from local, dryrun, touch)
Formatting results
[DEBUG]
[DEBUG]
[DEBUG] In file "/tmp/tmpa13kciu_/workflow/Snakefile": Formatted content is different from original
[DEBUG]
[DEBUG]
[DEBUG]
[INFO] 1 file(s) would be changed 😬
[INFO] 4 file(s) would be left unchanged 🎉

snakefmt version: 0.11.0