milesroberts-123/slim-sweep-cnn
A workflow to simulate selective sweeps in SLiM and turn the results into images
Overview
Topics: evolutionary-biology machine-learning population-genetics
Latest release: None, Last update: 2025-06-16
Linting: failed, Formatting: failed
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.
When using Mamba, run
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via
conda activate snakemake
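To verify the installation, both tools should now print their versions (the exact numbers will vary on your system; if either command is missing, re-check that the environment is activated):
snakemake --version
snakedeploy --version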
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside that directory. Then run
snakedeploy deploy-workflow https://github.com/milesroberts-123/slim-sweep-cnn . --tag None
Snakedeploy will create two folders, `workflow` and `config`. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt `config/config.yaml` to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the `--software-deployment-method` (short `--sdm`) argument.
To run the workflow using a combination of `conda` and `apptainer`/`singularity` for software deployment, use
snakemake --cores all --sdm conda apptainer
To run the workflow with automatic deployment of all required software via `conda`/`mamba`, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main `Snakefile` in the `workflow` subfolder and execute the workflow module that was defined by the deployment in step 2.
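Before launching a full run, a dry run is a quick sanity check: it resolves the job graph and prints the planned jobs without executing anything, using Snakemake's standard `--dry-run` flag:
snakemake --cores all --sdm conda --dry-run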
For further options such as cluster and cloud execution, see the Snakemake documentation.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting the results together with the parameters and code in your browser, using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s `config/README.md`.
1. Configure workflow with config/config.yaml
The parameters that are held constant across all simulations in a workflow are in `config/config.yaml`. These are:
| Parameter | Description | Default |
|---|---|---|
| K | Number of simulations to run | 5000 |
| nidv | Number of individual genomes to sample from each simulation | 128 |
| nloc | Number of loci to sample from each simulation | 128 |
| distMethod | Method for measuring genetic distance between loci | "manhattan" |
| clustMethod | Method used to cluster genomes based on genetic distance | "complete" |
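Putting the defaults above together, a `config/config.yaml` could look like the following sketch (key names are taken from the table above; consult the shipped config file for the authoritative set of options):
K: 5000
nidv: 128
nloc: 128
distMethod: "manhattan"
clustMethod: "complete"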
2. Generate table of simulation parameters config/parameters.tsv
The parameters that vary across simulations are in `config/parameters.tsv`. Each row of this file represents a different simulation, and each simulation gets a unique number as an ID.
There’s a simple example R script in `resources/s00_make_param_table.R` that generates a parameters table, but you don’t need to use that script (a minimal sketch is also shown after the table below). However you choose to generate a parameters table, it needs to have the following columns:
| Parameter | Description |
|---|---|
| ID | Number from 1:K, used as a unique ID for each simulation |
| Q | Scaling factor |
| N | Ancestral population size, used for burn-in |
| sweepS | Selection coefficient for the sweep mutation |
| h | Dominance coefficient of the sweep mutation |
| sigma | Selfing rate |
| mu | Mutation rate |
| R | Recombination rate |
| tau | Time when the population is sampled (cycles post-burn-in when the simulation ends) |
| kappa | Time when the sweep is introduced (the simulation restarts here if the sweep fails) |
| f0 | Threshold frequency to convert the sweep from neutral -> beneficial (for soft sweeps) |
| f1 | Threshold frequency to convert the sweep from beneficial -> neutral (for partial sweeps) |
| n | Number of sweep mutations to introduce (recurrent mutation) |
| lambda | Average waiting time between sweep mutations (Poisson distribution) |
| ncf | Proportion of crossover events that are gene conversions |
| cl | Length of gene conversion crossover events |
| fsimple | Fraction of crossover events that are simple |
| B | Proportion of non-sweep mutations that are beneficial |
| U | Proportion of non-sweep mutations that are deleterious |
| M | Proportion of non-sweep mutations that are neutral |
| hU | Dominance coefficient for deleterious non-sweep mutations |
| hB | Dominance coefficient for beneficial non-sweep mutations |
| bBar | Average selection coefficient for beneficial non-sweep mutations |
| uBar | Average selection coefficient for deleterious non-sweep mutations |
| alpha | Shape parameter for the distribution of fitness effects of deleterious non-sweep mutations |
| r | Logistic growth rate |
| K | Logistic carrying capacity |
| custom_demography | Whether to use the custom demography in config/demography.csv or a logistic model |
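As a point of reference, here is a minimal R sketch (not the bundled `resources/s00_make_param_table.R`) that writes a table with all of the columns above; every value below is an arbitrary placeholder encoding a hard sweep under constant demography, so substitute values appropriate for your study:
# Minimal sketch: nsim identical hard-sweep rows (f0 = 0, f1 = 1, n = 1).
# All parameter values are arbitrary placeholders.
nsim <- 5000
params <- data.frame(
  ID = 1:nsim,
  Q = 10, N = 10000, sweepS = 0.1, h = 0.5, sigma = 0,
  mu = 1e-8, R = 1e-8, tau = 200, kappa = 1,
  f0 = 0, f1 = 1, n = 1, lambda = 1,
  ncf = 0, cl = 100, fsimple = 1,
  B = 0, U = 0, M = 1, hU = 0.5, hB = 0.5,
  bBar = 0.01, uBar = 0.01, alpha = 0.3,
  r = 0, K = 10000, custom_demography = 0
)
write.table(params, "config/parameters.tsv", sep = "\t",
            row.names = FALSE, quote = FALSE)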
Depending on your parameter choices, you can simulate many different sweep types. Here is a table summarizing which parameter values produce which sweep types:
| Sweep type | f0 | f1 | n |
|---|---|---|---|
| hard | 0 | 1 | 1 |
| soft | >0 | 1 | 1 |
| partial | 0 | <1 | 1 |
| recurrent | 0 | 1 | >1 |
| soft + partial | >0 | <1 | 1 |
| soft + recurrent | >0 | 1 | >1 |
| partial + recurrent | 0 | <1 | >1 |
| soft + partial + recurrent | >0 | <1 | >1 |
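For instance, the hypothetical rows below (only the three relevant columns are shown; a real config/parameters.tsv must contain every column from the previous section, and 0.05 is an arbitrary threshold chosen for illustration) would encode a hard, a soft, and a recurrent sweep:
ID	f0	f1	n
1	0	1	1
2	0.05	1	1
3	0	1	5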
If your demography follows the logistic growth model, then you can simulate a wide range of demographies:
| Demography | Description | r | K |
|---|---|---|---|
| constant | Population size does not change | 0 | N |
| growth | Population size increases until K | 0 < r < 2 | N < K |
| decay | Population size decreases until K | 0 < r < 2 | N > K |
| cycle | Population size cycles between two values | 2 < r < sqrt(6) | anything |
| chaotic | Population size changes chaotically* | sqrt(6) < r < 3 | anything |
Note that for the chaotic demography, because our population sizes are discrete, a population that randomly returns to a size it had at a previous simulation tick will simply cycle from that point on. So in many cases a chaotic demography will effectively be an arbitrarily long cycle.
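To build intuition for these regimes, the following R sketch assumes the common discrete logistic update N(t+1) = N(t) + r*N(t)*(1 - N(t)/K); check the workflow's SLiM script for the exact update it uses:
# Assumed logistic update (a sketch, not the workflow's code);
# population sizes are rounded to stay discrete.
logistic_demography <- function(N0, r, K, ticks) {
  N <- numeric(ticks)
  N[1] <- N0
  for (t in 1:(ticks - 1)) {
    N[t + 1] <- round(N[t] + r * N[t] * (1 - N[t] / K))
  }
  N
}
logistic_demography(1000, 0.0, 5000, 10)  # constant: r = 0
logistic_demography(1000, 0.5, 5000, 10)  # growth: N < K, 0 < r < 2
logistic_demography(8000, 0.5, 5000, 10)  # decay: N > K, 0 < r < 2
logistic_demography(1000, 2.3, 5000, 20)  # cycle: 2 < r < sqrt(6)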
3. (Optional) Specify a custom demographic pattern with config/demography.csv
For each simulation in `config/parameters.tsv` you need to define a switch called `custom_demography`. If `custom_demography != 1`, then SLiM will look for the r and K values in `config/parameters.tsv` and use a logistic growth/death model for the population. If `custom_demography == 1`, then SLiM will instead read `config/demography.csv`, which specifies a custom demographic pattern. It is a headerless CSV file with two columns:
| Population size | Time point |
|---|---|
| Column of population sizes | Column of time points, starting with 1 as the generation after burn-in, at which the population size changes |
For example, a file like the following:
1000,10
2000,15
3000,20
means that 10 generations after burn-in the population size will change to 1000 (the burn-in population size is defined by N), at 15 generations post-burn-in it will change to 2000, and at 20 generations post-burn-in it will change to 3000.
Linting and formatting
Linting results
Using workflow specific profile workflow/profiles/default for setting default command line arguments.
usage: snakemake [-h] [--dry-run] [--profile PROFILE]
                 [--workflow-profile WORKFLOW_PROFILE] [--cache [RULE ...]]
                 [--snakefile FILE] [--cores N] [--jobs N] [--local-cores N]
                 [--resources NAME=INT [NAME=INT ...]]
                 [--set-threads RULE=THREADS [RULE=THREADS ...]]
                 [--max-threads MAX_THREADS]
                 [--set-resources RULE:RESOURCE=VALUE [RULE:RESOURCE=VALUE ...]]
                 [--set-scatter NAME=SCATTERITEMS [NAME=SCATTERITEMS ...]]
                 [--set-resource-scopes RESOURCE=[global|local]
                 [RESOURCE=[global|local] ...]]
                 [--default-resources [NAME=INT ...]]
                 [--preemptible-rules [PREEMPTIBLE_RULES ...]]
                 [--preemptible-retries PREEMPTIBLE_RETRIES]
                 [--configfile FILE [FILE ...]] [--config [KEY=VALUE ...]]
                 [--replace-workflow-config] [--envvars VARNAME [VARNAME ...]]
                 [--directory DIR] [--touch] [--keep-going]
                 [--rerun-triggers {code,input,mtime,params,software-env} [{code,input,mtime,params,software-env} ...]]
                 [--force] [--executor {local,dryrun,touch}] [--forceall]
                 [--forcerun [TARGET ...]]
                 [--consider-ancient RULE=INPUTITEMS [RULE=INPUTITEMS ...]]
                 [--prioritize TARGET [TARGET ...]]
                 [--batch RULE=BATCH/BATCHES] [--until TARGET [TARGET ...]]
                 [--omit-from TARGET [TARGET ...]] [--rerun-incomplete]
                 [--shadow-prefix DIR]
                 [--strict-dag-evaluation {cyclic-graph,functions,periodic-wildcards} [{cyclic-graph,functions,periodic-wildcards} ...]]
                 [--scheduler [{ilp,greedy}]]
                 [--scheduler-ilp-solver {COIN_CMD}]
                 [--conda-base-path CONDA_BASE_PATH] [--no-subworkflows]
                 [--precommand PRECOMMAND] [--groups GROUPS [GROUPS ...]]
                 [--group-components GROUP_COMPONENTS [GROUP_COMPONENTS ...]]
                 [--report [FILE]] [--report-after-run]
                 [--report-stylesheet CSSFILE] [--reporter PLUGIN]
                 [--draft-notebook TARGET] [--edit-notebook TARGET]
                 [--notebook-listen IP:PORT] [--lint [{text,json}]]
                 [--generate-unit-tests [TESTPATH]] [--containerize]
                 [--export-cwl FILE] [--list-rules] [--list-target-rules]
                 [--dag [{dot,mermaid-js}]] [--rulegraph [{dot,mermaid-js}]]
                 [--filegraph] [--d3dag] [--summary] [--detailed-summary]
                 [--archive FILE] [--cleanup-metadata FILE [FILE ...]]
                 [--cleanup-shadow] [--skip-script-cleanup] [--unlock]
                 [--list-changes {input,code,params}] [--list-input-changes]
                 [--list-params-changes] [--list-untracked]
                 [--delete-all-output | --delete-temp-output]
                 [--keep-incomplete] [--drop-metadata] [--version]
                 [--printshellcmds] [--debug-dag] [--nocolor]
                 [--quiet [{all,host,progress,reason,rules} ...]]
                 [--print-compilation] [--verbose] [--force-use-threads]
                 [--allow-ambiguity] [--nolock] [--ignore-incomplete]
                 [--max-inventory-time SECONDS] [--trust-io-cache]
                 [--max-checksum-file-size SIZE] [--latency-wait SECONDS]
                 [--wait-for-free-local-storage WAIT_FOR_FREE_LOCAL_STORAGE]
                 [--wait-for-files [FILE ...]] [--wait-for-files-file FILE]
                 [--queue-input-wait-time SECONDS] [--notemp] [--all-temp]
                 [--unneeded-temp-files FILE [FILE ...]]
                 [--keep-storage-local-copies] [--not-retrieve-storage]
                 [--target-files-omit-workdir-adjustment]
                 [--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]
                 [--max-jobs-per-timespan MAX_JOBS_PER_TIMESPAN]
                 [--max-jobs-per-second MAX_JOBS_PER_SECOND]
                 [--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]
                 [--seconds-between-status-checks SECONDS_BETWEEN_STATUS_CHECKS]
                 [--retries RETRIES] [--wrapper-prefix WRAPPER_PREFIX]
                 [--default-storage-provider DEFAULT_STORAGE_PROVIDER]
                 [--default-storage-prefix DEFAULT_STORAGE_PREFIX]
                 [--local-storage-prefix LOCAL_STORAGE_PREFIX]
                 [--remote-job-local-storage-prefix REMOTE_JOB_LOCAL_STORAGE_PREFIX]
                 [--shared-fs-usage {input-output,persistence,software-deployment,source-cache,sources,storage-local-copies,none} [{input-output,persistence,software-deployment,source-cache,sources,storage-local-copies,none} ...]]
                 [--scheduler-greediness SCHEDULER_GREEDINESS]
                 [--scheduler-subsample SCHEDULER_SUBSAMPLE] [--no-hooks]
                 [--debug] [--runtime-profile FILE]
                 [--local-groupid LOCAL_GROUPID] [--attempt ATTEMPT]
                 [--show-failed-logs] [--logger {} [{} ...]]
                 [--job-deploy-sources] [--benchmark-extended]
                 [--container-image IMAGE] [--immediate-submit]
                 [--jobscript SCRIPT] [--jobname NAME] [--flux]
                 [--software-deployment-method {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]]
                 [--container-cleanup-images] [--use-conda]
                 [--conda-not-block-search-path-envvars] [--list-conda-envs]
                 [--conda-prefix DIR] [--conda-cleanup-envs]
                 [--conda-cleanup-pkgs [{tarballs,cache}]]
                 [--conda-create-envs-only] [--conda-frontend {conda,mamba}]
                 [--use-apptainer] [--apptainer-prefix DIR]
                 [--apptainer-args ARGS] [--use-envmodules]
                 [--scheduler-solver-path SCHEDULER_SOLVER_PATH]
                 [--deploy-sources QUERY CHECKSUM]
                 [--target-jobs TARGET_JOBS [TARGET_JOBS ...]]
                 [--mode {default,remote,subprocess}]
                 [--report-html-path VALUE]
                 [--report-html-stylesheet-path VALUE]
                 [targets ...]
snakemake: error: argument --executor/-e: invalid choice: 'slurm' (choose from local, dryrun, touch)
Formatting results
[DEBUG]
[DEBUG]
[DEBUG] In file "/tmp/tmpa13kciu_/workflow/Snakefile": Formatted content is different from original
[DEBUG]
[DEBUG]
[DEBUG]
[INFO] 1 file(s) would be changed 😬
[INFO] 4 file(s) would be left unchanged 🎉

snakefmt version: 0.11.0