Snakemake executor plugin: aws-batch
This is the Snakemake executor plugin for AWS Batch. It distributes Snakemake jobs to EC2 instances managed by AWS Batch.
Installation
Install this plugin with pip or mamba, e.g.:
pip install snakemake-executor-plugin-aws-batch
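Alternatively, a mamba-based install works as well; this is a sketch assuming the package is published on the conda-forge and bioconda channels:
mamba install -c conda-forge -c bioconda snakemake-executor-plugin-aws-batch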
Usage
In order to use the plugin, run Snakemake (>=8.0) in the folder where your workflow code and config reside (containing either workflow/Snakefile or Snakefile) with the corresponding value for the executor flag:
snakemake --executor aws-batch --default-resources --jobs N ...
with N being the number of jobs you want to run in parallel and ... being any additional arguments you want to use (see below).
The machine on which you run Snakemake must have the executor plugin installed and, depending on the type of the executor plugin, have access to the target service of the executor plugin (e.g. an HPC middleware like SLURM with the sbatch command, or internet access to submit jobs to some cloud provider, e.g. Azure).
The flag --default-resources ensures that Snakemake auto-calculates the mem and disk resources for each job, based on the input file size. The values assumed there are conservative and should usually suffice. However, you can always override those defaults by specifying the resources in your Snakemake rules or via the --set-resources flag, as sketched below.
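For example, a minimal sketch of overriding the defaults for a single rule (the rule name bwa_map and the resource values are purely illustrative):
snakemake --executor aws-batch --jobs 4 --set-resources bwa_map:mem_mb=8000 bwa_map:disk_mb=20000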
Depending on the executor plugin, you might either rely on a shared local filesystem or use a remote filesystem or storage. For the latter, you additionally have to use a suitable storage plugin (see the storage plugins section in the sidebar of this catalog) and, where applicable, check the sections below for further recommendations.
All arguments can also be persisted via a profile, such that they don't have to be specified on each invocation. Here, this would mean the following entries inside the profile:
executor: aws-batch
default_resources: []
For specifying other default resources than the built-in ones, see the docs.
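As a minimal sketch, the entries above can be placed into a global profile; the profile name aws-batch-profile and the ~/.config/snakemake location are illustrative (workflow-local profiles work analogously):
mkdir -p ~/.config/snakemake/aws-batch-profile
cat > ~/.config/snakemake/aws-batch-profile/config.yaml <<'EOF'
executor: aws-batch
default_resources: []
EOF
# later invocations can then drop the corresponding command line flags
snakemake --profile aws-batch-profile --jobs N ...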
Settings
The executor plugin has the following settings, which can be passed via the command line, the workflow, or environment variables, where supported:
Further details
AWS Credentials
This plugin assumes you have set up AWS CLI credentials in ~/.aws/credentials. For more information, see the AWS CLI configuration documentation.
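For example, a minimal sketch using the AWS CLI (assuming the default credentials profile):
# interactively write the access key, secret key, and default region to ~/.aws
aws configure
# sanity check that the credentials resolve to an identity
aws sts get-caller-identity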
AWS Infrastructure Requirements
The snakemake-executor-plugin-aws-batch requires an EC2 compute environment and a job queue to be configured. The plugin repo contains Terraform configuration used to set up the requisite AWS Batch infrastructure.
Assuming you have Terraform installed and AWS CLI credentials configured, you can deploy the required infrastructure as follows:
cd terraform
terraform init
terraform plan
terraform apply
Resource names can be updated by including a terraform.tfvars file that overrides the default variable values defined in vars.tf. The output variables from running terraform apply can be exported as environment variables for snakemake-executor-plugin-aws-batch to use:
SNAKEMAKE_AWS_BATCH_REGION
SNAKEMAKE_AWS_BATCH_JOB_QUEUE
SNAKEMAKE_AWS_BATCH_JOB_ROLE
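For example, a sketch that pulls the values straight from the Terraform state (the output names region, batch_job_queue, and batch_job_role are assumptions here; substitute whatever names your outputs actually define):
export SNAKEMAKE_AWS_BATCH_REGION=$(terraform output -raw region)
export SNAKEMAKE_AWS_BATCH_JOB_QUEUE=$(terraform output -raw batch_job_queue)
export SNAKEMAKE_AWS_BATCH_JOB_ROLE=$(terraform output -raw batch_job_role)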
Example
Create environment
Install Snakemake and the AWS executor and storage plugins into an environment. We recommend the mamba package manager, which can be installed via Miniforge, but these dependencies can also be installed using pip or other Python package managers.
mamba create -n snakemake-example \
snakemake snakemake-storage-plugin-s3 snakemake-executor-plugin-aws-batch
mamba activate snakemake-example
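A quick sanity check that the environment is usable (a sketch; the module name follows the usual Snakemake plugin naming convention and is an assumption here):
snakemake --version
# the plugin should import cleanly if it was installed into the same environment
python -c "import snakemake_executor_plugin_aws_batch"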
Clone the snakemake tutorial repo containing the example workflow:
git clone https://github.com/snakemake/snakemake-tutorial-data.git
Set up and run the tutorial workflow on the executor:
cd snakemake-tutorial-data
export SNAKEMAKE_AWS_BATCH_REGION=
export SNAKEMAKE_AWS_BATCH_JOB_QUEUE=
export SNAKEMAKE_AWS_BATCH_JOB_ROLE=
snakemake --jobs 4 \
--executor aws-batch \
--aws-batch-region us-west-2 \
--default-storage-provider s3 \
--default-storage-prefix s3://snakemake-tutorial-example \
--verbose
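While the workflow is running, the jobs Snakemake submitted can be inspected with the AWS CLI, e.g. (a sketch; the queue is taken from the environment variable set above):
aws batch list-jobs --job-queue "$SNAKEMAKE_AWS_BATCH_JOB_QUEUE" --job-status RUNNING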