Snakemake executor plugin: aws-batch

Author: jakevc. The source repository is hosted on GitHub.

This Snakemake executor plugin distributes Snakemake jobs to AWS Batch EC2 instances.

Installation

Install the plugin with pip or mamba, e.g.:

pip install snakemake-executor-plugin-aws-batch

Usage

To use the plugin, run Snakemake (>=8.0) in the folder where your workflow code and config reside (containing either workflow/Snakefile or Snakefile) with the corresponding value for the executor flag:

snakemake --executor aws-batch --default-resources --jobs N ...

with N being the number of jobs you want to run in parallel and ... being any additional arguments you want to use (see below). The machine on which you run Snakemake must have the executor plugin installed and, depending on the type of executor plugin, must have access to the plugin's target service (e.g. an HPC middleware like Slurm with the sbatch command, or internet access to submit jobs to a cloud provider such as Azure).

The flag --default-resources ensures that Snakemake auto-calculates the mem and disk resources for each job, based on the input file size. The values assumed there are conservative and should usually suffice. However, you can always override those defaults by specifying the resources in your Snakemake rules or via the --set-resources flag.
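As a rough sketch of the override mechanism, the snippet below assembles an invocation that keeps the automatic defaults but pins the memory of a single rule. The rule name bwa_map and the value of 8000 MB are illustrative assumptions and do not come from this plugin's documentation. The command is only printed, so it can be reviewed before it is run.

```shell
# Assumed values for illustration: the rule name and memory are hypothetical.
RULE="bwa_map"
MEM_MB=8000

# --set-resources takes RULE:RESOURCE=VALUE and overrides the
# auto-calculated default for that rule only.
CMD="snakemake --executor aws-batch --default-resources --set-resources ${RULE}:mem_mb=${MEM_MB} --jobs 4"

# Print the command for review; run it with: eval "$CMD"
echo "$CMD"
```

Rules that are not named in --set-resources keep the values derived from their input file sizes.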

Depending on the executor plugin, you might either rely on a shared local filesystem or use a remote filesystem or storage. For the latter, you additionally have to use a suitable storage plugin (see the section on storage plugins in the sidebar of this catalog) and possibly check for further recommendations in the sections below.

All arguments can also be persisted via a profile, so that they don't have to be specified on each invocation. Here, this would mean the following entries in the profile:

executor: aws-batch
default_resources: []

For specifying other default resources than the built-in ones, see the docs.
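One possible way to create such a profile is sketched below. The directory name profiles/aws-batch and the jobs: 4 entry are assumptions added for illustration; the first two entries are the ones listed above.

```shell
# Hypothetical profile directory; a relative or absolute path can be
# passed to --profile.
PROFILE_DIR="profiles/aws-batch"
mkdir -p "$PROFILE_DIR"

# Snakemake reads the profile settings from config.yaml in this directory.
# "jobs: 4" is an added example value, not a requirement of the plugin.
cat > "$PROFILE_DIR/config.yaml" <<'EOF'
executor: aws-batch
default_resources: []
jobs: 4
EOF

echo "run with: snakemake --profile $PROFILE_DIR"
```

With the profile in place, the executor and job count no longer need to be repeated on the command line.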

Settings

The executor plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):

CLI argument	Description	Default	Required
`--aws-batch-region VALUE`	AWS Region	`None`	✓
`--aws-batch-job-queue VALUE`	The AWS Batch task queue ARN used for running tasks	`None`	✓
`--aws-batch-job-role VALUE`	The AWS job role ARN that is used for running the tasks	`None`	✓
`--aws-batch-tags VALUE`	Tags to apply to all Batch tasks, in the form KEY=VALUE	`None`	✗
`--aws-batch-task-timeout VALUE`	Task timeout in seconds; AWS Batch terminates a task that does not finish within the timeout (minimum 60)	`300`	✗
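With the default timeout of 300 seconds, any task that runs longer than five minutes is terminated, so longer jobs need an explicit value. The sketch below checks a chosen timeout against the documented minimum and prints the resulting command. The timeout of 3600 seconds and the tag project=demo are hypothetical values; only a single tag is shown because the table does not say how several tags are separated.

```shell
# Hypothetical values chosen for illustration.
TIMEOUT_SECONDS=3600
TAG="project=demo"

# The settings table gives 60 seconds as the minimum task timeout,
# so reject smaller values before anything is submitted.
if [ "$TIMEOUT_SECONDS" -lt 60 ]; then
    echo "task timeout must be at least 60 seconds" >&2
    exit 1
fi

EXTRA_ARGS="--aws-batch-task-timeout ${TIMEOUT_SECONDS} --aws-batch-tags ${TAG}"

# Print the full command for review instead of submitting jobs.
echo "snakemake --executor aws-batch --default-resources --jobs 4 ${EXTRA_ARGS}"
```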

Further details

AWS Credentials

This plugin assumes you have set up AWS CLI credentials in ~/.aws/credentials. For more information, see the AWS CLI configuration documentation.
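A quick local check before submitting jobs might look like the sketch below. It assumes the [default] profile is the one in use, which this documentation does not state; the helper name has_default_profile is hypothetical.

```shell
# Default location of the shared credentials file; the AWS CLI also
# honours AWS_SHARED_CREDENTIALS_FILE if it is set.
CRED_FILE="${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"

has_default_profile() {
    # Succeeds if the file exists and contains a [default] section header.
    [ -f "$1" ] && grep -q '^\[default\]' "$1"
}

if has_default_profile "$CRED_FILE"; then
    echo "found [default] profile in $CRED_FILE"
else
    echo "no [default] profile in $CRED_FILE; run 'aws configure' to create one" >&2
fi
```

This only confirms that the file is present and has a default section; it does not verify that the keys are valid.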

AWS Infrastructure Requirements

The snakemake-executor-plugin-aws-batch requires an EC2 compute environment and a job queue to be configured. The plugin repo contains Terraform code for setting up the requisite AWS Batch infrastructure.

Assuming you have Terraform installed and AWS CLI credentials configured, you can deploy the required infrastructure as follows:

cd terraform
terraform init
terraform plan
terraform apply

Resource names can be changed by adding a terraform.tfvars file that overrides the defaults defined in vars.tf. The output variables from running terraform apply can be exported as the following environment variables for snakemake-executor-plugin-aws-batch to use:

SNAKEMAKE_AWS_BATCH_REGION
SNAKEMAKE_AWS_BATCH_JOB_QUEUE
SNAKEMAKE_AWS_BATCH_JOB_ROLE
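Because all three settings are required, it can help to confirm they are exported before starting a run. The sketch below is one way to do that; the function name missing_settings is hypothetical. The values themselves can be read with terraform output -raw NAME, where NAME depends on the outputs defined in the Terraform configuration and is not specified here.

```shell
missing_settings() {
    # Print the name of every required variable that is unset or empty.
    for name in SNAKEMAKE_AWS_BATCH_REGION \
                SNAKEMAKE_AWS_BATCH_JOB_QUEUE \
                SNAKEMAKE_AWS_BATCH_JOB_ROLE; do
        if [ -z "$(printenv "$name")" ]; then
            echo "$name"
        fi
    done
}

MISSING="$(missing_settings)"
if [ -n "$MISSING" ]; then
    # Word splitting is intended here so the names print on one line.
    echo "unset AWS Batch settings:" $MISSING >&2
fi
```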

Example

Create environment

Install Snakemake and the AWS executor and storage plugins into an environment. We recommend the mamba package manager, which can be installed using miniforge, but these dependencies can also be installed using pip or other Python package managers.

mamba create -n snakemake-example \
    snakemake snakemake-storage-plugin-s3 snakemake-executor-plugin-aws-batch
mamba activate snakemake-example

Clone the Snakemake tutorial repo containing the example workflow:

git clone https://github.com/snakemake/snakemake-tutorial-data.git

Set up and run the tutorial workflow on the executor

cd snakemake-tutorial-data

export SNAKEMAKE_AWS_BATCH_REGION=
export SNAKEMAKE_AWS_BATCH_JOB_QUEUE=
export SNAKEMAKE_AWS_BATCH_JOB_ROLE=

snakemake --jobs 4 \
    --executor aws-batch \
    --aws-batch-region us-west-2 \
    --default-storage-provider s3 \
    --default-storage-prefix s3://snakemake-tutorial-example \
    --verbose