Snakemake executor plugin: azure-batch

Author: Jake VanCampen

This is the Snakemake executor plugin for Azure Batch. It distributes Snakemake jobs across a pool of one or more Azure Batch compute nodes.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-executor-plugin-azure-batch

Usage

To use the plugin, run Snakemake (>=8.0) with the executor flag set to azure-batch:

snakemake --executor azure-batch ...

with ... being any additional arguments you want to use.
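Because the plugin needs Snakemake 8.0 or newer, a quick version check before the first run can save a confusing error. The sketch below is a hypothetical helper (the function name is illustrative, not part of the plugin); it assumes `snakemake --version` prints a plain version string such as `8.4.2` and compares only the major number.

```shell
# Hypothetical preflight helper: succeed only if the installed Snakemake
# reports a major version of 8 or higher.
snakemake_is_v8_or_newer() {
  version="$(snakemake --version)" || return 1
  # Keep everything before the first dot, e.g. "8.4.2" -> "8".
  major="${version%%.*}"
  # A non-numeric major version makes the test fail rather than pass.
  [ "$major" -ge 8 ] 2>/dev/null
}
```

For example, `snakemake_is_v8_or_newer && snakemake --executor azure-batch ...` only starts the workflow when the version requirement is met.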

The executor plugin has the following settings:

Settings

Each setting is listed with its CLI argument, description, and default value.

--azure-batch-account-url VALUE
    Batch account URL, in the form https://<account>.<region>.batch.azure.com. Required.
    Default: None

--azure-batch-autoscale VALUE
    Enable autoscaling of the Azure Batch pool nodes. This option sets the initial dedicated node count to zero and requires five minutes to resize the pool, so it is only recommended for longer-running workflows.
    Default: False

--azure-batch-container-registry-pass VALUE
    Azure Container Registry password.
    Default: None

--azure-batch-container-registry-url VALUE
    Azure Container Registry URL.
    Default: None

--azure-batch-container-registry-user VALUE
    Azure Container Registry user.
    Default: None

--azure-batch-keep-pool VALUE
    Keep the Azure Batch resources after the workflow finishes.
    Default: False

--azure-batch-managed-identity-resource-id VALUE
    Azure Managed Identity resource ID. The managed identity is used to authenticate the Azure Batch nodes to other Azure resources. Required if you use the Snakemake Azure Storage plugin or need access to Azure Container Registry from the nodes.
    Default: None

--azure-batch-managed-identity-client-id VALUE
    Azure Managed Identity client ID.
    Default: None

--azure-batch-node-start-task-url VALUE
    URL of the Azure Batch node start task bash script. This can be any URL that hosts your start task script; Azure Blob SAS URLs work well here.
    Default: None

--azure-batch-node-fill-type VALUE
    Azure Batch node fill type.
    Default: 'spread'

--azure-batch-node-communication-mode VALUE
    Azure Batch node communication mode.
    Default: None

--azure-batch-pool-subnet-id VALUE
    Azure Batch pool subnet ID.
    Default: None

--azure-batch-pool-image-publisher VALUE
    Batch pool image publisher.
    Default: 'microsoft-azure-batch'

--azure-batch-pool-image-offer VALUE
    Batch pool image offer.
    Default: 'ubuntu-server-container'

--azure-batch-pool-image-sku VALUE
    Batch pool image SKU.
    Default: '20-04-lts'

--azure-batch-pool-vm-node-agent-sku-id VALUE
    Azure Batch pool VM node agent SKU ID.
    Default: 'batch.node.ubuntu 20.04'

--azure-batch-pool-vm-size VALUE
    Azure Batch pool VM size.
    Default: 'Standard_D2_v3'

--azure-batch-pool-node-count VALUE
    Azure Batch pool node count.
    Default: 1

--azure-batch-resource-group-name VALUE
    The name of the Azure resource group containing the Azure Batch account. Required.
    Default: None

--azure-batch-subscription-id VALUE
    The Azure subscription ID of the Azure Batch account. Required.
    Default: None

--azure-batch-tasks-per-node VALUE
    Batch tasks per node. If the node count is greater than 1, this option helps tune the number of tasks each node can run simultaneously.
    Default: 1
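Since an Azure Blob SAS URL is suggested for the node start task script, one way to produce one is with the Azure CLI. The helper below is a hypothetical sketch: the function name and its arguments are illustrative, and it assumes the script is already uploaded and that you are logged in with rights to create a user delegation SAS. It requests a read-only SAS and prints the full blob URL. Note that a user delegation SAS is valid for at most seven days, so pick an expiry that covers the lifetime of the pool.

```shell
# Hypothetical helper: print a read-only SAS URL for a start task script
# stored in Azure Blob Storage. All names are supplied by the caller.
start_task_sas_url() {
  account="$1"     # storage account name
  container="$2"   # container holding the script
  blob="$3"        # blob name of the script, e.g. start.sh
  expiry="$4"      # UTC expiry, e.g. 2030-01-01T00:00Z
  # --as-user with --auth-mode login signs the SAS with a user delegation
  # key instead of the account key; --full-uri returns the complete URL.
  az storage blob generate-sas \
    --account-name "$account" \
    --container-name "$container" \
    --name "$blob" \
    --permissions r \
    --expiry "$expiry" \
    --auth-mode login \
    --as-user \
    --full-uri \
    --output tsv
}
```

The printed URL can then be passed as `--azure-batch-node-start-task-url "$(start_task_sas_url <account> <container> start.sh <expiry>)"`.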

Further details

Azure Batch Authentication

The plugin uses DefaultAzureCredential to create and destroy Azure Batch resources. The caller must have Contributor permissions on the Azure Batch account for the plugin to work properly. If you are using the Azure Storage plugin, you should also have the Storage Blob Data Contributor role on the storage account(s) you use.

To run a Snakemake workflow with your Azure identity, make sure you are logged in with the Azure CLI:

az login
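After logging in, it can help to confirm which subscription the Azure CLI is pointing at before launching a workflow. This is a hypothetical helper (the name is illustrative) built on the standard `az account show` command; it prints the active subscription ID or fails with a hint when there is no login.

```shell
# Hypothetical helper: print the ID of the subscription the Azure CLI is
# using, or fail with a hint if there is no active login.
current_azure_subscription() {
  if ! az account show --query id --output tsv 2>/dev/null; then
    echo "No active Azure CLI login; run 'az login' first." >&2
    return 1
  fi
}
```

If the printed ID is not the subscription that holds your Batch account, switch with `az account set --subscription <subscription-id>`.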

If you are running Snakemake from a GitHub workflow, you can authenticate the GitHub runner with a user-assigned managed identity and grant that identity Contributor permissions on the Azure Batch account.

When using the Snakemake storage plugin for Azure, or if your tasks need access to Azure Container Registry or other Azure resources, you must set up a user-assigned managed identity for the executor. The Batch nodes assume this identity at runtime, so you give them access to Azure resources by granting permissions to this identity.
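The executor takes the identity through the `--azure-batch-managed-identity-resource-id` and `--azure-batch-managed-identity-client-id` settings. A minimal sketch for looking both values up with `az identity show` is below; the function name and identity name are hypothetical, and it assumes the resource ID setting expects the identity's full Azure resource ID (the `id` field).

```shell
# Hypothetical helper: look up a user-assigned managed identity and print
# the two executor flags that reference it, one token per line.
managed_identity_flags() {
  identity_name="$1"
  resource_group="$2"
  # Full Azure resource ID of the identity.
  resource_id="$(az identity show --name "$identity_name" \
    --resource-group "$resource_group" --query id --output tsv)" || return 1
  # Client (application) ID of the identity.
  client_id="$(az identity show --name "$identity_name" \
    --resource-group "$resource_group" --query clientId --output tsv)" || return 1
  printf '%s\n' \
    --azure-batch-managed-identity-resource-id "$resource_id" \
    --azure-batch-managed-identity-client-id "$client_id"
}
```

Appending `$(managed_identity_flags id-snakemake rg-batch-test)` unquoted to a `snakemake --executor azure-batch` command relies on word splitting, which is safe here because neither value contains whitespace.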

The role you will most commonly grant the managed identity is Storage Blob Data Contributor, on every storage account the Azure Batch nodes need to read data from or write data to.
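Granting that role can be scripted with the Azure CLI. The sketch below is a hypothetical helper (its name and arguments are illustrative): it resolves the identity's principal ID and the storage account's resource ID, then creates the role assignment scoped to that one storage account. It assumes you have permission to create role assignments on the account.

```shell
# Hypothetical helper: grant a user-assigned managed identity the Storage Blob
# Data Contributor role on one storage account. All names are caller-supplied.
grant_blob_data_contributor() {
  identity_name="$1"
  identity_group="$2"
  storage_account="$3"
  storage_group="$4"
  # Object (principal) ID of the managed identity.
  principal_id="$(az identity show --name "$identity_name" \
    --resource-group "$identity_group" --query principalId --output tsv)" || return 1
  # Full resource ID of the storage account, used as the assignment scope.
  storage_scope="$(az storage account show --name "$storage_account" \
    --resource-group "$storage_group" --query id --output tsv)" || return 1
  # Managed identities are service principals from Azure RBAC's point of view.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope "$storage_scope"
}
```

Run it once per storage account the workflow reads from or writes to.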

Setup

The following required parameters are used to set up the executor. Each can be passed either as an environment variable or as a command-line flag.

ENVIRONMENT_VAR                              CLI_FLAG                            REQUIRED
SNAKEMAKE_AZURE_BATCH_ACCOUNT_URL            --azure-batch-account-url           True
SNAKEMAKE_AZURE_BATCH_SUBSCRIPTION_ID        --azure-batch-subscription-id       True
SNAKEMAKE_AZURE_BATCH_RESOURCE_GROUP_NAME    --azure-batch-resource-group-name   True

The remaining options are described above.
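When the required settings come from the environment rather than flags, a missing variable is easy to overlook. The following is a hypothetical preflight helper (the function name is illustrative) that checks the three environment variables listed in the table and reports any that are unset or empty. It uses `eval` for the indirect lookup so it runs under plain POSIX sh as well as bash.

```shell
# Hypothetical preflight helper: report any required executor setting that is
# missing from the environment; returns non-zero if one or more are missing.
check_azure_batch_env() {
  missing=0
  for var in SNAKEMAKE_AZURE_BATCH_ACCOUNT_URL \
             SNAKEMAKE_AZURE_BATCH_SUBSCRIPTION_ID \
             SNAKEMAKE_AZURE_BATCH_RESOURCE_GROUP_NAME; do
    # Indirect lookup: read the value of the variable whose name is in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing required setting: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after exporting the three variables, `check_azure_batch_env && snakemake -j1 --executor azure-batch` runs the workflow without repeating the corresponding flags.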

Example

Write the Snakefile

Run the jobs on Azure Batch nodes!

Here I pass the required values via CLI flags as described above, but they can also be read from their respective environment variables. The values shown below are dummy values:

snakemake -j1 --executor azure-batch \
    --azure-batch-account-url https://accountname.westus2.batch.azure.com \
    --azure-batch-subscription-id d2c845cd-4903-40da-b34c-a6fec7115e21 \
    --azure-batch-resource-group-name rg-batch-test

Example with Azure Storage Backend

snakemake -j1 --executor azure-batch \
    --azure-batch-account-url https://accountname.westus2.batch.azure.com \
    --azure-batch-subscription-id d2c845cd-4903-40da-b34c-a6fec7115e21 \
    --azure-batch-resource-group-name rg-batch-test \
    --default-storage-provider azure \
    --default-storage-prefix 'az://account/container/path/'