Snakemake executor plugin: azure-batch
This is the Snakemake executor plugin for Azure Batch. It distributes Snakemake tasks across a pool of Azure Batch compute nodes.
Installation
Install this plugin with pip or mamba, e.g.:
pip install snakemake-executor-plugin-azure-batch
Usage
In order to use the plugin, run Snakemake (>=8.0) with the corresponding value for the executor flag:
snakemake --executor azure-batch ...
with ... being any additional arguments you want to use.
The executor plugin has the following settings:
| CLI argument | Description | Default | Choices | Required | Type |
|---|---|---|---|---|---|
| --azure-batch-account-url | Batch account URL: https://<account>.<region>.batch.azure.com | | | ✓ | |
| | Enable autoscaling of the Azure Batch pool nodes. This option sets the initial dedicated node count to zero and requires five minutes to resize the cluster, so it is only recommended for longer-running workflows. | | | ✗ | |
| | Azure Container Registry password. | | | ✗ | |
| | Azure Container Registry URL. | | | ✗ | |
| | Azure Container Registry user. | | | ✗ | |
| | Keep the Azure Batch resources after the workflow has finished. | | | ✗ | |
| | Azure Managed Identity resource ID. The managed identity is used for authentication of the Azure Batch nodes to other Azure resources. Required if using the Snakemake Azure Storage plugin or if you need access to Azure Container Registry from the nodes. | | | ✗ | |
| | Azure Managed Identity client ID. | | | ✗ | |
| | Azure Batch node start task bash script URL. This can be any URL that hosts your start task bash script. Azure blob SAS URLs work nicely here. | | | ✗ | |
| | Azure Batch node fill type. | | | ✗ | |
| | Azure Batch node communication mode. | | | ✗ | |
| | Azure Batch pool subnet ID. | | | ✗ | |
| | Batch pool image publisher. | | | ✗ | |
| | Batch pool image offer. | | | ✗ | |
| | Batch pool image SKU. | | | ✗ | |
| | Azure Batch pool VM node agent SKU ID. | | | ✗ | |
| | Azure Batch pool VM size. | | | ✗ | |
| | Azure Batch pool node count. | | | ✗ | |
| --azure-batch-resource-group-name | The name of the Azure Resource Group containing the Azure Batch account. | | | ✓ | |
| --azure-batch-subscription-id | The Azure Subscription ID of the Azure Batch account. | | | ✓ | |
| | Batch tasks per node. If node count is greater than 1, this option helps optimize the number of tasks each node can handle simultaneously. | | | ✗ | |
Further details
Azure Batch Authentication
The plugin uses a CustomAzureCredential chain that prefers the use of AzureCliCredential, then falls back to a ManagedIdentityCredential, and finally, an EnvironmentCredential (service principal) to create and destroy Azure Batch resources. The caller must have Contributor permissions on the Azure Batch account for the plugin to work properly. If you are using the Azure Storage plugin you should also have the Storage Blob Data Contributor role for the storage account(s) you use.
To run a Snakemake workflow using your Azure identity, make sure you are logged in with the Azure CLI:
az login
If you are running Snakemake from a GitHub workflow, you can authenticate the GitHub runner with a User-Assigned Managed Identity, and grant that Managed Identity Contributor permissions to the Azure Batch Account.
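The last fallback in the credential chain described above authenticates as a service principal. As a minimal sketch, assuming the plugin's chain uses the standard azure-identity EnvironmentCredential unchanged, a client-secret service principal is supplied through the three environment variables below. All values are placeholders.

```shell
# Placeholder values for a service principal; replace with your own.
# These are the variable names that azure-identity's EnvironmentCredential
# reads for client-secret authentication. That the plugin's credential chain
# picks them up unmodified is an assumption.
export AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000"
export AZURE_CLIENT_ID="11111111-1111-1111-1111-111111111111"
export AZURE_CLIENT_SECRET="replace-with-your-client-secret"
```

Because AzureCliCredential is tried first, an active az login session takes precedence over these variables.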
When using the Snakemake storage plugin for Azure, or if you have tasks that need access to the Azure Container Registry or other Azure resources, you must set up a user-assigned managed identity with the executor. The Batch nodes will assume this identity at runtime, and you can grant them permissions to Azure resources using this identity.
The most common role to grant the Managed Identity is Storage Blob Data Contributor, on any storage account that the Azure Batch nodes need to read from or write to.
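One possible way to grant that role is with the Azure CLI. The sketch below is illustrative only: the two helper functions are hypothetical names, not part of the plugin, and the storage account name and principal ID in the commented usage are placeholders. It assumes you are logged in with az login and are allowed to create role assignments on the storage account.

```shell
# Hypothetical helper: build the resource ID of a storage account, which is
# used as the scope of the role assignment.
# $1 = subscription ID, $2 = resource group name, $3 = storage account name
storage_account_scope() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
        "$1" "$2" "$3"
}

# Hypothetical helper: grant Storage Blob Data Contributor on a scope to the
# managed identity, identified by its principal (object) ID.
# $1 = principal ID of the managed identity, $2 = scope
grant_blob_data_contributor() {
    az role assignment create \
        --assignee-object-id "$1" \
        --assignee-principal-type ServicePrincipal \
        --role "Storage Blob Data Contributor" \
        --scope "$2"
}

# Example with dummy values (uncomment to run):
# scope="$(storage_account_scope d2c845cd-4903-40da-b34c-a6fec7115e21 rg-batch-test mystorageaccount)"
# grant_blob_data_contributor "<principal-id-of-the-managed-identity>" "$scope"
```

Scoping the assignment to a single storage account, rather than to the whole resource group or subscription, keeps the permissions of the Batch nodes as narrow as the workflow allows.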
Setup
The following required parameters are used to set up the executor. They can be passed either as environment variables or as command line flags.
| ENVIRONMENT_VAR | CLI_FLAG | REQUIRED |
|---|---|---|
| SNAKEMAKE_AZURE_BATCH_ACCOUNT_URL | --azure-batch-account-url | True |
| SNAKEMAKE_AZURE_BATCH_SUBSCRIPTION_ID | --azure-batch-subscription-id | True |
| SNAKEMAKE_AZURE_BATCH_RESOURCE_GROUP_NAME | --azure-batch-resource-group-name | True |
The remaining options are described above.
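As a minimal sketch of the environment variable form, the snippet below exports the three required variables with dummy values and defines a small check that they are all set. The function check_azure_batch_env is a hypothetical helper written for this example, not something the plugin provides.

```shell
# Dummy values; replace with your own account details.
export SNAKEMAKE_AZURE_BATCH_ACCOUNT_URL="https://accountname.westus2.batch.azure.com"
export SNAKEMAKE_AZURE_BATCH_SUBSCRIPTION_ID="d2c845cd-4903-40da-b34c-a6fec7115e21"
export SNAKEMAKE_AZURE_BATCH_RESOURCE_GROUP_NAME="rg-batch-test"

# Hypothetical helper (not part of the plugin): print every required variable
# that is unset or empty, and return non-zero if any is missing.
check_azure_batch_env() {
    missing=0
    for name in SNAKEMAKE_AZURE_BATCH_ACCOUNT_URL \
                SNAKEMAKE_AZURE_BATCH_SUBSCRIPTION_ID \
                SNAKEMAKE_AZURE_BATCH_RESOURCE_GROUP_NAME; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# With the variables set, the three account flags can be left off:
# check_azure_batch_env && snakemake -j1 --executor azure-batch
```

Exporting the values once per shell session keeps account details out of the command line and out of shell history for each individual run.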
Example
Write the Snakefile
Run the jobs on Azure Batch nodes!
Here I pass the required values via CLI flags as described above, but they can also be read from their respective environment variables. The values shown below are dummy values:
snakemake -j1 --executor azure-batch \
--azure-batch-account-url https://accountname.westus2.batch.azure.com \
--azure-batch-subscription-id d2c845cd-4903-40da-b34c-a6fec7115e21 \
--azure-batch-resource-group-name rg-batch-test
Example with Azure Storage Backend
snakemake -j1 --executor azure-batch \
--azure-batch-account-url https://accountname.westus2.batch.azure.com \
--azure-batch-subscription-id d2c845cd-4903-40da-b34c-a6fec7115e21 \
--azure-batch-resource-group-name rg-batch-test \
--default-storage-provider azure \
--default-storage-prefix 'az://account/container/path/'