Snakemake storage plugin: xrootd

Authors: Chris Burr <christopher.burr@cern.ch>, Johannes Koester <johannes.koester@uni-due.de>, Matthew Monk <matthew.david.monk@cern.ch>

A Snakemake storage plugin to read and write via the XRootD protocol.

Currently, only files (not directories) can be used as inputs or outputs.

The plugin can be used without specifying any options relating to the URLs, in which case all information must be contained in the URL passed by the user.

The options for host, port, username, password, protocol, and url_decorator can be specified to make the URLs shorter and easier to use.

Please note: if the password option is supplied (even implicitly via the environment variable SNAKEMAKE_STORAGE_XROOTD_PASSWORD) it will be displayed in plaintext as part of the XRootD URLs when Snakemake prints information about a rule. Only use the password option in trusted environments.

The optional protocol setting can be used to set the preferred XRootD authentication protocol order directly in the provider configuration. The value is passed through to XRootD unchanged, so comma-separated values such as krb5,unix are supported.

Installation

Install this plugin directly with pip or mamba, e.g.:

pip install snakemake-storage-plugin-xrootd

Or, if you are using pixi, add the plugin to your pixi.toml. Be careful to put it under the right dependency type based on the plugin’s availability, e.g.:

snakemake-storage-plugin-xrootd = "*"

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
|------------|-------|-------------|
| any | root://eosuser.cern.ch//eos/user/s/someuser/somefile.txt | A file on an XRootD instance, without specifying any arguments. |
| any | root://eos/user/s/someuser/somefile.txt | A file on an XRootD instance where the host has been specified in the storage object. |

As default provider

If you want all your input and output (which is not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as a default provider via:

snakemake --default-storage-provider xrootd --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results. You can also pass custom settings via command line arguments:

snakemake --default-storage-provider xrootd --default-storage-prefix ... \
    --storage-xrootd-max-requests-per-second ... \
    --storage-xrootd-host ... \
    --storage-xrootd-port ... \
    --storage-xrootd-username ... \
    --storage-xrootd-password ... \
    --storage-xrootd-protocol ... \
    --storage-xrootd-url-decorator ...

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside of your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="xrootd",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-xrootd-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # The XRootD host to connect to
    host=...,
    # The XRootD port to connect to
    port=...,
    # The username to use for authentication
    username=...,
    # The password to use for authentication. NOTE: Only use this setting in trusted environments! Snakemake will print the password in plaintext as part of the XRootD URLs used in the inputs/outputs of jobs.
    password=...,
    # Preferred XRootD authentication protocol(s), passed through to the client unchanged. Comma-separated values such as 'krb5,unix' are allowed.
    protocol=...,
    # Entry point to a function (e.g. 'module:func') or a Python expression (e.g. 'url + "?foo=bar"') that decorates the URL. Function expects a single string argument (URL) and returns the decorated URL. Expression has 'url' available.
    url_decorator=...,

rule example:
    input:
        storage.xrootd(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Using multiple entities of the same storage plugin

In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:

# register shared settings
storage:
    provider="xrootd",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-xrootd-..., see below
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # The XRootD host to connect to
    host=...,
    # The XRootD port to connect to
    port=...,
    # The username to use for authentication
    username=...,
    # The password to use for authentication. NOTE: Only use this setting in trusted environments! Snakemake will print the password in plaintext as part of the XRootD URLs used in the inputs/outputs of jobs.
    password=...,
    # Preferred XRootD authentication protocol(s), passed through to the client unchanged. Comma-separated values such as 'krb5,unix' are allowed.
    protocol=...,
    # Entry point to a function (e.g. 'module:func') or a Python expression (e.g. 'url + "?foo=bar"') that decorates the URL. Function expects a single string argument (URL) and returns the decorated URL. Expression has 'url' available.
    url_decorator=...,

# register multiple tagged entities
storage foo:
    provider="xrootd",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-xrootd-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # The XRootD host to connect to
    host=...,
    # The XRootD port to connect to
    port=...,
    # The username to use for authentication
    username=...,
    # The password to use for authentication. NOTE: Only use this setting in trusted environments! Snakemake will print the password in plaintext as part of the XRootD URLs used in the inputs/outputs of jobs.
    password=...,
    # Preferred XRootD authentication protocol(s), passed through to the client unchanged. Comma-separated values such as 'krb5,unix' are allowed.
    protocol=...,
    # Entry point to a function (e.g. 'module:func') or a Python expression (e.g. 'url + "?foo=bar"') that decorates the URL. Function expects a single string argument (URL) and returns the decorated URL. Expression has 'url' available.
    url_decorator=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Settings

The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):

| CLI argument | Description | Default | Choices | Required | Type |
|--------------|-------------|---------|---------|----------|------|
| --storage-xrootd-max-requests-per-second VALUE | Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used. | None | | | |
| --storage-xrootd-host VALUE | The XRootD host to connect to | None | | | |
| --storage-xrootd-port VALUE | The XRootD port to connect to | None | | | |
| --storage-xrootd-username VALUE | The username to use for authentication | None | | | |
| --storage-xrootd-password VALUE | The password to use for authentication. NOTE: Only use this setting in trusted environments! Snakemake will print the password in plaintext as part of the XRootD URLs used in the inputs/outputs of jobs. | None | | | |
| --storage-xrootd-protocol VALUE | Preferred XRootD authentication protocol(s), passed through to the client unchanged. Comma-separated values such as 'krb5,unix' are allowed. | None | | | |
| --storage-xrootd-url-decorator VALUE | Entry point to a function (e.g. 'module:func') or a Python expression (e.g. 'url + "?foo=bar"') that decorates the URL. Function expects a single string argument (URL) and returns the decorated URL. Expression has 'url' available. | None | | | |

Further details

The username and password fields do not need to be specified if you use Kerberos and/or a VOMS proxy for authentication.

The optional protocol setting can be used to pass a preferred XRootD authentication protocol list to the client, for example krb5 or krb5,unix. This is useful for setups where relying on external XrdSecPROTOCOL forwarding would be fragile.

The optional url_decorator argument can be used to pass a function (or a Python expression with url available) that modifies the root URL.

A possible use-case would be a function that wraps the URL with a token to allow for authentication.
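As an illustration of the token use-case, here is a minimal sketch of a decorator function that takes the URL as a single string and returns the decorated URL. The query parameter name `authz`, the environment variable `MY_XROOTD_TOKEN`, and the helper names are hypothetical and chosen only for this example; your storage endpoint may expect a different parameter.

```python
import os
from urllib.parse import quote

# Hypothetical names: neither the "authz" parameter nor the
# MY_XROOTD_TOKEN variable comes from the plugin documentation.
TOKEN_PARAM = "authz"
TOKEN_ENV_VAR = "MY_XROOTD_TOKEN"


def append_query_param(url: str, key: str, value: str) -> str:
    """Append key=value, using '&' if the URL already has a query string."""
    separator = "&" if "?" in url else "?"
    # Percent-encode the value so characters like '/' or '&' stay intact.
    return f"{url}{separator}{key}={quote(value, safe='')}"


def add_token(url: str) -> str:
    """Decorate an XRootD URL with a token read from the environment."""
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        # Leave the URL untouched when no token is available.
        return url
    return append_query_param(url, TOKEN_PARAM, token)
```

If this were saved as a module named, say, `my_decorators.py` (again hypothetical) somewhere on the Python import path, it could presumably be referenced in the 'module:func' form as `my_decorators:add_token`.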

If both protocol and url_decorator are used, the plugin adds the xrd.wantprot query parameter first and then applies the decorator. Decorators therefore need to handle URLs that may already contain query parameters.
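The ordering described above can be modelled with a short sketch. This is not the plugin's implementation: `build_url`, `add_wantprot`, and both decorators are hypothetical helpers, and the exact way the plugin writes the `xrd.wantprot` parameter into the URL is assumed here. The sketch only shows why a decorator that blindly appends `"?foo=bar"` produces a malformed URL once a protocol is set.

```python
from typing import Callable, Optional


def add_wantprot(url: str, protocol: str) -> str:
    """Append xrd.wantprot=<protocol>; the exact URL layout is an assumption."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}xrd.wantprot={protocol}"


def build_url(
    url: str,
    protocol: Optional[str] = None,
    decorator: Optional[Callable[[str], str]] = None,
) -> str:
    """Model of the documented order: protocol first, then the decorator."""
    if protocol:
        url = add_wantprot(url, protocol)
    if decorator is not None:
        url = decorator(url)
    return url


def naive_decorator(url: str) -> str:
    # Assumes the URL has no query string yet.
    return url + "?foo=bar"


def query_aware_decorator(url: str) -> str:
    # Picks the separator based on whether a query string already exists.
    return url + ("&" if "?" in url else "?") + "foo=bar"


base = "root://eosuser.cern.ch//eos/user/s/someuser/somefile.txt"
naive = build_url(base, "krb5,unix", naive_decorator)
safe = build_url(base, "krb5,unix", query_aware_decorator)
print(naive)  # ends with "?xrd.wantprot=krb5,unix?foo=bar" (second '?' is wrong)
print(safe)   # ends with "?xrd.wantprot=krb5,unix&foo=bar"
```

The same separator check can be written inline when url_decorator is given as an expression, for example `url + ("&" if "?" in url else "?") + "foo=bar"`.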