Snakemake storage plugin: rucio

Author: Bouwe Andela

A Snakemake storage plugin that reads and writes using Rucio.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-storage-plugin-rucio

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
|---|---|---|
| any | rucio://myscope/myfile.txt | A file in a Rucio scope. |
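To make the query format above concrete, the following sketch splits a query of the form rucio://<scope>/<name> into its scope and file name using Python's standard urllib.parse module. The helper name split_rucio_query is hypothetical, and this only illustrates the format; it is not necessarily how the plugin parses queries internally.

```python
from urllib.parse import urlparse


def split_rucio_query(query: str) -> tuple[str, str]:
    """Split 'rucio://<scope>/<name>' into (scope, name).

    Hypothetical helper for illustration; not part of the plugin's API.
    """
    parsed = urlparse(query)
    if parsed.scheme != "rucio":
        raise ValueError(f"expected a rucio:// query, got {query!r}")
    scope = parsed.netloc            # the part right after 'rucio://'
    name = parsed.path.lstrip("/")   # everything after the scope
    if not scope or not name:
        raise ValueError(f"query must contain a scope and a file name: {query!r}")
    return scope, name


scope, name = split_rucio_query("rucio://myscope/myfile.txt")
print(scope, name)
```

Queries with a different scheme, or without a scope or file name, are rejected with a ValueError in this sketch.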

As default provider

If you want all input and output that is not explicitly marked as coming from another storage to be read from and written to this storage, you can use it as the default provider via:

snakemake --default-storage-provider rucio --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results. You can also pass custom settings via command line arguments:

snakemake --default-storage-provider rucio --default-storage-prefix ... \
    --storage-rucio-max-requests-per-second ... \
    --storage-rucio-rucio-host ... \
    --storage-rucio-auth-host ... \
    --storage-rucio-account ... \
    --storage-rucio-ca-cert ... \
    --storage-rucio-auth-type ... \
    --storage-rucio-creds ... \
    --storage-rucio-timeout ... \
    --storage-rucio-user-agent ... \
    --storage-rucio-vo ... \
    --storage-rucio-ignore-checksum ... \
    --storage-rucio-download-rse ... \
    --storage-rucio-upload-rse ... \
    --storage-rucio-cache-scope ...
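The command-line flags follow a regular pattern: each workflow setting name, with underscores replaced by hyphens, is appended to --storage-rucio-. As a rough sketch of that pattern, the script below assembles such a command line from a dictionary of settings. The helper names, the prefix rucio://myscope, and the values myaccount and MY_RSE are all hypothetical placeholders chosen for illustration.

```python
import shlex


def rucio_cli_args(settings: dict[str, str]) -> list[str]:
    """Turn workflow-style setting names into --storage-rucio-... flags."""
    args: list[str] = []
    for name, value in settings.items():
        # e.g. "download_rse" -> "--storage-rucio-download-rse"
        flag = "--storage-rucio-" + name.replace("_", "-")
        args.extend([flag, str(value)])
    return args


def snakemake_command(prefix: str, settings: dict[str, str]) -> list[str]:
    """Build the full argument list for a run with rucio as default storage."""
    return [
        "snakemake",
        "--default-storage-provider", "rucio",
        "--default-storage-prefix", prefix,
        *rucio_cli_args(settings),
    ]


# Hypothetical values, for illustration only.
command = snakemake_command(
    "rucio://myscope",
    {"account": "myaccount", "download_rse": "MY_RSE"},
)
print(shlex.join(command))
```

Building the command as a list and joining it with shlex.join keeps values with spaces or special characters safely quoted when the line is pasted into a shell.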

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside of your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="rucio",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-rucio-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    #  The address of the rucio server, if None it is read from the config file.
    rucio_host=...,
    #  The address of the rucio authentication server, if None it is read from the config file.
    auth_host=...,
    #  The account to authenticate to rucio.
    account=...,
    #  The path to the rucio server certificate.
    ca_cert=...,
    #  The type of authentication (e.g.: 'userpass', 'kerberos' ...)
    auth_type=...,
    #  Dictionary with credentials needed for authentication.
    creds=...,
    #
    timeout=...,
    #  Indicates the client.
    user_agent=...,
    #  The VO to authenticate into.
    vo=...,
    # If true, skips the checksum validation between the downloaded file and the rucio catalogue.
    ignore_checksum=...,
    # Rucio Storage Element (RSE) expression to download files from.
    download_rse=...,
    # Rucio Storage Element (RSE) expression to upload files to.
    upload_rse=...,
    # If true, minimize the number of server calls by caching the size and creation time of all files in the same scope.
    cache_scope=...,

rule example:
    input:
        storage.rucio(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Using multiple entities of the same storage plugin

In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:

# register shared settings
storage:
    provider="rucio",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-rucio-..., see below
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    #  The address of the rucio server, if None it is read from the config file.
    rucio_host=...,
    #  The address of the rucio authentication server, if None it is read from the config file.
    auth_host=...,
    #  The account to authenticate to rucio.
    account=...,
    #  The path to the rucio server certificate.
    ca_cert=...,
    #  The type of authentication (e.g.: 'userpass', 'kerberos' ...)
    auth_type=...,
    #  Dictionary with credentials needed for authentication.
    creds=...,
    #
    timeout=...,
    #  Indicates the client.
    user_agent=...,
    #  The VO to authenticate into.
    vo=...,
    # If true, skips the checksum validation between the downloaded file and the rucio catalogue.
    ignore_checksum=...,
    # Rucio Storage Element (RSE) expression to download files from.
    download_rse=...,
    # Rucio Storage Element (RSE) expression to upload files to.
    upload_rse=...,
    # If true, minimize the number of server calls by caching the size and creation time of all files in the same scope.
    cache_scope=...,

# register multiple tagged entities
storage foo:
    provider="rucio",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-rucio-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    #  The address of the rucio server, if None it is read from the config file.
    rucio_host=...,
    #  The address of the rucio authentication server, if None it is read from the config file.
    auth_host=...,
    #  The account to authenticate to rucio.
    account=...,
    #  The path to the rucio server certificate.
    ca_cert=...,
    #  The type of authentication (e.g.: 'userpass', 'kerberos' ...)
    auth_type=...,
    #  Dictionary with credentials needed for authentication.
    creds=...,
    #
    timeout=...,
    #  Indicates the client.
    user_agent=...,
    #  The VO to authenticate into.
    vo=...,
    # If true, skips the checksum validation between the downloaded file and the rucio catalogue.
    ignore_checksum=...,
    # Rucio Storage Element (RSE) expression to download files from.
    download_rse=...,
    # Rucio Storage Element (RSE) expression to upload files to.
    upload_rse=...,
    # If true, minimize the number of server calls by caching the size and creation time of all files in the same scope.
    cache_scope=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
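The convention of prefixing a value with a tag name can be illustrated with a small sketch. It assumes that values arrive as strings such as "5" (untagged, shared by all entities) or "foo:10" (only for the entity tagged foo), and that an untagged value serves as the fallback for tagged entities. The helper resolve_tagged is hypothetical and is not Snakemake's actual implementation. To avoid misreading values that contain colons themselves, such as URLs, it only treats a prefix as a tag when it appears in a set of known tags.

```python
from typing import Optional


def resolve_tagged(
    values: list[str], tag: Optional[str], known_tags: set[str]
) -> Optional[str]:
    """Pick the value that applies to `tag` from a list of raw values.

    Hypothetical helper: "foo:10" is read as value "10" for tag "foo",
    anything without a known tag prefix is read as the shared value.
    """
    untagged: Optional[str] = None
    tagged: dict[str, str] = {}
    for raw in values:
        prefix, sep, rest = raw.partition(":")
        if sep and prefix in known_tags:
            tagged[prefix] = rest
        else:
            untagged = raw
    if tag is not None and tag in tagged:
        return tagged[tag]
    # Assumption: fall back to the shared (untagged) value.
    return untagged


values = ["5", "foo:10"]
print(resolve_tagged(values, "foo", {"foo"}))
print(resolve_tagged(values, None, {"foo"}))
```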

Settings

The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):

| CLI setting | Workflow setting | Envvar setting | Description | Default | Choices | Required | Type |
|---|---|---|---|---|---|---|---|
| --storage-rucio-max-requests-per-second VALUE | max_requests_per_second | | Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used. | None | | | str |
| --storage-rucio-rucio-host VALUE | rucio_host | | The address of the rucio server, if None it is read from the config file. | None | | | str |
| --storage-rucio-auth-host VALUE | auth_host | | The address of the rucio authentication server, if None it is read from the config file. | None | | | str |
| --storage-rucio-account VALUE | account | | The account to authenticate to rucio. | None | | | str |
| --storage-rucio-ca-cert VALUE | ca_cert | | The path to the rucio server certificate. | None | | | str |
| --storage-rucio-auth-type VALUE | auth_type | | The type of authentication (e.g.: 'userpass', 'kerberos' ...) | None | | | str |
| --storage-rucio-creds VALUE | creds | | Dictionary with credentials needed for authentication. | None | | | str |
| --storage-rucio-timeout VALUE | timeout | | | 600 | | | str |
| --storage-rucio-user-agent VALUE | user_agent | | Indicates the client. | 'rucio-clients' | | | str |
| --storage-rucio-vo VALUE | vo | | The VO to authenticate into. | None | | | str |
| --storage-rucio-ignore-checksum VALUE | ignore_checksum | | If true, skips the checksum validation between the downloaded file and the rucio catalogue. | False | | | str |
| --storage-rucio-download-rse VALUE | download_rse | | Rucio Storage Element (RSE) expression to download files from. | None | | | str |
| --storage-rucio-upload-rse VALUE | upload_rse | | Rucio Storage Element (RSE) expression to upload files to. | None | | | str |
| --storage-rucio-cache-scope VALUE | cache_scope | | If true, minimize the number of server calls by caching the size and creation time of all files in the same scope. | False | | | str |
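As a compact summary of the defaults listed above, the sketch below collects them in a dictionary and merges user overrides on top, rejecting setting names that are not in the list. The names RUCIO_DEFAULTS and resolve_settings are hypothetical, as is the value MY_RSE, and representing the defaults as Python None, int, and bool values is an assumption (the settings list gives the type of every setting as str).

```python
# Defaults as listed in the settings above (Python types are an assumption).
RUCIO_DEFAULTS = {
    "max_requests_per_second": None,
    "rucio_host": None,
    "auth_host": None,
    "account": None,
    "ca_cert": None,
    "auth_type": None,
    "creds": None,
    "timeout": 600,
    "user_agent": "rucio-clients",
    "vo": None,
    "ignore_checksum": False,
    "download_rse": None,
    "upload_rse": None,
    "cache_scope": False,
}


def resolve_settings(**overrides):
    """Return the defaults with overrides applied; reject unknown names."""
    unknown = set(overrides) - set(RUCIO_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown rucio storage settings: {sorted(unknown)}")
    return {**RUCIO_DEFAULTS, **overrides}


# Hypothetical override values, for illustration only.
settings = resolve_settings(download_rse="MY_RSE", ignore_checksum=True)
print(settings["timeout"], settings["download_rse"], settings["ignore_checksum"])
```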