Snakemake storage plugin: s3

![repository](https://img.shields.io/badge/repository-github-blue?color=%23022c22) ![author](https://img.shields.io/badge/author-Johannes%20Koester-purple?color=%23064e3b) (PyPI version and license badges)

Warning

No documentation found in repository https://github.com/snakemake/snakemake-storage-plugin-s3. The plugin should provide a docs/intro.md with some introductory sentences and optionally a docs/further.md file with details beyond the auto-generated usage instructions presented in this catalog.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-storage-plugin-s3

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
|---|---|---|
| any | `s3://mybucket/myfile.txt` | A file in an S3 bucket |
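A query is a plain `s3://` URL, so the bucket and key can be recovered with standard URL parsing. A minimal sketch (the helper name and the bucket/file names are hypothetical, chosen for illustration):

```python
from urllib.parse import urlparse

def split_s3_query(query: str) -> tuple[str, str]:
    """Split an s3://bucket/key query into (bucket, key)."""
    parsed = urlparse(query)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 query: {query}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_query("s3://mybucket/myfile.txt")
print(bucket, key)  # mybucket myfile.txt
```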

As default provider

If you want all your input and output files (unless explicitly marked to come from another storage) to be written to and read from this storage, you can use it as a default provider via:

snakemake --default-storage-provider s3 --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results. You can also pass custom settings via command line arguments:

snakemake --default-storage-provider s3 --default-storage-prefix ... \
    --storage-s3-max-requests-per-second ... \
    --storage-s3-endpoint-url ... \
    --storage-s3-access-key ... \
    --storage-s3-secret-key ... \
    --storage-s3-token ... \
    --storage-s3-signature-version ... \
    --storage-s3-retries ...
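Conceptually, the default storage prefix is prepended to each local workflow path to form the storage query. A rough sketch of that mapping, assuming a hypothetical prefix `s3://mybucket/results` (the function is illustrative, not part of the plugin's API):

```python
# Sketch: how a default storage prefix maps local workflow paths to S3 queries.
def apply_prefix(prefix: str, path: str) -> str:
    # Normalize trailing slashes so the joined query has exactly one separator.
    return f"{prefix.rstrip('/')}/{path}"

prefix = "s3://mybucket/results"
print(apply_prefix(prefix, "example.txt"))     # s3://mybucket/results/example.txt
print(apply_prefix(prefix, "plots/fig1.pdf"))  # s3://mybucket/results/plots/fig1.pdf
```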

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="s3",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-s3-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # S3 endpoint URL (if omitted, AWS S3 is used)
    endpoint_url=...,
    # S3 access key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    access_key=...,
    # S3 secret key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    secret_key=...,
    # S3 token (usually not required)
    token=...,
    # S3 signature version
    signature_version=...,
    # S3 API retries
    retries=...,

rule example:
    input:
        storage.s3(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Using multiple entities of the same storage plugin

In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:

# register shared settings
storage:
    provider="s3",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-s3-..., see below
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # S3 endpoint URL (if omitted, AWS S3 is used)
    endpoint_url=...,
    # S3 access key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    access_key=...,
    # S3 secret key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    secret_key=...,
    # S3 token (usually not required)
    token=...,
    # S3 signature version
    signature_version=...,
    # S3 API retries
    retries=...,

# register multiple tagged entities
storage foo:
    provider="s3",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-s3-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # S3 endpoint URL (if omitted, AWS S3 is used)
    endpoint_url=...,
    # S3 access key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    access_key=...,
    # S3 secret key (if omitted, credentials are taken from .aws/credentials as e.g. created by aws configure)
    secret_key=...,
    # S3 token (usually not required)
    token=...,
    # S3 signature version
    signature_version=...,
    # S3 API retries
    retries=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Settings

The storage plugin has the following settings, which can be passed via the command line, within the workflow, or via environment variables (where a value is given in the respective column):

| CLI setting | Workflow setting | Envvar setting | Description | Default | Choices | Required | Type |
|---|---|---|---|---|---|---|---|
| `--storage-s3-max-requests-per-second VALUE` | `max_requests_per_second` | | Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used. | None | | | str |
| `--storage-s3-endpoint-url VALUE` | `endpoint_url` | | S3 endpoint URL (if omitted, AWS S3 is used) | None | | | str |
| `--storage-s3-access-key VALUE` | `access_key` | `SNAKEMAKE_STORAGE_S3_ACCESS_KEY` | S3 access key (if omitted, credentials are taken from .aws/credentials, e.g. as created by `aws configure`) | None | | | str |
| `--storage-s3-secret-key VALUE` | `secret_key` | `SNAKEMAKE_STORAGE_S3_SECRET_KEY` | S3 secret key (if omitted, credentials are taken from .aws/credentials, e.g. as created by `aws configure`) | None | | | str |
| `--storage-s3-token VALUE` | `token` | `SNAKEMAKE_STORAGE_S3_TOKEN` | S3 token (usually not required) | None | | | str |
| `--storage-s3-signature-version VALUE` | `signature_version` | | S3 signature version | None | | | str |
| `--storage-s3-retries VALUE` | `retries` | | S3 API retries | 5 | | | int |
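Credentials can also be supplied through the environment variables listed above, which keeps secrets out of the command line and the workflow file. A minimal sketch (the key values are placeholders, not real credentials):

```python
import os

# Placeholder credentials; in practice these would come from a secrets manager
# or be exported in the shell before invoking snakemake.
os.environ["SNAKEMAKE_STORAGE_S3_ACCESS_KEY"] = "AKIA...EXAMPLE"
os.environ["SNAKEMAKE_STORAGE_S3_SECRET_KEY"] = "secret-example"

# With these set, no --storage-s3-access-key/--storage-s3-secret-key
# flags are needed when the s3 storage plugin is loaded.
```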