Snakemake storage plugin: lfs


Warning

This plugin is not maintained or reviewed by the official Snakemake organization.

Warning

No documentation found in repository https://github.com/pypsa/pypsa-eur. The plugin should provide a docs/intro.md with some introductory sentences and optionally a docs/further.md file with details beyond the auto-generated usage instructions presented in this catalog.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-storage-plugin-lfs

Or, if you are using pixi, add the plugin to your pixi.toml. Take care to put it under the dependency type that matches the plugin's availability (PyPI or conda), e.g.:

snakemake-storage-plugin-lfs = "*"

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
|---|---|---|
| input | lfs://abc123def456/path/to/file.csv | A Git LFS object by OID and path |
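The OID in a query is the sha256 object ID that Git LFS stores in its pointer files (the `version`/`oid`/`size` fields defined by the Git LFS spec). As a sketch, a query of the form shown above could be assembled from a pointer file like this; the helper name is hypothetical:

```python
# Sketch: build an lfs://<oid>/<path> query (per the table above) from the
# contents of a Git LFS pointer file. The pointer format follows the Git LFS
# specification: one "key value" pair per line.

def query_from_pointer(pointer_text: str, path: str) -> str:
    """Extract the sha256 OID from an LFS pointer and form the storage query."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    oid = fields["oid"].removeprefix("sha256:")
    return f"lfs://{oid}/{path}"

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:abc123def456
size 12345
"""
print(query_from_pointer(pointer, "path/to/file.csv"))
# → lfs://abc123def456/path/to/file.csv
```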

As default provider

If you want all your input and output files (those not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as the default provider via:

snakemake --default-storage-provider lfs --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results. You can also pass custom settings via command line arguments:

snakemake --default-storage-provider lfs --default-storage-prefix ... \
    --storage-lfs-repo-url ... \
    --storage-lfs-token-envvar ... \
    --storage-lfs-local-repo ... \
    --storage-lfs-cache ... \
    --storage-lfs-skip-remote-checks ... \
    --storage-lfs-max-concurrent-downloads ...
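For illustration, a filled-in invocation might look like the following; the repository URL, prefix, environment variable name, and cache path are all hypothetical placeholders:

```shell
# Hypothetical values throughout -- substitute your own repository and paths.
snakemake --default-storage-provider lfs \
    --default-storage-prefix "lfs://" \
    --storage-lfs-repo-url https://github.com/org/repo \
    --storage-lfs-token-envvar LFS_TOKEN \
    --storage-lfs-cache ~/.cache/lfs-objects \
    --storage-lfs-max-concurrent-downloads 5
```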

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside of your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="lfs",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-lfs-..., see
    # snakemake --help
    # Git repository URL used to construct the LFS batch API endpoint (e.g. https://github.com/org/repo).
    repo_url=...,
    # Name of the environment variable containing the authentication token for the LFS server (used as Basic Auth password).
    token_envvar=...,
    # Path to a local git repository to look up LFS objects before downloading. If the OID is found locally but the hash does not match, a warning is issued.
    local_repo=...,
    # Cache directory for downloaded files. Set to a path to enable caching (default: "" = disabled).
    cache=...,
    # Whether to skip metadata checking with remote LFS server (default: False).
    skip_remote_checks=...,
    # Maximum number of concurrent downloads.
    max_concurrent_downloads=...,

rule example:
    input:
        storage.lfs(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Using multiple entities of the same storage plugin

In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:

# register shared settings
storage:
    provider="lfs",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-lfs-..., see below
    # Git repository URL used to construct the LFS batch API endpoint (e.g. https://github.com/org/repo).
    repo_url=...,
    # Name of the environment variable containing the authentication token for the LFS server (used as Basic Auth password).
    token_envvar=...,
    # Path to a local git repository to look up LFS objects before downloading. If the OID is found locally but the hash does not match, a warning is issued.
    local_repo=...,
    # Cache directory for downloaded files. Set to a path to enable caching (default: "" = disabled).
    cache=...,
    # Whether to skip metadata checking with remote LFS server (default: False).
    skip_remote_checks=...,
    # Maximum number of concurrent downloads.
    max_concurrent_downloads=...,

# register multiple tagged entities
storage foo:
    provider="lfs",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-lfs-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:repo_url=...
    # Git repository URL used to construct the LFS batch API endpoint (e.g. https://github.com/org/repo).
    repo_url=...,
    # Name of the environment variable containing the authentication token for the LFS server (used as Basic Auth password).
    token_envvar=...,
    # Path to a local git repository to look up LFS objects before downloading. If the OID is found locally but the hash does not match, a warning is issued.
    local_repo=...,
    # Cache directory for downloaded files. Set to a path to enable caching (default: "" = disabled).
    cache=...,
    # Whether to skip metadata checking with remote LFS server (default: False).
    skip_remote_checks=...,
    # Maximum number of concurrent downloads.
    max_concurrent_downloads=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Settings

The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):

| CLI argument | Description | Default | Choices | Required | Type |
|---|---|---|---|---|---|
| --storage-lfs-repo-url VALUE | Git repository URL used to construct the LFS batch API endpoint (e.g. https://github.com/org/repo). | '' | | | |
| --storage-lfs-token-envvar VALUE | Name of the environment variable containing the authentication token for the LFS server (used as Basic Auth password). | '' | | | |
| --storage-lfs-local-repo VALUE | Path to a local git repository to look up LFS objects before downloading. If the OID is found locally but the hash does not match, a warning is issued. | '' | | | |
| --storage-lfs-cache VALUE | Cache directory for downloaded files. Set to a path to enable caching (default: '' = disabled). | '' | | | |
| --storage-lfs-skip-remote-checks VALUE | Whether to skip metadata checking with remote LFS server (default: False). | False | | | |
| --storage-lfs-max-concurrent-downloads VALUE | Maximum number of concurrent downloads. | 3 | | | |