Snakemake storage plugin: git

Author: Hyeokjin Kwon <hyeokjin.kwon@uni-potsdam.de>

Warning

This plugin is neither maintained nor reviewed by the official Snakemake organization.

This plugin allows you to clone remote Git repositories via SSH and HTTPS.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-storage-plugin-git

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
| --- | --- | --- |
| input | https://example.com/repo.git | The remote git repository is accessed via HTTPS. |
| input | ssh://example.com/repo.git | The remote git repository is accessed via SSH. |
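
The two query forms differ only in their URL scheme. As a rough illustration (a sketch, not the plugin's own validation code), a query can be classified with Python's standard `urllib.parse`. The helper name `query_transport` is hypothetical and not part of the plugin.

```python
from urllib.parse import urlparse


def query_transport(query: str) -> str:
    """Return "https" or "ssh" for a query in one of the documented forms.

    Hypothetical helper for illustration only; the plugin may validate
    queries differently.
    """
    scheme = urlparse(query).scheme.lower()
    if scheme not in ("https", "ssh"):
        raise ValueError(f"unsupported query scheme: {scheme!r}")
    return scheme


# The two example queries from the table above.
print(query_transport("https://example.com/repo.git"))
print(query_transport("ssh://example.com/repo.git"))
```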

As default provider

If you want all your input and output (which is not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as a default provider via:

snakemake --default-storage-provider git --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results. You can also pass custom settings via command line arguments:

snakemake --default-storage-provider git --default-storage-prefix ... \
    --storage-git-max-requests-per-second ... \
    --storage-git-enable-rate-limits ... \
    --storage-git-local-path-delimiter ... \
    --storage-git-fetch-to-update ... \
    --storage-git-ssh-username ... \
    --storage-git-ssh-pubkey-path ... \
    --storage-git-ssh-privkey-path ... \
    --storage-git-ssh-passphrase ... \
    --storage-git-custom-heads ... \
    --storage-git-keep-local ... \
    --storage-git-ignore-errors ... \
    --storage-git-retrieve ... \
    --storage-git--is-test ...
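
The flag names listed above appear to follow directly from the setting names used in the workflow (e.g. `max_requests_per_second`): the prefix `--storage-git-` is prepended and underscores become hyphens. The sketch below only illustrates that naming pattern; `setting_to_flag` is a hypothetical helper, and Snakemake generates the real flags itself.

```python
def setting_to_flag(setting: str, provider: str = "git") -> str:
    """Map a workflow setting name to the CLI flag pattern shown above.

    Hypothetical helper: it mirrors the observed naming pattern and is not
    part of Snakemake or the plugin.
    """
    return f"--storage-{provider}-{setting.replace('_', '-')}"


for name in ("max_requests_per_second", "ssh_privkey_path", "_is_test"):
    print(name, "->", setting_to_flag(name))
```

Note that a leading underscore, as in `_is_test`, yields the double hyphen seen in `--storage-git--is-test`.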

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside of your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="git",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-git-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider. 0.01 is recommended for GitHub if many repositories are cloned to avoid exceeding the rate limit.
    max_requests_per_second=...,
    # Use rate limiting for platforms that require it (e.g. GitHub).
    enable_rate_limits=...,
    # Delimiter to replace '/' with in the local path of the cloned repositories.
    local_path_delimiter=...,
    # Fetch changes from the remote if the repository already exists locally.
    fetch_to_update=...,
    # Username for SSH authentication.
    ssh_username=...,
    # Path to the SSH public key for authentication.
    ssh_pubkey_path=...,
    # Path to the SSH private key for authentication.
    ssh_privkey_path=...,
    # Passphrase for the SSH private key.
    ssh_passphrase=...,
    # Check out a custom branch (or tag) and commit after cloning: {"<GIT_URL>": {"tag": "<TAG>", "branch": "<BRANCH>", "commit": "<COMMIT_ID>"}}
    custom_heads=...,
    # Keep the cloned repositories after the workflow is finished.
    keep_local=...,
    # Ignore errors when cloning or pulling repositories, so that the remaining repositories are still processed even if some fail.
    ignore_errors=...,
    # This value should always be False, as this storage provider does not support retrieving objects.
    retrieve=...,
    # This is only used for unit tests.
    _is_test=...,

rule example:
    input:
        storage.git(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
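
The `custom_heads` comment above describes a mapping from a repository URL to the ref that should be checked out. A minimal sketch of building and sanity-checking such a mapping in Python follows. The URL, the tag value `v1.0.0`, and the helper `check_custom_heads` are assumptions made for illustration. The sketch also assumes that any subset of the three keys may be given, and it does not settle whether the plugin expects a dict or a JSON string.

```python
import json

# Keys named in the documented custom_heads format.
ALLOWED_KEYS = {"tag", "branch", "commit"}


def check_custom_heads(custom_heads: dict) -> dict:
    """Reject entries that use keys outside tag/branch/commit.

    Hypothetical helper; the plugin's own validation may differ.
    """
    for url, head in custom_heads.items():
        unknown = set(head) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"{url}: unexpected keys {sorted(unknown)}")
    return custom_heads


# Hypothetical repository URL and tag, for illustration only.
heads = check_custom_heads({"https://example.com/repo.git": {"tag": "v1.0.0"}})

# Serialized form, in case a string value is needed (an assumption).
print(json.dumps(heads))
```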

Using multiple entities of the same storage plugin

In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:

# register shared settings
storage:
    provider="git",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-git-..., see below
    # Maximum number of requests per second for this storage provider. 0.01 is recommended for GitHub if many repositories are cloned to avoid exceeding the rate limit.
    max_requests_per_second=...,
    # Use rate limiting for platforms that require it (e.g. GitHub).
    enable_rate_limits=...,
    # Delimiter to replace '/' with in the local path of the cloned repositories.
    local_path_delimiter=...,
    # Fetch changes from the remote if the repository already exists locally.
    fetch_to_update=...,
    # Username for SSH authentication.
    ssh_username=...,
    # Path to the SSH public key for authentication.
    ssh_pubkey_path=...,
    # Path to the SSH private key for authentication.
    ssh_privkey_path=...,
    # Passphrase for the SSH private key.
    ssh_passphrase=...,
    # Check out a custom branch (or tag) and commit after cloning: {"<GIT_URL>": {"tag": "<TAG>", "branch": "<BRANCH>", "commit": "<COMMIT_ID>"}}
    custom_heads=...,
    # Keep the cloned repositories after the workflow is finished.
    keep_local=...,
    # Ignore errors when cloning or pulling repositories, so that the remaining repositories are still processed even if some fail.
    ignore_errors=...,
    # This value should always be False, as this storage provider does not support retrieving objects.
    retrieve=...,
    # This is only used for unit tests.
    _is_test=...,

# register multiple tagged entities
storage foo:
    provider="git",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-git-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider. 0.01 is recommended for GitHub if many repositories are cloned to avoid exceeding the rate limit.
    max_requests_per_second=...,
    # Use rate limiting for platforms that require it (e.g. GitHub).
    enable_rate_limits=...,
    # Delimiter to replace '/' with in the local path of the cloned repositories.
    local_path_delimiter=...,
    # Fetch changes from the remote if the repository already exists locally.
    fetch_to_update=...,
    # Username for SSH authentication.
    ssh_username=...,
    # Path to the SSH public key for authentication.
    ssh_pubkey_path=...,
    # Path to the SSH private key for authentication.
    ssh_privkey_path=...,
    # Passphrase for the SSH private key.
    ssh_passphrase=...,
    # Check out a custom branch (or tag) and commit after cloning: {"<GIT_URL>": {"tag": "<TAG>", "branch": "<BRANCH>", "commit": "<COMMIT_ID>"}}
    custom_heads=...,
    # Keep the cloned repositories after the workflow is finished.
    keep_local=...,
    # Ignore errors when cloning or pulling repositories, so that the remaining repositories are still processed even if some fail.
    ignore_errors=...,
    # This value should always be False, as this storage provider does not support retrieving objects.
    retrieve=...,
    # This is only used for unit tests.
    _is_test=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Settings

The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):

| CLI argument | Description | Default | Choices | Required | Type |
| --- | --- | --- | --- | --- | --- |
| --storage-git-max-requests-per-second VALUE | Maximum number of requests per second for this storage provider. 0.01 is recommended for GitHub if many repositories are cloned to avoid exceeding the rate limit. | 1 | | | |
| --storage-git-enable-rate-limits VALUE | Use rate limiting for platforms that require it (e.g. GitHub). | True | | | |
| --storage-git-local-path-delimiter VALUE | Delimiter to replace '/' with in the local path of the cloned repositories. | '+' | | | |
| --storage-git-fetch-to-update VALUE | Fetch changes from the remote if the repository already exists locally. | True | | | |
| --storage-git-ssh-username VALUE | Username for SSH authentication. | 'git' | | | |
| --storage-git-ssh-pubkey-path VALUE | Path to the SSH public key for authentication. | '/dev/null' | | | |
| --storage-git-ssh-privkey-path VALUE | Path to the SSH private key for authentication. | '/dev/null' | | | |
| --storage-git-ssh-passphrase VALUE | Passphrase for the SSH private key. | '' | | | |
| --storage-git-custom-heads VALUE | Check out a custom branch (or tag) and commit after cloning: {"<GIT_URL>": {"tag": "<TAG>", "branch": "<BRANCH>", "commit": "<COMMIT_ID>"}} | None | | | |
| --storage-git-keep-local VALUE | Keep the cloned repositories after the workflow is finished. | True | | | |
| --storage-git-ignore-errors VALUE | Ignore errors when cloning or pulling repositories, so that the remaining repositories are still processed even if some fail. | False | | | |
| --storage-git-retrieve VALUE | This value should always be False, as this storage provider does not support retrieving objects. | False | | | |
| --storage-git--is-test VALUE | This is only used for unit tests. | False | | | |
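
To make the rate-limit recommendation concrete: a limit of r requests per second corresponds to an average spacing of 1/r seconds between requests, so a value of 0.01 means roughly one request every 100 seconds. The sketch below works through that arithmetic. The function names and the repository count of 50 are illustrative assumptions, as is the simplification of one request per cloned repository; the plugin's actual throttling may behave differently.

```python
def seconds_between_requests(max_requests_per_second: float) -> float:
    """Average spacing between requests implied by a rate limit."""
    if max_requests_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / max_requests_per_second


def minimum_duration(n_requests: int, max_requests_per_second: float) -> float:
    """Lower bound in seconds for n_requests under the rate limit.

    Assumes the first request starts immediately and every later request
    waits one full interval.
    """
    interval = seconds_between_requests(max_requests_per_second)
    return max(n_requests - 1, 0) * interval


# Spacing implied by the default (1) and by the GitHub recommendation (0.01).
print(seconds_between_requests(1))
print(seconds_between_requests(0.01))

# Hypothetical workload: 50 repositories, one request each, at 0.01 req/s.
print(minimum_duration(50, 0.01) / 60, "minutes at least")
```

Under these assumptions, cloning many repositories at 0.01 requests per second takes on the order of hours, which is worth weighing against the risk of hitting the platform's rate limit.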