Snakemake storage plugin: pelican
Warning
No repository URL found in PyPI metadata. The plugin should specify a repository URL in its pyproject.toml (key 'repository'). It is unclear whether the plugin is maintained and reviewed by the official Snakemake organization (https://github.com/snakemake).
Installation
Install this plugin with pip or mamba, e.g.:
```bash
pip install snakemake-storage-plugin-pelican
```
Or, if you are using pixi, add the plugin to your pixi.toml. Put it under the dependency table that matches where the plugin is available (`[pypi-dependencies]` for packages from PyPI, `[dependencies]` for packages from conda channels), e.g.:
```toml
snakemake-storage-plugin-pelican = "*"
```
Usage
Queries
Queries to this storage should have the following format:
| Query type | Query | Description |
|---|---|---|
| any |  | An example Pelican URL that points to an object in the osg-htc.org (OSDF) federation. |
| any |  | The canonical test object in the osg-htc.org (OSDF) federation. |
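Judging from the token-file examples on this page, a Pelican query has the form `pelican://<federation host>/<namespace>/<object path>`, with `osg-htc.org` being the OSDF federation. As a minimal sketch under that assumption, the hypothetical helper `split_pelican_url` below uses Python's standard `urllib.parse` to split such a query into its federation host and object path. It is not part of the plugin, and the example object path is made up.

```python
from __future__ import annotations

from urllib.parse import urlparse


def split_pelican_url(url: str) -> tuple[str, str]:
    """Split a pelican:// URL into (federation host, object path).

    Hypothetical helper for illustration only; not part of the plugin.
    """
    parsed = urlparse(url)
    if parsed.scheme != "pelican":
        raise ValueError(f"expected a pelican:// URL, got {url!r}")
    if not parsed.netloc:
        raise ValueError(f"missing federation host in {url!r}")
    # netloc is the federation (e.g. osg-htc.org); path starts with the namespace
    return parsed.netloc, parsed.path


if __name__ == "__main__":
    # Hypothetical object path, used only to show the split
    print(split_pelican_url("pelican://osg-htc.org/ospool/path/to/object.txt"))
```

Python's `urlparse` extracts the network location for any scheme followed by `//`, so no custom parsing is needed for the `pelican` scheme.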
As default provider
If you want all your input and output (which is not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as a default provider via:
```bash
snakemake --default-storage-provider pelican --default-storage-prefix ...
```
Here, `...` stands for the query prefix under which all your results should be stored.
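Conceptually, each relative input or output path that is not marked for another storage ends up under that prefix. The hypothetical helper below is a simplified illustration of this mapping, not Snakemake's actual code: it assumes the prefix and the path are joined with a single `/`, and for simplicity it rejects absolute paths (Snakemake's own path handling covers more cases). The prefix `pelican://osg-htc.org/path/to/prefix` is a placeholder.

```python
from __future__ import annotations

import posixpath


def default_storage_query(prefix: str, relative_path: str) -> str:
    """Map a workflow-relative path to a query under the default storage prefix.

    Hypothetical, simplified illustration; not Snakemake's implementation.
    """
    if posixpath.isabs(relative_path):
        raise ValueError("this sketch only handles relative paths")
    # Join with exactly one '/' between prefix and path
    return prefix.rstrip("/") + "/" + relative_path


if __name__ == "__main__":
    print(default_storage_query("pelican://osg-htc.org/path/to/prefix", "results/example.txt"))
```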
You can also pass custom settings via command line arguments:
```bash
snakemake --default-storage-provider pelican --default-storage-prefix ... \
    --storage-pelican-max-requests-per-second ... \
    --storage-pelican-token-file ... \
    --storage-pelican-debug ...
```
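The flags follow the naming pattern visible on this page: a workflow-level setting such as `max_requests_per_second` becomes `--storage-pelican-max-requests-per-second`. The hypothetical helper `pelican_cli_args` below sketches that mapping by building the argument list for a default-provider run; the setting values in the example are placeholders. Booleans are written as lowercase `true`/`false`, matching the `--storage-pelican-debug true` usage shown in the settings comments.

```python
from __future__ import annotations

import shlex


def pelican_cli_args(prefix: str, settings: dict[str, object]) -> list[str]:
    """Build snakemake arguments that use Pelican as the default storage provider.

    Hypothetical helper: each setting name is turned into the matching
    --storage-pelican-... flag by replacing '_' with '-'.
    """
    args = [
        "snakemake",
        "--default-storage-provider", "pelican",
        "--default-storage-prefix", prefix,
    ]
    for name, value in settings.items():
        flag = "--storage-pelican-" + name.replace("_", "-")
        # Write booleans as lowercase 'true'/'false'
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += [flag, text]
    return args


if __name__ == "__main__":
    argv = pelican_cli_args(
        "pelican://osg-htc.org/path/to/prefix",  # placeholder prefix
        {"max_requests_per_second": 5, "token_file": "/path/to/token.txt", "debug": True},
    )
    # shlex.join quotes any value containing spaces (e.g. multi-mapping token specs)
    print(shlex.join(argv))
```

Keeping the arguments as a list (for example, to pass to `subprocess.run`) avoids shell-quoting problems when a token-file value contains several space-separated mappings.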
Within the workflow
If you want to use this storage plugin only for specific items, you can register it inside of your workflow:
```python
# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="pelican",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-pelican-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # Path to a file containing a Pelican authorization token. Can specify multiple
    # space-separated token mappings (in quotes), each tagged with a Pelican URL prefix.
    # Tags should be pelican:// URLs (can include path prefix). The longest matching
    # URL prefix wins. Examples:
    # (1) Single token for all: --storage-pelican-token-file /path/to/token.txt
    # (2) Multiple tokens (space-separated in quotes): --storage-pelican-token-file 'pelican://osg-htc.org:/path/to/osg.txt pelican://itb-osdf-director.osdf-dev.chtc.io:/path/to/itb.txt'
    # (3) Per-namespace: --storage-pelican-token-file 'pelican://osg-htc.org/chtc:/path/to/chtc.txt pelican://osg-htc.org/ospool:/path/to/ospool.txt'
    # (4) With default: --storage-pelican-token-file 'pelican://osg-htc.org/chtc/itb:/path/to/itb.txt default:/path/to/default.txt'
    token_file=...,
    # Enable debug logging for the Pelican Storage Plugin. Use: --storage-pelican-debug true
    debug=...,

rule example:
    input:
        storage.pelican(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
```
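The `token_file` setting described in the workflow comments selects a token by longest matching `pelican://` prefix, falling back to a `default:` entry. The following is a minimal sketch of that selection rule, not the plugin's own code. It assumes that each tag is separated from its path by the last `:` in the entry, that a bare path with no tag applies to every URL (example (1)), and that prefixes match on whole path segments. `parse_token_mappings` and `resolve_token_file` are hypothetical names, not plugin APIs.

```python
from __future__ import annotations


def parse_token_mappings(spec: str) -> dict[str, str]:
    """Parse a token-file value into {URL prefix or 'default': token file path}.

    Hypothetical parser following the documented format; not the plugin's code.
    """
    mappings: dict[str, str] = {}
    for entry in spec.split():
        if entry.startswith("default:"):
            mappings["default"] = entry[len("default:"):]
        elif entry.startswith("pelican://"):
            # Split on the LAST ':' so the ':' in 'pelican://' stays in the prefix
            prefix, _, path = entry.rpartition(":")
            if not prefix.startswith("pelican://") or not path:
                raise ValueError(f"malformed token mapping: {entry!r}")
            mappings[prefix.rstrip("/")] = path
        else:
            # A bare path with no tag applies to every URL
            mappings["default"] = entry
    return mappings


def resolve_token_file(url: str, mappings: dict[str, str]) -> str | None:
    """Return the token file for url: longest matching prefix wins, else 'default'."""
    best_prefix = None
    for prefix in mappings:
        if prefix == "default":
            continue
        # Assumption: prefixes match whole path segments ('/chtc' does not match '/chtcx')
        if url == prefix or url.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    if best_prefix is not None:
        return mappings[best_prefix]
    return mappings.get("default")


if __name__ == "__main__":
    spec = ("pelican://osg-htc.org/chtc/itb:/path/to/itb.txt "
            "pelican://osg-htc.org/chtc:/path/to/chtc.txt "
            "default:/path/to/default.txt")
    table = parse_token_mappings(spec)
    for url in ("pelican://osg-htc.org/chtc/itb/data.txt",
                "pelican://osg-htc.org/chtc/other.txt",
                "pelican://osg-htc.org/ospool/x.txt"):
        print(url, "->", resolve_token_file(url, table))
```

Both `/chtc` and `/chtc/itb` match an object under `/chtc/itb`; the longer prefix is chosen, as the setting's description requires.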
Using multiple entities of the same storage plugin
In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:
```python
# register shared settings
storage:
    provider="pelican",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-pelican-..., see below
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # Path to a file containing a Pelican authorization token. Can specify multiple
    # space-separated token mappings (in quotes), each tagged with a Pelican URL prefix.
    # Tags should be pelican:// URLs (can include path prefix). The longest matching
    # URL prefix wins. Examples:
    # (1) Single token for all: --storage-pelican-token-file /path/to/token.txt
    # (2) Multiple tokens (space-separated in quotes): --storage-pelican-token-file 'pelican://osg-htc.org:/path/to/osg.txt pelican://itb-osdf-director.osdf-dev.chtc.io:/path/to/itb.txt'
    # (3) Per-namespace: --storage-pelican-token-file 'pelican://osg-htc.org/chtc:/path/to/chtc.txt pelican://osg-htc.org/ospool:/path/to/ospool.txt'
    # (4) With default: --storage-pelican-token-file 'pelican://osg-htc.org/chtc/itb:/path/to/itb.txt default:/path/to/default.txt'
    token_file=...,
    # Enable debug logging for the Pelican Storage Plugin. Use: --storage-pelican-debug true
    debug=...,

# register multiple tagged entities
storage foo:
    provider="pelican",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-pelican-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # Path to a file containing a Pelican authorization token. Can specify multiple
    # space-separated token mappings (in quotes), each tagged with a Pelican URL prefix.
    # Tags should be pelican:// URLs (can include path prefix). The longest matching
    # URL prefix wins. Examples:
    # (1) Single token for all: --storage-pelican-token-file /path/to/token.txt
    # (2) Multiple tokens (space-separated in quotes): --storage-pelican-token-file 'pelican://osg-htc.org:/path/to/osg.txt pelican://itb-osdf-director.osdf-dev.chtc.io:/path/to/itb.txt'
    # (3) Per-namespace: --storage-pelican-token-file 'pelican://osg-htc.org/chtc:/path/to/chtc.txt pelican://osg-htc.org/ospool:/path/to/ospool.txt'
    # (4) With default: --storage-pelican-token-file 'pelican://osg-htc.org/chtc/itb:/path/to/itb.txt default:/path/to/default.txt'
    token_file=...,
    # Enable debug logging for the Pelican Storage Plugin. Use: --storage-pelican-debug true
    debug=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
```
Settings
The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):