Snakemake storage plugin: fs

Repository: GitHub · Author: Johannes Koester · Available on PyPI

A Snakemake storage plugin that reads and writes from a locally mounted filesystem using rsync. This is particularly useful when running Snakemake on an NFS as complex parallel IO patterns can slow down NFS quite substantially. See “Further information” for an example configuration in such a scenario.

Installation

Install this plugin with pip or mamba, e.g.:

pip install snakemake-storage-plugin-fs

Usage

Queries

Queries to this storage should have the following format:

| Query type | Query | Description |
| --- | --- | --- |
| any | `test/test.txt` | Some file or directory path. |

As default provider

If you want all your input and output (that is not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as the default provider via:

snakemake --default-storage-provider fs --default-storage-prefix ...

with ... being the prefix of a query under which you want to store all your results.
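Conceptually, the prefix is prepended to every plain (non-storage-qualified) input and output path to form the full storage query. A minimal Python sketch of that composition, with a hypothetical prefix for illustration (the actual joining happens inside Snakemake):

```python
import posixpath

# Illustrative only: Snakemake combines the default storage prefix with each
# plain input/output path to build the effective storage query.
def effective_query(prefix: str, path: str) -> str:
    return posixpath.join(prefix, path)

# "/data/shared/results" is a hypothetical prefix, not from the plugin docs:
print(effective_query("/data/shared/results", "example.txt"))
# → /data/shared/results/example.txt
```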

Within the workflow

If you want to use this storage plugin only for specific items, you can register it inside of your workflow:

# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="fs",

rule example:
    input:
        storage.fs(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."

Further details

The following Snakemake CLI flags help avoid harmful IO patterns on shared network filesystems by instructing Snakemake to copy any input to a fast local scratch disk and to copy output files back to the network filesystem at the end of a job.

snakemake --default-storage-provider fs --shared-fs-usage persistence software-deployment sources source-cache --local-storage-prefix /local/work/$USER

with /local/work/$USER being the path to the local (non-shared) scratch dir. Alternatively, these options can be persisted in a profile:

default-storage-provider: fs
local-storage-prefix: /local/work/$USER
shared-fs-usage:
  - persistence
  - software-deployment
  - sources
  - source-cache

If the scratch space is specific to each job (e.g. controlled by a $JOBID variable), one can define a job-specific local storage prefix like this:

default-storage-provider: fs
local-storage-prefix: /local/work/$USER
remote-job-local-storage-prefix: /local/work/$USER/$JOBID
shared-fs-usage:
  - persistence
  - software-deployment
  - sources
  - source-cache
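The $USER and $JOBID placeholders are ordinary environment variables that get expanded on the execution host. A small Python sketch of that expansion, using hypothetical values (the real expansion is handled by Snakemake, not by this code):

```python
from string import Template

# Hypothetical values for illustration only:
env = {"USER": "alice", "JOBID": "12345"}

# $VAR-style substitution, mimicking how the prefix resolves on a compute node:
prefix = Template("/local/work/$USER/$JOBID").substitute(env)
print(prefix)  # → /local/work/alice/12345
```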

Note that the non-remote-job local storage prefix is still always required, because Snakemake may decide to run certain jobs locally instead of submitting them to the cluster or cloud. This can happen either at the developer's request (because a certain rule is very lightweight) or by Snakemake's own decision, e.g. in case of rules that just format a template (see the docs).