# Snakemake storage plugin: sharepoint
A Snakemake storage plugin for reading and writing files on Microsoft SharePoint sites. For now it has only been tested with SharePoint 2016 on-premises, so if any issues arise with your SharePoint site, please file an issue on the GitHub repository.
## Overwriting files
Overwriting existing files is disabled by default. It can be enabled either for the storage provider as a whole or for individual files. For individual files, append `?overwrite=...` to your query; for the storage provider, set `allow_overwrite` in the settings. In both cases the setting can be either `True`, `False`, or `None`. The table below shows how the two settings interact to determine the overwrite behaviour for an individual file:
| | `allow_overwrite=False` | `allow_overwrite=None` | `allow_overwrite=True` |
|---|---|---|---|
| `?overwrite=false` | False | False | False |
| `?overwrite=none` | False | False | True |
| `?overwrite=true` | False | True | True |
No suffix is equal to `?overwrite=none`, whereas a bare `?overwrite` is equal to `?overwrite=true`.
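For example, a minimal sketch of enabling overwriting for a single output file (the site URL and library name are made-up placeholders; the query follows the format described under Queries below):

```python
storage:
    provider="sharepoint",
    site_url="https://sharepoint.example.com/sites/research",  # hypothetical site
    allow_overwrite=None,  # leave the provider-level decision open

rule upload_report:
    input:
        "report.csv"
    output:
        # with allow_overwrite=None, the ?overwrite=true suffix enables
        # overwriting for this file only (see the table above)
        storage.sharepoint("mssp://Documents/report.csv?overwrite=true")
    shell:
        "cp {input} {output}"
```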
## Installation
Install this plugin with pip or mamba, e.g.:
```bash
pip install snakemake-storage-plugin-sharepoint
```
## Usage
### Queries
Queries to this storage should have the following format:
| Query type | Query | Description |
|---|---|---|
| input | `mssp://Documents/data.csv` | A file data.csv in a SharePoint library called Documents. |
| input | `mssp://library/folder/file.txt` | A file file.txt under a folder named folder in a SharePoint library called library. |
| output | `mssp://Documents/target.csv` | A file target.csv in a SharePoint library called Documents. Overwrite behavior is determined by the `allow_overwrite` setting. |
| output | `mssp://library/folder/file.txt?overwrite=true` | A file file.txt under a folder named folder in a SharePoint library called library. Overwrite is allowed if the `allow_overwrite` setting is not False. |
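For instance, a rule consuming the second query from the table might look like this (a sketch; it assumes the provider has been configured with a valid `site_url`, e.g. via the command line or a `storage` directive as shown further below):

```python
rule analyze:
    input:
        # file.txt under the folder "folder" in the library "library"
        storage.sharepoint("mssp://library/folder/file.txt")
    output:
        "results/file.txt"
    shell:
        "cp {input} {output}"
```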
### As default provider
If you want all your input and output (that is not explicitly marked to come from another storage) to be written to and read from this storage, you can use it as a default provider via:
```bash
snakemake --default-storage-provider sharepoint --default-storage-prefix ...
```
with `...` being the prefix of a query under which you want to store all your results.
You can also pass custom settings via command line arguments:
```bash
snakemake --default-storage-provider sharepoint --default-storage-prefix ... \
    --storage-sharepoint-max-requests-per-second ... \
    --storage-sharepoint-auth ... \
    --storage-sharepoint-allow-redirects ... \
    --storage-sharepoint-site-url ... \
    --storage-sharepoint-allow-overwrite ... \
    --storage-sharepoint-upload-timeout ...
```
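A hypothetical concrete invocation could look as follows (the prefix, site URL, and timeout are made-up example values):

```bash
snakemake --default-storage-provider sharepoint \
    --default-storage-prefix "mssp://Documents/results/" \
    --storage-sharepoint-site-url "https://sharepoint.example.com/sites/research" \
    --storage-sharepoint-allow-overwrite true \
    --storage-sharepoint-upload-timeout 60000
```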
### Within the workflow
If you want to use this storage plugin only for specific items, you can register it inside of your workflow:
```python
# register storage provider (not needed if no custom settings are to be defined here)
storage:
    provider="sharepoint",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-sharepoint-..., see
    # snakemake --help
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # HTTP(S) authentication. AUTH_TYPE is the class name of requests.auth
    # (e.g. HTTPBasicAuth), ARG1,ARG2,... are the arguments required by the
    # specified type. PACKAGE is the full path to the module from which to
    # import the class (semantically this does 'from PACKAGE import AUTH_TYPE').
    auth=...,
    # Follow redirects when retrieving files.
    allow_redirects=...,
    # The URL of the SharePoint site.
    site_url=...,
    # Allow overwriting files in the SharePoint site.
    allow_overwrite=...,
    # The timeout in milliseconds for uploading files.
    upload_timeout=...,

rule example:
    input:
        storage.sharepoint(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
```
### Using multiple entities of the same storage plugin
In case you have to use this storage plugin multiple times, but with different settings (e.g. to connect to different storage servers), you can register it multiple times, each time providing a different tag:
```python
# register shared settings
storage:
    provider="sharepoint",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-sharepoint-..., see below
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # HTTP(S) authentication. AUTH_TYPE is the class name of requests.auth
    # (e.g. HTTPBasicAuth), ARG1,ARG2,... are the arguments required by the
    # specified type. PACKAGE is the full path to the module from which to
    # import the class (semantically this does 'from PACKAGE import AUTH_TYPE').
    auth=...,
    # Follow redirects when retrieving files.
    allow_redirects=...,
    # The URL of the SharePoint site.
    site_url=...,
    # Allow overwriting files in the SharePoint site.
    allow_overwrite=...,
    # The timeout in milliseconds for uploading files.
    upload_timeout=...,

# register multiple tagged entities
storage foo:
    provider="sharepoint",
    # optionally add custom settings here if needed
    # alternatively they can be passed via command line arguments
    # starting with --storage-sharepoint-..., see below.
    # To only pass a setting to this tagged entity, prefix the given value with
    # the tag name, i.e. foo:max_requests_per_second=...
    # Maximum number of requests per second for this storage provider.
    # If nothing is specified, the default implemented by the storage plugin is used.
    max_requests_per_second=...,
    # HTTP(S) authentication. AUTH_TYPE is the class name of requests.auth
    # (e.g. HTTPBasicAuth), ARG1,ARG2,... are the arguments required by the
    # specified type. PACKAGE is the full path to the module from which to
    # import the class (semantically this does 'from PACKAGE import AUTH_TYPE').
    auth=...,
    # Follow redirects when retrieving files.
    allow_redirects=...,
    # The URL of the SharePoint site.
    site_url=...,
    # Allow overwriting files in the SharePoint site.
    allow_overwrite=...,
    # The timeout in milliseconds for uploading files.
    upload_timeout=...,

rule example:
    input:
        storage.foo(
            # define query to the storage backend here
            ...
        ),
    output:
        "example.txt"
    shell:
        "..."
```
## Settings
The storage plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):
| CLI setting | Workflow setting | Envvar setting | Description | Default | Choices | Required | Type |
|---|---|---|---|---|---|---|---|
| `--storage-sharepoint-max-requests-per-second` | `max_requests_per_second` | | Maximum number of requests per second for this storage provider. If nothing is specified, the default implemented by the storage plugin is used. | | | ✗ | str |
| `--storage-sharepoint-auth` | `auth` | | HTTP(S) authentication. AUTH_TYPE is the class name of requests.auth (e.g. HTTPBasicAuth), ARG1,ARG2,... are the arguments required by the specified type. PACKAGE is the full path to the module from which to import the class (semantically this does 'from PACKAGE import AUTH_TYPE'). | | | ✗ | str |
| `--storage-sharepoint-allow-redirects` | `allow_redirects` | | Follow redirects when retrieving files. | | | ✗ | str |
| `--storage-sharepoint-site-url` | `site_url` | | The URL of the SharePoint site. | | | ✗ | str |
| `--storage-sharepoint-allow-overwrite` | `allow_overwrite` | | Allow overwriting files in the SharePoint site. | | | ✗ | str |
| `--storage-sharepoint-upload-timeout` | `upload_timeout` | | The timeout in milliseconds for uploading files. | | | ✗ | str |
## Further details
For now, the `site_url` setting is required on the storage provider. This is because the URL of a document cannot be uniquely parsed into the separate components necessary for downloading and uploading on SharePoint (namely: site collection, library, and filename). Also, overwriting files on SharePoint is disabled by default and needs to be enabled on the storage provider using the `allow_overwrite` setting (or per file via the `?overwrite` query suffix, see above).
Finally, removing files from the remote location is not implemented at all; follow the corresponding issue on the GitHub repository for the current status. Contributions that implement this in a way that does not remove the entire version history are welcome.