Snakemake scheduler plugin: firstfit

Authors: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>, Lucas Czech <luc.czech@gmail.com>

Warning

This plugin is not maintained or reviewed by the official Snakemake organization.

This plugin provides a fast Snakemake scheduler that can be up to 100x faster than the default schedulers, at the cost of some resource usage efficiency.

Installation

Install the plugin directly with pip or mamba, e.g.:

pip install snakemake-scheduler-plugin-firstfit

Or, if you are using pixi, add the plugin to your pixi.toml. Make sure to place it under the dependency type (conda or PyPI) that matches where the plugin is available, e.g.:

snakemake-scheduler-plugin-firstfit = "*"

Usage

In order to use the plugin, run Snakemake (>=9.0) with the corresponding value for the scheduler flag:

snakemake --scheduler firstfit ...

with ... being any additional arguments you want to use.

Settings

The scheduler plugin has the following settings, which can be passed via the command line, the workflow, or environment variables where indicated:

CLI argument: --scheduler-firstfit-greediness VALUE
Description: Set the greediness (i.e. size) of the heap queue. This will enable the heap-queue pre-evaluation step, where available jobs are sorted based on their rewards. This value (between 0 and 1) determines how many jobs will be evaluated for execution. A greediness of 1 will only evaluate --max-jobs-per-timespan jobs, while a value of 0 will evaluate all available jobs.
Default: 0

CLI argument: --scheduler-firstfit-omit-prioritize-by-temp-and-input VALUE
Description: If set, the size of temporary or input files is not taken into account when prioritizing. By default, it is assumed that temp files should be removed as soon as possible, and larger input files may take longer to process, so it is better to start them earlier.
Default: False

Further details

Why this Plugin

Even though Snakemake's default schedulers are fast enough for most workflows, they can be considerably slow for very large workflows (i.e. > 300k jobs). This is because, every time a job finishes, Snakemake needs to re-evaluate all pending jobs to select the subset that maximizes usage of available resources. This can be especially problematic if the workflow has many relatively fast jobs, since the time spent waiting for the scheduler could have been used to process jobs instead. Snakemake is aware of this and, if the default ilp scheduler takes more than 10s, it automatically switches to the greedy scheduler. However, the ilp solver is known to sometimes ignore the timeout (coin-or/Cbc#487) and can be quite slow when instantiating large problems (coin-or/pulp#749).

firstfit aims to considerably speed up the scheduling process by simplifying the optimization steps (while sacrificing some resource usage efficiency). On a very simple example workflow with ~600k jobs, Snakemake's greedy scheduler takes around 90s for each scheduling round (i.e. between a job finishing and the launch of the next batch of jobs). firstfit, on the other hand, takes between ~5s (greediness of 0) and 1s (greediness of 1).

How this Plugin works

In this plugin, jobs are selected for execution using a first-fit strategy with a single bin. Briefly, available jobs are sorted by their reward (so that higher-reward jobs are evaluated first) and submitted sequentially as long as there are available resources. How long the scheduler keeps trying to fit more jobs depends on the --scheduler-firstfit-greediness parameter, which ranges from 0 (all jobs are evaluated) to 1 (only --max-jobs-per-timespan jobs are evaluated). Note that a high greediness value can lead to sub-optimal resource usage, since less rewarding jobs that could still be run are potentially left out.
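The selection procedure described above can be illustrated with a minimal Python sketch. It is not the plugin's actual code: the names Job, n_jobs_to_evaluate and select_jobs are hypothetical, and the linear interpolation between the two documented endpoints (greediness 0 evaluates all jobs, greediness 1 evaluates only --max-jobs-per-timespan jobs) is an assumption made for illustration.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a schedulable job."""
    name: str
    reward: float  # higher reward = evaluated earlier
    resources: dict = field(default_factory=dict)


def n_jobs_to_evaluate(n_available, greediness, max_jobs_per_timespan):
    """How many jobs to evaluate in one scheduling round.

    Assumption: linear interpolation between the documented endpoints
    (0 -> all available jobs, 1 -> max_jobs_per_timespan jobs).
    """
    if not 0 <= greediness <= 1:
        raise ValueError("greediness must be between 0 and 1")
    floor = min(max_jobs_per_timespan, n_available)
    return floor + math.ceil((1 - greediness) * (n_available - floor))


def select_jobs(jobs, available, greediness=0.0, max_jobs_per_timespan=1):
    """First-fit selection into a single bin (the available resources)."""
    remaining = dict(available)
    k = n_jobs_to_evaluate(len(jobs), greediness, max_jobs_per_timespan)
    # Pre-evaluation step: keep only the k highest-reward jobs,
    # returned in descending order of reward.
    candidates = heapq.nlargest(k, jobs, key=lambda j: j.reward)
    selected = []
    for job in candidates:
        fits = all(remaining.get(r, 0) >= need for r, need in job.resources.items())
        if fits:
            for r, need in job.resources.items():
                remaining[r] = remaining.get(r, 0) - need
            selected.append(job)
    return selected


jobs = [
    Job("a", reward=3, resources={"cores": 4}),
    Job("b", reward=2, resources={"cores": 8}),
    Job("c", reward=1, resources={"cores": 4}),
]
pool = {"cores": 8}

# Greediness 0: all three jobs are evaluated; "b" does not fit, "c" does.
print([j.name for j in select_jobs(jobs, pool, greediness=0.0)])
# -> ['a', 'c']

# Greediness 1: only the top 2 jobs are evaluated, so "c" is never considered.
print([j.name for j in select_jobs(jobs, pool, greediness=1.0, max_jobs_per_timespan=2)])
# -> ['a']
```

The second call shows the trade-off mentioned above: with a greediness of 1, four cores stay idle even though job "c" would have fit.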

Contributions

We welcome bug reports, feature requests, and pull requests! Please report issues specific to this plugin in the plugin’s GitHub repository.

Configuration

Snakemake offers great capabilities to specify and thereby limit resources used by a workflow as a whole and by individual jobs.