Snakemake executor plugin: sge
Warning
This plugin is not maintained and reviewed by the official Snakemake organization.
Snakemake Executor Plugin for SGE/UGE/OGS
A Snakemake executor plugin for submitting jobs to Sun Grid Engine (SGE), Univa Grid Engine (UGE), and Open Grid Scheduler (OGS) clusters.
Installation
Install the plugin using conda or pip:
# Using conda
conda install -c bioconda snakemake-executor-plugin-sge
# Using pip
pip install snakemake-executor-plugin-sge
Basic Usage
To use the SGE executor, specify it when running Snakemake:
snakemake --executor sge --jobs 100
This will submit each Snakemake job as a separate SGE job (qsub), allowing up to 100 concurrent jobs.
Key Features
Direct cluster submission: Jobs are submitted to SGE/UGE/OGS via qsub
Array jobs: Group jobs are automatically submitted as SGE array jobs (qsub -t 1-N) to reduce scheduler overhead
Resource specification: Define per-rule resource requirements (memory, runtime, threads, queue)
Cross-job dependencies: Snakemake DAG dependencies are translated to SGE dependencies
Automatic log management: Job logs are collected and optionally cleaned up after workflow completion
Configuration
Command-line Flags
Common flags to configure the executor:
# Specify SGE queue
snakemake --executor sge --sge-queue high.q --jobs 100
# Set default memory per job
snakemake --executor sge --default-resources mem_mb=8000 --jobs 100
# Disable array jobs for group jobs
snakemake --executor sge --sge-disable-group-jobs-as-array --jobs 100
Per-Rule Resources
Define resource requirements in your Snakemake rules:
rule my_analysis:
    input: "input.txt"
    output: "output.txt"
    resources:
        mem_mb=16000,             # Memory in MB
        runtime=120,              # Runtime in minutes
        threads=8,                # Number of threads
        sge_queue="high.q",       # SGE queue (optional)
        sge_project="myproject",  # Project code (optional)
    shell:
        "process_data.sh {input} {output}"
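As a rough illustration of how such resources could end up in a scheduler request, the following sketch converts mem_mb and runtime into a Grid Engine -l string. The qsub options this plugin actually generates are not documented here. The use of the h_vmem and h_rt complexes, and the helper names format_h_rt and resource_request, are assumptions made for illustration only; your site may use different complexes.

```python
def format_h_rt(runtime_min: int) -> str:
    """Convert a runtime in minutes to the HH:MM:SS notation used by Grid Engine."""
    hours, minutes = divmod(runtime_min, 60)
    return f"{hours:02d}:{minutes:02d}:00"


def resource_request(mem_mb: int, runtime_min: int) -> list[str]:
    """Build a hypothetical `-l` argument from Snakemake-style resources.

    Assumption: the cluster exposes the common h_vmem and h_rt complexes.
    The real plugin may map resources differently.
    """
    return ["-l", f"h_vmem={mem_mb}M,h_rt={format_h_rt(runtime_min)}"]


# Values taken from the rule above: 16000 MB and 120 minutes.
print(" ".join(resource_request(16000, 120)))
```

On many clusters h_vmem is counted per slot, so a multi-threaded job may need the value divided by the slot count; check your site's configuration before relying on any such mapping.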
Environment Variables
The executor respects the standard Snakemake environment variables. Cluster-specific variables are passed to jobs automatically.
Log Files
By default, job logs are written to .snakemake/sge_logs/ in your working directory:
Single jobs: {JOBID}.log (stdout) and {JOBID}.error (stderr)
Array jobs: {JOBID}.{TASKID}.log and {JOBID}.{TASKID}.error
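If you want to locate a job's logs from a script, the naming scheme above can be expressed as a small helper. This is a sketch based only on the patterns listed here; log_paths is a hypothetical function, not part of the plugin.

```python
from pathlib import Path
from typing import Optional, Tuple


def log_paths(logdir: str, job_id: int, task_id: Optional[int] = None) -> Tuple[Path, Path]:
    """Return (stdout, stderr) log paths for a single job or one array task."""
    # Array tasks carry the task ID as a second dotted component.
    stem = str(job_id) if task_id is None else f"{job_id}.{task_id}"
    base = Path(logdir)
    return base / f"{stem}.log", base / f"{stem}.error"


out_log, err_log = log_paths(".snakemake/sge_logs", 12346, task_id=2)
print(out_log, err_log)
```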
Helper files (task manifests and scripts) are stored in .snakemake/sge_logs/.meta/.
You can customize the log directory:
snakemake --executor sge --sge-logdir custom_logs --jobs 100
Successful job logs are automatically deleted at workflow completion unless you set:
snakemake --executor sge --sge-keep-successful-logs --jobs 100
Next Steps
See further.md for advanced topics including:
Job array optimization and limits
Cross-job dependency resolution
Status polling and timeouts
Troubleshooting and debugging
Install this plugin with pip or mamba directly, e.g.:
pip install snakemake-executor-plugin-sge
Or, if you are using pixi, add the plugin to your pixi.toml. Be careful to put it under the right dependency type based on the plugin’s availability, e.g.:
snakemake-executor-plugin-sge = "*"
To use the plugin, run Snakemake (>=8.6) in the folder where your workflow code and config reside (containing either workflow/Snakefile or Snakefile) with the corresponding value for the executor flag:
snakemake --executor sge --default-resources --jobs N ...
with N being the number of jobs you want to run in parallel and ... being any additional arguments you want to use (see below).
The machine on which you run Snakemake must have the executor plugin installed, and, depending on the type of the executor plugin, have access to the target service of the executor plugin (e.g. an HPC middleware like slurm with the sbatch command, or internet access to submit jobs to some cloud provider, e.g. azure).
The flag --default-resources ensures that Snakemake auto-calculates the mem and disk resources for each job, based on the input file size.
The values assumed there are conservative and should usually suffice.
However, you can always override those defaults by specifying the resources in your Snakemake rules or via the --set-resources flag.
Depending on the executor plugin, you might either rely on a shared local filesystem or use a remote filesystem or storage. For the latter, you have to additionally use a suitable storage plugin (see section storage plugins in the sidebar of this catalog) and, where applicable, check the sections below for further recommendations.
All arguments can also be persisted via a profile, so that they don’t have to be specified on each invocation. Here, this would mean the following entries inside the profile:
executor: sge
default_resources: []
For specifying other default resources than the built-in ones, see the docs.
The executor plugin has the following settings (which can be passed via command line, the workflow or environment variables, if provided in the respective columns):
Advanced Topics
Job Arrays
How They Work
Snakemake group jobs (created with group: directive) are automatically submitted as SGE array jobs when using this executor. Array jobs significantly reduce scheduler overhead compared to submitting individual qsub commands.
For example, if your workflow has a group with 100 tasks:
Without array jobs: 100 individual qsub calls
With array jobs: 1 qsub -t 1-100 call
Array Limits
The executor respects SGE’s maximum array size through the array_limit setting:
snakemake --executor sge --sge-array-limit 75000 --jobs 100
Default: 75,000 tasks per array. If a group exceeds this limit, multiple array submissions are automatically performed.
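The splitting behaviour can be pictured with a short calculation. This is only a sketch: split_into_arrays is a hypothetical helper, and the assumption that each submission renumbers its tasks from 1 is ours, not something this page states.

```python
def split_into_arrays(n_tasks: int, array_limit: int = 75_000) -> list[int]:
    """Return the size of each array submission needed to cover n_tasks."""
    if n_tasks < 0 or array_limit < 1:
        raise ValueError("n_tasks must be >= 0 and array_limit must be >= 1")
    full, rest = divmod(n_tasks, array_limit)
    # `full` arrays of maximum size, plus one smaller array for the remainder.
    return [array_limit] * full + ([rest] if rest else [])


# A group of 160,000 tasks with the default limit needs three submissions.
for size in split_into_arrays(160_000):
    print(f"qsub -t 1-{size}")
```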
Task Encoding
Array tasks are encoded as zlib-compressed, base64-encoded commands in a shared task map file. This approach:
Avoids ARG_MAX shell argument limits
Allows any task size (large commands are handled gracefully)
Stores the map in .snakemake/sge_logs/.meta/{group|rule}/task_map.b64
Includes a human-readable manifest in .snakemake/sge_logs/.meta/{group|rule}/task_manifest.json
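The encoding idea can be sketched with Python's standard zlib and base64 modules. The layout used below (one encoded command per line, with line N belonging to array task N) is an assumption for illustration; the actual format of task_map.b64 is not specified on this page, and all function names are hypothetical.

```python
import base64
import zlib


def encode_command(command: str) -> str:
    """Compress a shell command with zlib and encode it as base64 text."""
    return base64.b64encode(zlib.compress(command.encode("utf-8"))).decode("ascii")


def decode_command(encoded: str) -> str:
    """Reverse encode_command."""
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")


def build_task_map(commands: list[str]) -> str:
    """Assumed layout: one encoded command per line, line N for task N (1-based)."""
    return "\n".join(encode_command(c) for c in commands)


def lookup_task(task_map: str, task_id: int) -> str:
    """Fetch and decode the command for a 1-based array task ID."""
    return decode_command(task_map.splitlines()[task_id - 1])


task_map = build_task_map(["echo first > a.txt", "echo second > b.txt"])
print(lookup_task(task_map, 2))
```

Because each task only receives its numeric task ID and reads its command from the file, the command length never touches the shell's argument limit.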
Cross-Job Dependencies
Dependency Resolution
The executor automatically translates Snakemake’s DAG dependencies to SGE dependencies:
Within a rule: If all tasks have matching upstreams in a single array job, the executor uses qsub -hold_jid_ad for per-task 1:1 dependencies
Multiple rules: Falls back to qsub -hold_jid to wait for entire upstream job(s)
Immediate submit: The executor maintains an in-memory mapping of Snakemake jobs to SGE job IDs for --immediate-submit mode
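The choice between the two hold options can be summarised in a few lines. This is a simplified sketch of the rules listed above, not the plugin's implementation; dependency_args and its per_task flag are hypothetical names.

```python
def dependency_args(upstream_ids: list[str], per_task: bool) -> list[str]:
    """Pick qsub dependency options for a job with the given upstream SGE job IDs.

    per_task should be True only when every task maps 1:1 onto a task of a
    single upstream array job (an assumption the caller must verify).
    """
    if not upstream_ids:
        return []
    if per_task and len(upstream_ids) == 1:
        # Array dependency: task i waits only for task i of the upstream array.
        return ["-hold_jid_ad", upstream_ids[0]]
    # Fallback: wait for all listed upstream jobs to finish completely.
    return ["-hold_jid", ",".join(upstream_ids)]


print(dependency_args(["12346"], per_task=True))
print(dependency_args(["12345", "12346"], per_task=False))
```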
Manual Dependencies
You can also manually hold jobs on upstream SGE job IDs:
snakemake --executor sge --sge-hold-jid 12345 --jobs 100
This is useful when coordinating with external SGE jobs.
Status Polling
Query Strategy
The executor polls job status using a combination of qstat and qacct:
qstat: Fast, reports running and queued jobs
qacct: Slower, reports completed and failed jobs
Combined: The executor queries both to accurately track job states
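The combined lookup could be reduced to a small decision function like the one below. It is a sketch under stated assumptions: the caller has already run qstat and qacct, and only the exit_status field from qacct is considered. JobState and resolve_state are hypothetical names, and the real executor may weigh additional fields.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    ACTIVE = "active"        # still queued or running according to qstat
    SUCCEEDED = "succeeded"  # finished with exit status 0
    FAILED = "failed"        # finished with a non-zero exit status
    UNKNOWN = "unknown"      # in neither source yet; poll again later


def resolve_state(in_qstat: bool, qacct_exit_status: Optional[int]) -> JobState:
    """Combine a qstat lookup with an optional qacct exit status."""
    if in_qstat:
        return JobState.ACTIVE
    if qacct_exit_status is None:
        # The job left the queue but accounting has not recorded it yet.
        return JobState.UNKNOWN
    return JobState.SUCCEEDED if qacct_exit_status == 0 else JobState.FAILED


print(resolve_state(in_qstat=False, qacct_exit_status=0))
```

The UNKNOWN case matters in practice because accounting records can appear some time after a job disappears from qstat.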
Initial delay before first poll:
snakemake --executor sge --sge-init-seconds-before-status-checks 20 --jobs 100
Default: 20 seconds (SGE schedulers are typically fast; adjust if needed).
Disabling qacct
If your cluster has a slow or unavailable qacct:
snakemake --executor sge --sge-disable-qacct --jobs 100
Note: Without qacct, the executor may not detect completed jobs as quickly.
Retry Logic
Status check attempts before reporting a job as stuck:
snakemake --executor sge --sge-status-attempts 5 --jobs 100
Default: 5 attempts. Increase if your cluster has temporary qstat/qacct outages.
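A bounded retry loop of this kind might look as follows. This is a generic sketch, not the plugin's code: check_with_retries and its wait_seconds parameter are hypothetical, and the check callable stands in for whatever runs qstat or qacct.

```python
import time
from typing import Callable, Optional


def check_with_retries(
    check: Callable[[], Optional[str]],
    attempts: int = 5,
    wait_seconds: float = 0.0,
) -> Optional[str]:
    """Call check() up to `attempts` times; return the first non-None result."""
    for attempt in range(1, attempts + 1):
        state = check()
        if state is not None:
            return state
        if attempt < attempts:
            # Give the scheduler time to recover before asking again.
            time.sleep(wait_seconds)
    return None  # every attempt failed; the caller treats the job as stuck


# Example: a status source that is unavailable for the first two calls.
responses = iter([None, None, "finished"])
print(check_with_retries(lambda: next(responses)))
```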
Log File Management
Directory Structure
All logs are stored in .snakemake/sge_logs/ by default:
.snakemake/
├── log/
├── locks/
├── metadata/
└── sge_logs/
    ├── 12345.log                   # Single job stdout
    ├── 12345.error                 # Single job stderr
    ├── 12346.1.log                 # Array job task 1 stdout
    ├── 12346.1.error               # Array job task 1 stderr
    ├── 12346.2.log                 # Array job task 2 stdout
    ├── 12346.2.error               # Array job task 2 stderr
    └── .meta/
        ├── rule_align/
        │   ├── task_map.b64        # Encoded task commands
        │   └── task_manifest.json
        └── group_process/
            ├── task_map.b64
            └── task_manifest.json
Log Cleanup
Logs for successful jobs are automatically deleted at workflow completion. To keep them:
snakemake --executor sge --sge-keep-successful-logs --jobs 100
Automatic Cleanup of Old Logs
Log files older than a configurable number of days are deleted automatically:
snakemake --executor sge --sge-delete-logfiles-older-than 10 --jobs 100
Default: 10 days. Set to 0 or negative to disable.
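An age-based cleanup of this kind can be sketched as below. This is not the plugin's implementation: delete_old_logs is a hypothetical helper, and the choice to use file modification time and to skip subdirectories such as .meta is an assumption.

```python
import time
from pathlib import Path


def delete_old_logs(logdir: str, older_than_days: float = 10) -> list[Path]:
    """Delete regular files in logdir older than the threshold; return what was removed."""
    if older_than_days <= 0:
        return []  # zero or negative disables the cleanup
    cutoff = time.time() - older_than_days * 86_400  # seconds per day
    removed = []
    for path in Path(logdir).glob("*"):
        # Only top-level files are considered; directories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Calling delete_old_logs(".snakemake/sge_logs", 10) would mirror the default described above; a missing directory simply yields an empty list.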
Queue and Project Assignment
Static Configuration
Specify default queue and project:
snakemake --executor sge --sge-queue high.q --sge-project myproject --jobs 100
Per-Rule Override
Override in individual rules:
rule expensive:
    input: "data.txt"
    output: "result.txt"
    resources:
        sge_queue="high.q",
        sge_project="urgent",
    shell:
        "expensive_computation.sh {input} {output}"
Parallel Environments
For multi-threaded jobs, specify a parallel environment:
snakemake --executor sge --sge-pe "smp 4" --jobs 100
Or per-rule:
rule parallel_task:
    threads: 8
    resources:
        sge_pe="smp",  # Will be paired with thread count automatically
    shell:
        "parallel_tool {threads} {input} {output}"
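The pairing of a parallel environment with the thread count can be illustrated as follows. This sketch assumes that a value containing an explicit slot count (such as "smp 4") is passed through unchanged, while a bare name (such as "smp") gets the rule's thread count appended. pe_args is a hypothetical helper, and the plugin's real precedence rules may differ.

```python
def pe_args(sge_pe: str, threads: int) -> list[str]:
    """Build the `-pe` option for qsub from a PE setting and a thread count."""
    parts = sge_pe.split()
    if len(parts) == 2:
        # Explicit "<name> <slots>" value: use it as given.
        return ["-pe", parts[0], parts[1]]
    if len(parts) == 1:
        # Bare PE name: request one slot per thread.
        return ["-pe", parts[0], str(threads)]
    raise ValueError(f"unexpected parallel environment value: {sge_pe!r}")


print(pe_args("smp", threads=8))
print(pe_args("smp 4", threads=8))
```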
Job Naming
Add a prefix to all SGE job names for easier tracking:
snakemake --executor sge --sge-jobname-prefix "analysis_" --jobs 100
This will submit jobs with names like analysis_uuid_xxxx instead of just uuid_xxxx.
Troubleshooting
Check Job Status
List all submitted jobs:
qstat
Check a specific job:
qstat -j 12345
View finished job accounting:
qacct -j 12345
View Logs
Check stdout and stderr:
cat .snakemake/sge_logs/12345.log
cat .snakemake/sge_logs/12345.error
For array jobs:
cat .snakemake/sge_logs/12345.1.log # Task 1 stdout
cat .snakemake/sge_logs/12345.2.error # Task 2 stderr
Common Issues
“qstat: command not found”
SGE client tools are not in your PATH
Load the SGE environment module or add SGE binaries to PATH
Jobs not starting
Check queue availability: qconf -sql
Verify resource requests (memory, runtime) don’t exceed limits
Check project membership: qconf -sprj and qconf -sprjl
Slow status polling
If qacct is very slow, disable it: --sge-disable-qacct
Increase initial delay: --sge-init-seconds-before-status-checks 30
Array job failures
Check the task manifest: .snakemake/sge_logs/.meta/rule_name/task_manifest.json
View array job script: .snakemake/sge_logs/.meta/rule_name/array_job_*.sh
Check individual task logs for error details
Performance Tips
Use array jobs: Always prefer Snakemake group jobs for similar tasks
Batch submissions: Use --jobs to control submission rate (default is unlimited)
Tune status polling: Adjust init_seconds_before_status_checks based on your cluster’s speed
Monitor logs: Disable log cleanup initially to diagnose issues: --sge-keep-successful-logs
Resource requests: Be realistic with memory and runtime to avoid queue delays