franciscozorrilla/metaGEM

💎 An easy-to-use workflow for generating context-specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data

Overview

Latest release: v1.0.5, Last update: 2024-09-19

Share link: https://snakemake.github.io/snakemake-workflow-catalog?wf=franciscozorrilla/metaGEM

Quality control: linting failed, formatting failed

Topics: metagenomics computational-biology metabolic-models gut-microbiome snakemake metagenome-assembled-genomes mags metabolism bioinformatics flux-balance-analysis genome-scale-metabolic-model metabolic-modeling microbial-ecology microbiome systems-biology

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Conda package manager. It is recommended to install conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake
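Before deploying, it can help to confirm that both tools are actually visible in the activated environment. The following is a minimal sketch; the helper name `check_tools` is hypothetical and not part of Snakemake or Snakedeploy.

```shell
# check_tools: hypothetical helper, not part of Snakemake or Snakedeploy.
# Prints one line per tool and returns non-zero if any tool is missing from PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the snakemake environment activated:
#   check_tools snakemake snakedeploy && snakemake --version
```

If either tool is reported as missing, the environment is most likely not activated in the current shell.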

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/franciscozorrilla/metaGEM . --tag v1.0.5

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
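As a sketch of what a run might look like, assuming Snakemake 8 or later (where the `--sdm` short form is available) and conda as the deployment method: the wrapper `run_metagem` and its `DRY_RUN` variable are hypothetical conveniences, while `--cores`, `--sdm` and `--dry-run` are standard Snakemake options.

```shell
# run_metagem: hypothetical wrapper around the snakemake CLI.
# First argument: number of cores (defaults to "all").
# Set DRY_RUN=1 to only list the jobs that would be executed.
run_metagem() {
    local cores="${1:-all}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        snakemake --cores "$cores" --sdm conda --dry-run
    else
        snakemake --cores "$cores" --sdm conda
    fi
}

# Usage, from inside the project working directory:
#   DRY_RUN=1 run_metagem      # preview the planned jobs
#   run_metagem 8              # run with 8 cores
```

A dry run first is a cheap way to catch configuration mistakes before any software is deployed or any job is started.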

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

💎 Setup guide

🔩 Config files

Make sure to inspect and set up the two config files in this folder.

Snakemake configuration

config.yaml: handles all the tunable parameters, subfolder names, paths, and more. The root path is automatically set by the metaGEM.sh parser to be the current working directory. Most importantly, you should make sure that the scratch path is properly configured. Most clusters have a location for temporary or high I/O operations such as $TMPDIR or $SCRATCH. Please refer to the config.yaml wiki page for a more in-depth look at this config file.
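To decide which value to use for the scratch path, one option is to probe the usual candidates on your cluster. The sketch below assumes that `$SCRATCH` or `$TMPDIR` may be set by the scheduler and falls back to `/tmp`; the function name `pick_scratch` is hypothetical, and the result still has to be entered into config.yaml by hand.

```shell
# pick_scratch: hypothetical helper that prints the first candidate directory
# that exists and is writable, trying $SCRATCH, then $TMPDIR, then /tmp.
pick_scratch() {
    local dir
    for dir in "${SCRATCH:-}" "${TMPDIR:-}" /tmp; do
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "no writable scratch directory found" >&2
    return 1
}

# Usage (for example inside an interactive job on a compute node):
#   pick_scratch
```

Note that these variables are often only defined inside a running job, so the check is most informative when run on a compute node rather than on the login node.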

Cluster configuration

cluster_config.json: handles parameters for submitting jobs to the cluster workload manager. Most importantly, you should make sure that the account is properly defined to be able to submit jobs to your cluster. Please refer to the cluster_config.json wiki page for a more in-depth look at this config file.

🛢️ Environments

Set up three conda environments:

  1. mamba: Used for installing mamba and setting up subsequent environments from recipe files
  2. metagem: Contains most metaGEM core workflow tools, Python 3
  3. metawrap: Contains metaWRAP and its dependencies, Python 2

1. mamba

Conda can take ages to solve environment dependencies when installing many tools at once, so we can use mamba instead for faster installation.

conda create -n mamba mamba

Activate the mamba environment to quickly set up subsequent environments.

source activate mamba

2. metaGEM

Clone metaGEM repo

git clone https://github.com/franciscozorrilla/metaGEM.git

Move into metaGEM/workflow folder

cd metaGEM/workflow

Clean up ~250 MB of unnecessary git history files

rm -r ../.git

Press y and Enter when prompted to remove write-protected files. These are not necessary and just eat up your precious space.

rm: remove write-protected regular file ‘.git/objects/pack/pack-f4a65f7b63c09419a9b30e64b0e4405c524a5b35.pack’? y
rm: remove write-protected regular file ‘.git/objects/pack/pack-f4a65f7b63c09419a9b30e64b0e4405c524a5b35.idx’? y
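If answering each prompt by hand is inconvenient, the removal can also be forced. The following is a minimal sketch, not part of the metaGEM scripts: the helper `remove_git_history` is hypothetical and simply wraps `rm -rf`, whose `-f` flag suppresses the confirmation for write-protected files.

```shell
# remove_git_history: hypothetical helper. Deletes the .git folder of a clone
# without prompting; -f suppresses the confirmation for write-protected files.
remove_git_history() {
    local repo="${1:-}"
    if [ -z "$repo" ] || [ ! -d "$repo/.git" ]; then
        echo "no .git folder found in: $repo" >&2
        return 1
    fi
    rm -rf "$repo/.git"
}

# Usage from inside metaGEM/workflow (the clone root is one level up):
#   remove_git_history ..
```

Because the deletion is irreversible and unprompted, the helper refuses to run unless the given directory really contains a `.git` folder.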

Create metaGEM env using recipe .yml file

mamba env create --prefix ./envs/metagem -f envs/metaGEM_env.yml

Deactivate mamba env and activate metaGEM env

source deactivate && source activate envs/metagem

Install pip tools

pip install --user memote carveme smetana

3. metaWRAP

It is best to set up metaWRAP in its own isolated environment to prevent version conflicts with metaGEM. Note that metaWRAP v1.3.2 has not migrated from Python 2 to Python 3 yet.

conda create -n metawrap
source activate metawrap
conda install -c ursky metawrap-mg=1.3.2

Or using the conda recipe file:

mamba env create --prefix ./envs/metawrap -f envs/metaWRAP_env.yml

🔮 Check installation

To make sure that the basics have been properly configured, run the check task using the metaGEM.sh parser:

bash metaGEM.sh -t check

This will check if conda is installed/available and verify that the environments were properly set up. Additionally, this check function will prompt you to create results folders if they are not already present. Finally, this task will check if any sequencing files are present in the dataset folder, prompting the user to either organize existing files into sample-specific subfolders or download a small toy dataset.
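To illustrate what organizing files into sample-specific subfolders could look like, here is a rough sketch. It is not the code used by metaGEM.sh: the function `organize_reads` is hypothetical, and the naming scheme `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` is an assumption that may not match your data.

```shell
# organize_reads: hypothetical helper, not the code used by metaGEM.sh.
# Assumes flat files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz and
# moves each of them into <dataset_dir>/<sample>/.
organize_reads() {
    local dataset="${1:-dataset}" file base sample
    for file in "$dataset"/*_R[12].fastq.gz; do
        [ -e "$file" ] || continue          # glob matched nothing
        base="$(basename "$file")"
        sample="${base%_R[12].fastq.gz}"    # strip the read-pair suffix
        mkdir -p "$dataset/$sample"
        mv "$file" "$dataset/$sample/"
    done
}

# Usage:
#   organize_reads dataset
```

Running it a second time is harmless, since files already inside subfolders no longer match the pattern.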

Tools requiring additional configuration

Please note that you will need to set up the following tools/databases to run the complete core metaGEM workflow:

1. CheckM

CheckM is used extensively within the metaWRAP modules to evaluate the output of various intermediate steps. Although the CheckM package is installed in the metawrap environment, the user is required to download the CheckM database and run checkm data setRoot <db_dir> as outlined in the CheckM installation guide.
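The `setRoot` step is easy to get wrong when the database path is mistyped or the wrong environment is active. As a small sketch, the hypothetical wrapper `set_checkm_root` below only adds two sanity checks around the `checkm data setRoot <db_dir>` command mentioned above; downloading and unpacking the database is left to the CheckM installation guide.

```shell
# set_checkm_root: hypothetical wrapper around `checkm data setRoot <db_dir>`.
# It only verifies that the directory exists and that checkm is available.
set_checkm_root() {
    local db_dir="${1:-}"
    if [ -z "$db_dir" ] || [ ! -d "$db_dir" ]; then
        echo "database directory not found: $db_dir" >&2
        return 1
    fi
    if ! command -v checkm >/dev/null 2>&1; then
        echo "checkm not on PATH; activate the metawrap environment first" >&2
        return 1
    fi
    checkm data setRoot "$db_dir"
}

# Usage, with the metawrap environment activated:
#   set_checkm_root /path/to/checkm_db
```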

2. GTDB-Tk

GTDB-Tk is used for taxonomic assignment of MAGs, and requires a database to be downloaded and configured. Please refer to the installation documentation for detailed instructions.

3. CPLEX

Unfortunately, CPLEX cannot be installed automatically by the env_setup.sh script; you must install this dependency manually within the metagem conda environment. GEM reconstruction and GEM community simulations require the IBM CPLEX solver, which is free to download with an academic license. Refer to the CarveMe and SMETANA installation instructions for further information or troubleshooting. Note: CPLEX v12.8 is recommended.
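After a manual installation, a quick import test can show whether the solver's Python bindings are visible from the metagem environment. This is a minimal sketch under the assumption that the bindings are exposed as the `cplex` Python module; the helper name `check_cplex` is hypothetical.

```shell
# check_cplex: hypothetical helper. Reports whether the `cplex` Python module
# can be imported by the given interpreter (default: python on PATH).
check_cplex() {
    local py="${1:-python}"
    if "$py" -c "import cplex" >/dev/null 2>&1; then
        echo "cplex importable"
    else
        echo "cplex not importable; install CPLEX in the metagem environment" >&2
        return 1
    fi
}

# Usage, with the metagem environment activated:
#   check_cplex
```

A successful import only shows that the bindings are installed; it does not verify the license or the solver version.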

Linting and formatting

Linting results
/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile:2452: SyntaxWarning: invalid escape sequence '\.'
/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile:2488: SyntaxWarning: invalid escape sequence '\.'
/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile:2758: SyntaxWarning: invalid escape sequence '\/'
/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile:2797: SyntaxWarning: invalid escape sequence '\/'
WorkflowError in file /tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile, line 1:
Workflow defines configfile ../config/config.yaml but it is not present or accessible (full checked path: /tmp/tmpmbg0b4rt/config/config.yaml).
Formatting results
[DEBUG] 
[DEBUG] In file "/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/rules/metabat_single.smk":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/rules/kallisto2concoctTable.smk":  Formatted content is different from original
[DEBUG] 
<unknown>:1: SyntaxWarning: invalid escape sequence '\.'
<unknown>:1: SyntaxWarning: invalid escape sequence '\/'
[DEBUG] In file "/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/Snakefile":  Formatted content is different from original
[DEBUG] 
[DEBUG] In file "/tmp/tmpmbg0b4rt/franciscozorrilla-metaGEM-d26adaf/workflow/rules/maxbin_single.smk":  Formatted content is different from original
[INFO] 4 file(s) would be changed 😬

snakefmt version: 0.10.2