young5454/ABComp

ABComp: Assembly Polishing and Bacterial Whole-genome Comparison Pipeline for Multi-group Clinical Isolates

Overview


Latest release: None, Last update: 2024-12-07

Linting: failed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for conda). If you have neither Conda nor Mamba, installing Miniforge is recommended. More details on Mamba can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, make sure this environment is activated via

conda activate snakemake

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

All following steps assume that you are inside that directory. Then run

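snakedeploy deploy-workflow https://github.com/young5454/ABComp . --tag <tag>

ABComp has no published release yet (the catalog lists the latest release as "None"), so --tag None will most likely not match any tag. Replace <tag> with an existing tag of the repository, or use --branch with the name of a repository branch instead.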

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains the configuration files that you will modify in the next step to adapt the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yml to your needs following the instructions below.
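Before running anything, a quick check that the deployment produced the files the later steps rely on can save a failed run. The sketch below is a hypothetical helper, not part of ABComp or Snakedeploy; the expected paths are assumptions drawn from this page (workflow/Snakefile for the main Snakefile mentioned in Step 4, config/config.yml from this step, and config/groups_original.yml from the configuration notes).

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical list of files this page says the deployed project should contain.
# Adjust it if your deployment differs.
EXPECTED_FILES = (
    "workflow/Snakefile",
    "config/config.yml",
    "config/groups_original.yml",
)


def missing_deployment_files(workdir: str | Path = ".") -> list[str]:
    """Return the expected files that are absent under workdir."""
    root = Path(workdir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the project working directory created in Step 2.
    missing = missing_deployment_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Deployment looks complete.")
```

An empty list means all three files are in place; anything it reports should be fixed before Step 4.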

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the Snakemake documentation.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive HTML report for inspecting the results, together with parameters and code, in the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

Configuration file

ABComp requires two configuration files to run the pipeline. Both YAML files are located in the config/ directory.

config.yml holds the default settings for the overall Snakemake run. Make sure to set the correct parameters and your preferred directory names.

groups_original.yml holds the complete group-strain information of your clinical isolates: each top-level key is a group name, and its value is the list of strains in that group. Below is an example for a 2-group, 5-strain setting:

NONMDR:
    - B0112
    - C0234
    - C3455
MDR:
    - B0232
    - D0991
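The workflow reads this file itself; the sketch below is not ABComp's code. It shows one way to sanity-check a group-strain mapping before launching a run, assuming (as in the example) that every group lists at least one strain and that each strain belongs to exactly one group. It works on the dictionary a YAML loader such as PyYAML's yaml.safe_load would return for the example above; the function name strain_to_group is hypothetical.

```python
from __future__ import annotations


def strain_to_group(groups: dict[str, list[str]]) -> dict[str, str]:
    """Invert a group -> strains mapping into a strain -> group lookup.

    Raises ValueError if a group is empty or a strain is listed twice.
    These rules are assumptions for illustration, not ABComp requirements.
    """
    lookup: dict[str, str] = {}
    for group, strains in groups.items():
        if not strains:  # an empty YAML value loads as None
            raise ValueError(f"group {group!r} lists no strains")
        for strain in strains:
            if strain in lookup:
                raise ValueError(
                    f"strain {strain!r} appears in both {lookup[strain]!r} and {group!r}"
                )
            lookup[strain] = group
    return lookup


# Same data as the example groups_original.yml, as yaml.safe_load would return it.
example = {
    "NONMDR": ["B0112", "C0234", "C3455"],
    "MDR": ["B0232", "D0991"],
}

membership = strain_to_group(example)
print(len(example), "groups,", len(membership), "strains")  # 2 groups, 5 strains
```

Building the inverted lookup is also a convenient way to answer "which group is this strain in?" when inspecting per-strain outputs.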

Linting and formatting

Linting results

Lints for snakefile /tmp/tmpf06e0zdf/workflow/Snakefile:
    * Path composition with '+' in line 82:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".

Lints for rules (line numbers refer to /tmp/tmpf06e0zdf/workflow/Snakefile):
    * No log directive defined, reported for busco (line 412), quast (line 448),
      prokka_ref (line 470), prokka_strain (line 513),
      roary_strain_ref_pairwise (line 559), move_gff_files (line 625),
      roary_within_group (line 657), gene_list_maker (line 687),
      move_faa_files (line 709), fasta_curation (line 741),
      cog_analysis_core (line 767), cog_analysis_shells (line 803),
      cog_visualization (line 839) and roary_visualization (line 884):
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Param is a prefix of input or output file but hardcoded, reported for
      polypolish (line 366, param path), busco (param out_path),
      prokka_ref (param out_dir), prokka_strain (param out_dir),
      roary_strain_ref_pairwise (param out_dir),
      move_gff_files (params workspace and tmp_dir) and
      move_faa_files (params workspace and group_dir):
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

... (truncated after the first lint for roary_visualization)
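The recurring lints above can be illustrated in plain Python. The sketch below is not taken from ABComp's Snakefile: the directory, strain, and file names are hypothetical, and the helper names prefix_from_output, dir_from_output and run_logged are made up for illustration. It shows (1) composing paths with an f-string or pathlib instead of '+', (2) deriving a prefix or output directory from the declared output file rather than hardcoding it, in the spirit of the linter's suggested lambda, and (3) sending a command's output to a per-job log file so that concurrent jobs do not interleave on the terminal.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path

# (1) Path composition. "results", "busco" and "B0112" are hypothetical names.
base_dir = "results"
strain = "B0112"
joined_plus = base_dir + "/busco/" + strain      # the style the linter flags
joined_fstr = f"{base_dir}/busco/{strain}"       # f-string alternative
joined_path = Path(base_dir) / "busco" / strain  # pathlib alternative


# (2) Derive prefixes from the declared output instead of hardcoding them.
def prefix_from_output(output_file: str) -> str:
    """Output path without its extension, like the linter's suggested lambda."""
    return os.path.splitext(output_file)[0]


def dir_from_output(output_file: str) -> str:
    """Directory holding an output file, for tools that take an output directory."""
    return os.path.dirname(output_file)


# In a Snakefile, such a helper would typically be wired in as, e.g.:
#   params: out_dir=lambda wildcards, output: dir_from_output(output[0])


# (3) Per-job logging, the Python analogue of `cmd > {log} 2>&1` in a shell directive.
def run_logged(cmd: list[str], log_path: str) -> int:
    """Run cmd with stdout and stderr written to log_path; return the exit code."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
```

Note that a rule's log: directive only declares the log file; the rule's command still has to write to it, which is what the redirection in (3) does.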

Formatting results

[DEBUG]
[DEBUG] In file "/tmp/tmpf06e0zdf/workflow/Snakefile":  Formatted content is different from original
[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.10.2