GuttmanLab/chipdip-pipeline
Pipeline to process sequencing reads from a ChIP-DIP experiment
Overview
Latest release: v3.0.0, Last update: 2025-10-31
Linting: failed, Formatting: failed
Topics: chip-seq snakemake
Deployment
Step 1: Install Snakemake and Snakedeploy
Snakemake and Snakedeploy are best installed via Conda. We recommend installing Conda via Miniforge. Run
conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy
to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via
conda activate snakemake
For other installation methods, refer to the Snakemake and Snakedeploy documentation.
Step 2: Deploy workflow
With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
In all following steps, we will assume that you are inside of that directory. Then run
snakedeploy deploy-workflow https://github.com/GuttmanLab/chipdip-pipeline . --tag v3.0.0
Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
Step 3: Configure workflow
To configure the workflow, adapt config/config.yaml to your needs following the instructions below.
Step 4: Run workflow
The deployment method is controlled using the --software-deployment-method (short --sdm) argument.
To run the workflow with automatic deployment of all required software via conda/mamba, use
snakemake --cores all --sdm conda
Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
For further options such as cluster and cloud execution, see the docs.
Step 5: Generate report
After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using
snakemake --report report.zip
Configuration
The following section is imported from the workflow’s config/README.md.
The pipeline is configured relative to the following directories:
working directory: in decreasing order of precedence, the path specified by the --directory command-line parameter passed to Snakemake, the path specified by the workdir: directive in the Snakefile, or the directory in which Snakemake was invoked.
workflow directory: directory containing the Snakefile
input directory: directory containing the config/ and resources/ folders
output directory: directory containing the pipeline output
For a complete description of the directory structures, and for relevant workflow profile configuration settings, see the main repository README.
Configuration Files
These files are located under <input_directory>/config/.
config.yaml: Pipeline configuration - YAML file containing the processing settings and paths of required input files. Paths are specified relative to the working directory.
  Required? Yes. Must be provided in the working directory, or specified via --configfile <path_to_config.yaml> when invoking Snakemake.
  Required keys
    scripts_dir: path to scripts folder in the workflow directory
    samples: path to samples.json file
    barcode_config: path to barcode config file (e.g., config.txt)
    bowtie2_index: path to Bowtie 2 genome index
    cutadapt_dpm: path to DPM sequences
    cutadapt_oligos: path to Antibody ID sequences
    bead_umi_length: integer length of bead oligo UMIs
  Optional keys: If these keys are omitted from config.yaml or set to null, they take on the default values indicated.
    output_dir (default = "results"): path to create the output directory within which all intermediate and output files are placed.
    temp_dir (default = $TMPDIR (if set) or "/tmp"): path to a temporary directory with sufficient free disk space, such as used by the -T option of GNU sort
    barcode_format (default = null): path to barcode format file (e.g., format.txt). If null, no barcode validation is performed.
    conda_env (default = "envs/chipdip.yaml"): either a path to a conda environment YAML file (“*.yml” or “*.yaml”) or the name of an existing conda environment. If a path to a conda environment YAML file is given, Snakemake will create a new conda environment within the .snakemake folder of the working directory. If a relative path is used, the path is interpreted as relative to the Snakefile.
    mask (default = null): path to BED file of genomic regions to ignore, such as ENCODE blacklist regions; reads mapping to these regions are discarded. If null, no masking is performed.
    path_chrom_map (default = null): path to chromosome name map file. If null, chromosome renaming and filtering are skipped, and the final BAM and/or bigWig files will use all chromosome names as-is from the Bowtie 2 index.
    deduplication_method (default = "RT&start&end"): specify keys to use for chromatin reads deduplication, in addition to the cluster barcode. Alignment positions (‘start’ and/or ‘end’) and/or the DPM tag (‘RT’) can be combined using ‘&’ (AND) or ‘|’ (OR) operators, with ‘&’ operators taking precedence.
    num_chunks (default = 2): integer between 1 and 99 giving the number of chunks to split FASTQ files from each sample into for parallel processing
    generate_splitbams (default = false): boolean value indicating whether to generate separate BAM files for each antibody target
    min_oligos (default = 2): integer giving the minimum count of deduplicated antibody oligo reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the proportion and max_size criteria
    proportion (default = 0.8): float giving the minimum proportion of deduplicated antibody oligo reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the min_oligos and max_size criteria
    max_size (default = 10000): integer giving the maximum count of deduplicated genomic DNA reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the min_oligos and proportion criteria
    merge_samples (default = false): boolean indicating whether to merge cluster files and target-specific BAM and bigWig files across samples
    binsize (default = false): integer specifying bigWig bin size; set to false to skip bigWig generation. Only relevant if generate_splitbams is true.
    bigwig_normalization (default = "None"): normalization strategy for calculating coverage from reads; passed to the --normalizeUsing argument for the bamCoverage command from the deepTools suite. As of version 3.5.2, deepTools bamCoverage supports RPKM, CPM, BPM, RPGC, or None. Only relevant if bigWig generation is requested (i.e., generate_splitbams is true and binsize is not false).
    effective_genome_size (default = null): integer specifying effective genome size (see deepTools documentation for a definition). If null, effective genome size is computed as the number of unmasked sequences in the Bowtie 2 index, after selecting for chromosomes specified in the chromosome name map file and excluding regions specified by the mask file. Only relevant if bigWig generation is requested using normalization strategy RPGC (i.e., generate_splitbams is true, binsize is not false, and bigwig_normalization is RPGC).
    email (default = null): email to send error notifications to if errors are encountered during the pipeline. If null, no emails are sent.
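The three cluster-assignment thresholds (min_oligos, proportion, max_size) are combined with a logical AND. The following Python sketch illustrates that rule under stated assumptions: the function name and arguments are hypothetical and are not taken from the pipeline's scripts, and the denominator of the proportion is assumed to be all deduplicated antibody oligo reads in the cluster.

```python
def passes_assignment_criteria(
    target_oligos: int,
    total_oligos: int,
    dna_reads: int,
    min_oligos: int = 2,
    proportion: float = 0.8,
    max_size: int = 10000,
) -> bool:
    """Return True only if a cluster meets all three thresholds (logical AND).

    target_oligos: deduplicated antibody oligo reads for the candidate target
    total_oligos:  deduplicated antibody oligo reads in the cluster
                   (assumed denominator for the proportion)
    dna_reads:     deduplicated genomic DNA reads in the cluster
    """
    if total_oligos <= 0:
        # No antibody oligo reads: no target can be assigned.
        return False
    return (
        target_oligos >= min_oligos
        and target_oligos / total_oligos >= proportion
        and dna_reads <= max_size
    )


if __name__ == "__main__":
    # 4 of 5 oligo reads support the target, 100 DNA reads in the cluster.
    print(passes_assignment_criteria(4, 5, 100))
```

Because the criteria are intersected, relaxing one threshold never rescues a cluster that fails another.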
  Additional notes
    null values can be specified explicitly (e.g., barcode_format: null) or implicitly (e.g., barcode_format:).
    For keys barcode_format, mask, path_chrom_map, and email, an empty string "" is treated identically to null.
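The default and null handling described above can be sketched in Python as follows. This is an illustration only, not the Snakefile's actual code: the function name resolve_config and the DEFAULTS table are hypothetical, the key names and default values are copied from the optional keys list (including barcode_format), and an empty $TMPDIR is assumed to count as unset.

```python
import os

# Defaults copied from the optional keys list; temp_dir is resolved separately.
DEFAULTS = {
    "output_dir": "results",
    "barcode_format": None,
    "conda_env": "envs/chipdip.yaml",
    "mask": None,
    "path_chrom_map": None,
    "deduplication_method": "RT&start&end",
    "num_chunks": 2,
    "generate_splitbams": False,
    "min_oligos": 2,
    "proportion": 0.8,
    "max_size": 10000,
    "merge_samples": False,
    "binsize": False,
    "bigwig_normalization": "None",
    "effective_genome_size": None,
    "email": None,
}

# Keys for which an empty string is treated like null.
EMPTY_STRING_IS_NULL = ("barcode_format", "mask", "path_chrom_map", "email")


def resolve_config(config: dict, environ=os.environ) -> dict:
    """Return a copy of config with omitted or null optional keys set to defaults."""
    resolved = dict(config)
    for key in EMPTY_STRING_IS_NULL:
        if resolved.get(key) == "":
            resolved[key] = None
    for key, default in DEFAULTS.items():
        if resolved.get(key) is None:
            resolved[key] = default
    if resolved.get("temp_dir") is None:
        # $TMPDIR if set (assumption: an empty value counts as unset), else /tmp.
        resolved["temp_dir"] = environ.get("TMPDIR") or "/tmp"
    return resolved


if __name__ == "__main__":
    print(resolve_config({"mask": "", "num_chunks": None})["num_chunks"])
```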
samples.json: Samples file - JSON file with the paths of FASTQ files (read1, read2) to process.
  Required? Yes.
  config.yaml key to specify the path to this file: samples
  This can be prepared using fastq2json.py --fastq_dir <path_to_directory_of_FASTQs> or manually formatted as follows:
      {
        "sample1": {
          "R1": [
            "<path_to_data>/sample1_run1_R1.fastq.gz",
            "<path_to_data>/sample1_run2_R1.fastq.gz"
          ],
          "R2": [
            "<path_to_data>/sample1_run1_R2.fastq.gz",
            "<path_to_data>/sample1_run2_R2.fastq.gz"
          ]
        },
        "sample2": {
          "R1": [
            "<path_to_data>/sample2_R1.fastq.gz"
          ],
          "R2": [
            "<path_to_data>/sample2_R2.fastq.gz"
          ]
        },
        ...
      }
  Data assumptions:
    FASTQ files are gzip-compressed.
    Read names do not contain two consecutive colons (::). This is required because the pipeline adds :: to the end of read names before adding barcode information; the string :: is used as a delimiter in the pipeline to separate the original read name from the identified barcode.
  If there are multiple FASTQ files per read orientation per sample (as shown for sample1 in the example above), the pipeline will concatenate them and process them together as the same sample.
  Each sample is processed independently, generating independent BAM files and statistics for quality assessment (barcode identification efficiency, cluster statistics, cluster size distributions, splitbam statistics). For ease of comparison, all samples are overlaid together in quality assessment plots.
  The provided sample read files under the data/ folder were simulated via a Google Colab notebook. The genomic DNA reads correspond to ChIP-seq peaks on chromosome 19 (mm10) for transcription factors MYC (simulated as corresponding to Antibody ID BEAD_AB1-A1) and TCF12 (simulated as corresponding to Antibody ID BEAD_AB2-A2).
  Sample names (the keys of the samples JSON file) cannot contain any periods (.). This is enforced to simplify wildcard pattern matching in the Snakefile and to allow the use of periods to delimit tags in a barcode string.
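Before launching the pipeline, a samples file can be checked against the constraints listed above. The Python sketch below is a hypothetical helper (validate_samples is not part of the repository). It checks that sample names contain no periods and that FASTQ paths end in .gz, and it additionally assumes that R1 and R2 lists should be non-empty and of equal length, which the text does not state explicitly.

```python
import json


def validate_samples(samples: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, reads in samples.items():
        if "." in name:
            problems.append(f"{name}: sample names cannot contain periods")
        r1 = reads.get("R1", [])
        r2 = reads.get("R2", [])
        # Assumption: paired files, so R1 and R2 should line up one-to-one.
        if not r1 or len(r1) != len(r2):
            problems.append(f"{name}: R1 and R2 must be non-empty lists of equal length")
        for path in r1 + r2:
            if not path.endswith(".gz"):
                problems.append(f"{name}: {path} does not look gzip-compressed")
    return problems


if __name__ == "__main__":
    text = '{"sample1": {"R1": ["s1_R1.fastq.gz"], "R2": ["s1_R2.fastq.gz"]}}'
    # json.loads also rejects trailing commas, a common mistake in hand-written files.
    print(validate_samples(json.loads(text)))
```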
config.txt: Barcode config file - text file containing the sequences of split-pool tags and the expected split-pool barcode structure.
  Required? Yes.
  config.yaml key to specify the path to this file: barcode_config
  Used by: scripts/java/BarcodeIdentification_v1.2.0.jar (Snakefile rule barcode_id), scripts/python/fastq_to_bam.py (Snakefile rule fastq_to_bam), and scripts/python/barcode_identification_efficiency.py (Snakefile rule barcode_identification_efficiency).
  This file is also parsed in the Snakefile itself to determine the length of the barcode (i.e., the number of rounds of barcoding) and, if generate_splitbams is set to true in config.yaml, the set of antibody targets for which to generate individual de-multiplexed BAM files (and bigWig files too, if requested).
  Format: SPRITE configuration file (see our SPRITE GitHub Wiki or Nature Protocols paper for details).
    Blank lines and lines starting with # are ignored.
  An example barcoding configuration file is annotated below:
      # Barcoding layout for read 1 and read 2
      # - Y represents a terminal tag
      # - ODD, EVEN, and DPM indicate their respective tags
      # - SPACER accounts for the 7-nt sticky ends that allow ligation between tags
      READ1 = DPM
      READ2 = Y|SPACER|ODD|SPACER|EVEN|SPACER|ODD|SPACER|EVEN|SPACER|ODD

      # DPM tag sequences formatted as tab-delimited lines
      # 1. Tag category: DPM
      # 2. Tag name: must contain "DPM", such as "DPM<xxx>"; must NOT contain "BEAD"
      #    - Can only contain alphanumeric characters, underscores, and hyphens,
      #      i.e., must match the regular expression "[a-zA-Z0-9_\-]+"
      # 3. Tag sequence (see resources/dpm96.fasta)
      # 4. Tag error tolerance: acceptable Hamming distance between
      #    expected tag sequence (column 3) and tag sequence in the read
      DPM	DPMBot6_1-A1	TGGGTGTT	0
      DPM	DPMBot6_2-A2	TGACATGT	0
      ...

      # Antibody ID sequences formatted as tab-delimited lines
      # - Identical format as for DPM tag sequences, except that Tag name (column 2)
      #   must start with "BEAD_".
      # - Tag sequences must match resources/bpm.fasta
      DPM	BEAD_AB1-A1	GGAACAGTT	0
      DPM	BEAD_AB2-A2	CGCCGAATT	0
      ...

      # Split-pool tag sequences: same 4-column tab-delimited format as the
      # DPM and Antibody ID section above, except that
      # Tag category (column 1) is now ODD, EVEN, or Y.
      # Tag name must NOT contain "BEAD" or "DPM".
      EVEN	EvenBot_1-A1	ATACTGCGGCTGACG	2
      EVEN	EvenBot_2-A2	GTAGGTTCTGGAATC	2
      ...
      ODD	OddBot_1-A1	TTCGTGGAATCTAGC	2
      ODD	OddBot_2-A2	CCTGTGCGTTAGAGT	2
      ...
      Y	NYStgBot_1-A1	TATTATGGT	0
      Y	NYStgBot_2-A2	TAGCTACCTT	0
      ...
  Notes regarding the entries in example_config.txt
    Names: Each name ends with #-Well (for example, 4-A4), where the # gives the row-major index of the tag in a 96-well plate, and Well denotes the corresponding row and column.
    Because only the first 2 antibody IDs are included in the example dataset, the other antibody ID rows are commented out. This prevents generation of empty (0 byte) placeholder files for the other 94 antibody IDs.
    Sequences
      The design of DPM tags allows for 9 bp of unique sequence, but only 8 bp are used in the published SPRITE tag set (in bottom tags, the 9th bp is currently a constant 'T'). example_config.txt therefore only includes the unique 8 bp sequences.
      The design of EVEN and ODD tags allows for 17 bp of unique sequence, but only 16 bp are used in the published SPRITE tag set (in bottom tags, the 17th bp is currently a constant 'T'). example_config.txt further trims the 1st unique bp from the bottom tag, leaving only the middle 15 bp of unique bottom sequence.
      The design of Y (terminal) tags allows for 9-12 bp of unique sequence.
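To make the barcode config format concrete, here is a minimal Python sketch of a parser for it. It is not the pipeline's own parsing code, and the function names are hypothetical. It assumes that the barcode length is the number of non-SPACER positions across READ1 and READ2 (which gives 7 for the example layout, matching the "Using 7 tags" line in the lint log) and that antibody targets are the BEAD_ tag names with the "BEAD_" prefix removed (matching the targets reported in that log).

```python
def parse_barcode_config(text: str):
    """Parse a SPRITE-style barcode config into (layout, tags)."""
    layout = {}  # e.g. {"READ1": ["DPM"], "READ2": ["Y", "SPACER", ...]}
    tags = []    # (category, name, sequence, tolerance) tuples
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.upper().startswith(("READ1", "READ2")) and "=" in line:
            read, _, structure = line.partition("=")
            layout[read.strip().upper()] = [t.strip() for t in structure.split("|")]
        else:
            category, name, sequence, tolerance = line.split("\t")
            tags.append((category, name, sequence, int(tolerance)))
    return layout, tags


def barcode_length(layout: dict) -> int:
    """Count non-SPACER positions across all reads (assumed definition)."""
    return sum(tag != "SPACER" for read_tags in layout.values() for tag in read_tags)


def antibody_targets(tags: list) -> list:
    """Names of Antibody ID tags with the 'BEAD_' prefix removed."""
    return [name[len("BEAD_"):] for _, name, _, _ in tags if name.startswith("BEAD_")]


# A shortened version of the annotated example, with explicit tab separators.
EXAMPLE = (
    "READ1 = DPM\n"
    "READ2 = Y|SPACER|ODD|SPACER|EVEN|SPACER|ODD|SPACER|EVEN|SPACER|ODD\n"
    "# tags\n"
    "DPM\tDPMBot6_1-A1\tTGGGTGTT\t0\n"
    "DPM\tBEAD_AB1-A1\tGGAACAGTT\t0\n"
    "DPM\tBEAD_AB2-A2\tCGCCGAATT\t0\n"
    "Y\tNYStgBot_1-A1\tTATTATGGT\t0\n"
)

if __name__ == "__main__":
    layout, tags = parse_barcode_config(EXAMPLE)
    print(barcode_length(layout), antibody_targets(tags))
```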
format.txt: Barcode format file - tab-delimited text file indicating which split-pool barcode tags are valid in which round of split-pool barcoding (i.e., at which positions in the barcoding string).
  Required? No, but highly recommended.
  config.yaml key to specify the path to this file: barcode_format
  Used by: scripts/python/split_bpm_dpm.py (Snakefile rule split_bpm_dpm)
  Column 1 indicates the zero-indexed position of the barcode string where a tag can be found.
    Terminal barcode tags (Y) are at position 0; tags from the second-to-last round of barcoding are at position 1; etc. A value of -1 in the position column indicates that the barcode tag was not used in the experiment.
  Column 2 indicates the name of the tag. This must be the same as the name of the tag in the barcode config file. If the same tag is used in multiple barcoding rounds, then it should appear multiple times in column 2, but with different values in column 1 indicating which rounds it is used in.
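The two-column format above maps naturally to a lookup from tag name to the set of positions where that tag is valid. The Python sketch below illustrates this; the function names are hypothetical, it is not the code in split_bpm_dpm.py, and skipping blank lines and lines starting with # is an assumption carried over from the barcode config file.

```python
def parse_barcode_format(text: str) -> dict:
    """Map each tag name to the set of barcode positions where it is valid."""
    positions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: comments are allowed
            continue
        position, name = line.split("\t")[:2]
        slots = positions.setdefault(name, set())
        if int(position) >= 0:  # -1 marks a tag not used in the experiment
            slots.add(int(position))
    return positions


def tag_is_valid(positions: dict, name: str, position: int) -> bool:
    """True if the named tag may appear at the given zero-indexed position."""
    return position in positions.get(name, set())


if __name__ == "__main__":
    example = "0\tNYStgBot_1-A1\n1\tOddBot_1-A1\n3\tOddBot_1-A1\n-1\tOddBot_2-A2\n"
    table = parse_barcode_format(example)
    print(tag_is_valid(table, "OddBot_1-A1", 3))
```

A tag listed only with position -1 ends up with an empty set, so it is rejected at every position.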
chrom_map.txt: Chromosome names map - tab-delimited text file specifying which chromosomes from the Bowtie 2 index to keep and how to rename them (if at all).
  Required? No, but necessary if using a blacklist mask that uses different chromosome names than those used in the Bowtie 2 index.
  config.yaml key to specify the path to this file: path_chrom_map
  Used by: scripts/python/rename_and_filter_chr.py (Snakefile rule rename_and_filter_chr, rule merge_mask, and rule effective_genome_size)
  Column 1 specifies chromosomes (following the naming convention used in the index) to keep.
    The order of chromosomes provided here is maintained in the SAM/BAM file header, and consequently specifies the coordinate sorting order at the reference sequence level.
  Column 2 specifies new chromosome names for the corresponding chromosomes in column 1.
  The provided chrom-map.txt in this repository contains examples for retaining only canonical human or mouse chromosomes (i.e., excluding alternate loci, unlocalized sequences, and unplaced sequences) and renaming them to UCSC chromosome names (i.e., chr1, chr2, …, chrX, chrY, chrM) as needed. The header of the provided file also includes more detailed documentation about the specific format requirements, such as allowed characters.
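The keep, rename, and reorder behavior described for the chromosome names map can be sketched in Python as follows. This is an illustration under assumptions, not the logic of rename_and_filter_chr.py: the function names are hypothetical, header documentation is assumed to be on lines starting with #, and a missing column 2 is assumed to mean "keep the original name".

```python
def parse_chrom_map(text: str) -> dict:
    """Return an ordered mapping {index_name: output_name} of chromosomes to keep."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks header comments
            continue
        fields = line.split("\t")
        old = fields[0]
        new = fields[1] if len(fields) > 1 and fields[1] else old
        mapping[old] = new
    return mapping


def rename_and_filter(records: list, mapping: dict) -> list:
    """Keep (chrom, start, end) records on mapped chromosomes, renamed and sorted.

    Records are ordered by the chromosome order in the map, then by start.
    """
    order = {old: i for i, old in enumerate(mapping)}
    kept = [r for r in records if r[0] in mapping]
    kept.sort(key=lambda r: (order[r[0]], r[1]))
    return [(mapping[chrom], start, end) for chrom, start, end in kept]


if __name__ == "__main__":
    chrom_map = parse_chrom_map("1\tchr1\n2\tchr2\nMT\tchrM\n")
    records = [("2", 10, 20), ("GL000008.2", 1, 5), ("1", 7, 9)]
    print(rename_and_filter(records, chrom_map))
```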
Resource Files
These files are located under <input_directory>/resources/.
bpm.fasta: FASTA file containing the sequences of Antibody IDs
  Required? Yes.
  config.yaml key to specify the path to this file: cutadapt_oligos
  Used by: cutadapt (Snakefile rule cutadapt_oligo)
  Each sequence should be preceded by ^ to anchor the sequence during cutadapt trimming (see Snakefile rule cutadapt_oligo).
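As an illustration of the anchoring convention, the Python sketch below renders (name, sequence) pairs as FASTA records whose sequences start with ^. The helper name anchored_fasta is hypothetical, and using the tag names from the example barcode config as record names is an assumption; the actual record names in bpm.fasta may differ.

```python
def anchored_fasta(tags) -> str:
    """Render (name, sequence) pairs as FASTA with each sequence prefixed by '^'."""
    lines = []
    for name, sequence in tags:
        lines.append(f">{name}")
        # Strip any existing anchor first so it is never doubled.
        lines.append("^" + sequence.lstrip("^"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Antibody ID sequences taken from the example barcode config above.
    print(anchored_fasta([("BEAD_AB1-A1", "GGAACAGTT"), ("BEAD_AB2-A2", "CGCCGAATT")]), end="")
```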
dpm96.fasta: FASTA file containing the sequences of DPM tags
  Required? Yes.
  config.yaml key to specify the path to this file: cutadapt_dpm
  Used by: cutadapt (Snakefile rule cutadapt_dpm)
  Each of these sequences is 10 nt long, consisting of a 9 nt DPM_Bottom sequence as originally designed for SPRITE (technically, only the first 8 nt are unique, and the 9th nucleotide is always a T), plus a T that is complementary to a 3' A added to a genomic DNA sequence via dA-tailing.
blacklist_hg38.bed, blacklist_mm10.bed: blacklisted genomic regions for ChIP-seq data
  Required? No, but highly recommended.
  config.yaml key to specify the path to this file: mask
  Used by: Snakefile rule merge_mask, whose output is used by rule repeat_mask and rule effective_genome_size
  For human genome release hg38, we use ENCFF356LFX from ENCODE. For mouse genome release mm10, we use mm10-blacklist.v2.bed.gz. These BED files use UCSC chromosome names (e.g., chr1, chr2, …). The pipeline performs chromosome name remapping (if specified) before this step.
  Reference paper: Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354. doi:10.1038/s41598-019-45839-z
  Example code used to download them into the resources/ directory:
      wget -O - https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz | zcat | sort -V -k1,3 > "resources/blacklist_hg38.bed"
      wget -O - https://github.com/Boyle-Lab/Blacklist/raw/master/lists/mm10-blacklist.v2.bed.gz | zcat | sort -V -k1,3 > "resources/blacklist_mm10.bed"
index_mm10/*.bt2, index_hg38/*.bt2: Bowtie 2 genome index
  Required? Yes.
  config.yaml key to specify the path to the index: bowtie2_index
  Used by: Snakefile rule bowtie2_align and rule effective_genome_size
  If you do not have an existing Bowtie 2 index, you can download pre-built indices from the Bowtie 2 developers:
      # for human primary assembly hg38
      mkdir -p resources/index_hg38
      wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
      unzip -j -d resources/index_hg38 GRCh38_noalt_as.zip \*.bt2

      # for mouse primary assembly mm10
      mkdir -p resources/index_mm10
      wget https://genome-idx.s3.amazonaws.com/bt/mm10.zip
      unzip -j -d resources/index_mm10 mm10.zip \*.bt2
  This will create a set of files under resources/index_hg38 or resources/index_mm10. If we want to use the mm10 genome assembly, for example, the code above will populate resources/index_mm10 with the following files: mm10.1.bt2, mm10.2.bt2, mm10.3.bt2, mm10.4.bt2, mm10.rev.1.bt2, mm10.rev.2.bt2. The path prefix to this index (as accepted by the bowtie2 -x <bt2-idx> argument) is therefore resources/index_mm10/mm10, which is set in the configuration file, config.yaml.
  Note that the pre-built indices linked above use UCSC chromosome names (chr1, chr2, …, chrX, chrY, chrM). If your alignment indices use different chromosome names (e.g., Ensembl chromosome names are 1, 2, …, X, Y, MT), update chrom-map.txt such that chromosome names in BAM files are converted to UCSC chromosome names. You can check the names of the reference sequences used to build the index by using the command bowtie2-inspect -n <bt2-idx>.
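A quick way to confirm that the bowtie2_index prefix points at a complete index is to check for the six files listed above. The Python sketch below is a hypothetical helper and not part of the pipeline. It only looks for the .bt2 files named in this section and ignores the .bt2l extension that Bowtie 2 uses for large indices.

```python
from pathlib import Path

# The six files that make up a (small) Bowtie 2 index for a given prefix.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_index_files(prefix: str) -> list:
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + suffix for suffix in BT2_SUFFIXES if not Path(prefix + suffix).is_file()]


if __name__ == "__main__":
    missing = missing_index_files("resources/index_mm10/mm10")
    if missing:
        print("Missing index files:", *missing, sep="\n  ")
    else:
        print("Bowtie 2 index looks complete")
```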
Linting and formatting
Linting results
Using workflow specific profile workflow/profiles/default for setting default command line arguments.
Using barcode config: config/example_config.txt
Using samples file: config/example_samples.json
Using 7 tags
Using cutadapt sequence file -g file:resources/dpm96.fasta
Using cutadapt sequence file -g file:resources/dpm96.fasta
Using bead UMI length: 8
Email (email) not specified in config.yaml. Will not send email on error.
Using barcode format file: config/example_format.txt
Using output directory: results
Using temporary directory: /tmp
Splitting FASTQ files into 2 chunks for parallel processing
Will create new conda environment from envs/chipdip.yaml
Masking reads that align to regions in: resources/blacklist_mm10.bed
Will generate BAM files for individual targets using:
  min_oligos: 2
  proportion: 0.8
  max_size: 10000
Will generate bigWig files for individual targets using normalization strategy: None
Detected the following targets in the barcode config file: ['AB2-A2', 'AB1-A1']
  Adding 'ambiguous', 'none', 'uncertain', and 'filtered' to the list of targets.
Lints for snakefile /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile:
    * Absolute path "/tmp" in line 142:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{input.bam}" in line 912:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Environment variable TMPDIR used but not asserted with envvars directive in line 140.:
      Asserting existence of environment variables with the envvars directive
      ensures proper error messages if the user fails to invoke a workflow with
      all required environment variables defined. Further, it allows snakemake
      to pass them on in case of distributed execution.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#environment-variables
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes
    * Path composition with '+' in line 301:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 38:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 90:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 301:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".

Lints for snakefile /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk:
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes

Lints for rule count_fastq_gz (line 175, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule count_bam (line 197, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule pipeline_counts (line 211, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * Shell command directly uses variable pipeline_counts from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Param dir_counts is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule clean (line 544, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule log_config (line 561, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule validate (line 569, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * Shell command directly uses variable bowtie2_index from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable validate from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule split_fastq (line 597, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * Shell command directly uses variable split_fastq from outside of the rule:

... (truncated)
Formatting results
[DEBUG]
[DEBUG] In file "/tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile": Formatted content is different from original
[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.11.2