GuttmanLab/chipdip-pipeline

Pipeline to process sequencing reads from a ChIP-DIP experiment

Overview

Latest release: v3.0.0, Last update: 2025-10-31

Linting: failed, Formatting: failed

Topics: chip-seq snakemake

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via Conda. It is recommended to install Conda via Miniforge. Run

conda create -c conda-forge -c bioconda -c nodefaults --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands ensure that this environment is activated via

conda activate snakemake

For other installation methods, refer to the Snakemake and Snakedeploy documentation.

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory. Then run

snakedeploy deploy-workflow https://github.com/GuttmanLab/chipdip-pipeline . --tag v3.0.0

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspection of results together with parameters and code inside of the browser using

snakemake --report report.zip

Configuration

The following section is imported from the workflow’s config/README.md.

The pipeline is configured relative to the following directories:

  • working directory: in decreasing order of precedence, the path specified by the --directory command-line parameter passed to Snakemake, the path specified by the workdir: directive in the Snakefile, or the directory in which Snakemake was invoked.

  • workflow directory: directory containing the Snakefile

  • input directory: directory containing the config/ and resources/ folders

  • output directory: directory containing the pipeline output

For a complete description of the directory structures, and for relevant workflow profile configuration settings, see the main repository README.

Configuration Files

These files are located under <input_directory>/config/.

  1. config.yaml: Pipeline configuration - YAML file containing the processing settings and paths of required input files. Paths are specified relative to the working directory.

    • Required? Yes. Must be provided in the working directory, or specified via --configfile <path_to_config.yaml> when invoking Snakemake.

    • Required keys

    • Optional keys: If these keys are omitted from config.yaml or set to null, then they will take on the default values indicated.

      • output_dir (default = "results"): path to create the output directory within which all intermediate and output files are placed.

      • temp_dir (default = $TMPDIR (if set) or "/tmp"): path to a temporary directory with sufficient free disk space, such as used by the -T option of GNU sort

      • barcode_format (default = null): path to barcode format file (e.g., format.txt). If null, no barcode validation is performed.

      • conda_env (default = "envs/chipdip.yaml"): either a path to a conda environment YAML file (“*.yml” or “*.yaml”) or the name of an existing conda environment. If the path to a conda environment YAML file, Snakemake will create a new conda environment within the .snakemake folder of the working directory. If a relative path is used, the path is interpreted as relative to the Snakefile.

      • mask (default = null): path to BED file of genomic regions to ignore, such as ENCODE blacklist regions; reads mapping to these regions are discarded. If null, no masking is performed.

      • path_chrom_map (default = null): path to chromosome name map file. If null, chromosome renaming and filtering are skipped, and the final BAM and/or bigWig files will use all chromosome names as-is from the Bowtie 2 index.

      • deduplication_method (default = "RT&start&end"): specify keys to use for chromatin reads deduplication, in addition to the cluster barcode. Alignment positions (‘start’ and/or ‘end’) and/or the DPM tag (‘RT’) can be combined using ‘&’ (AND) or ‘|’ (OR) operators, with ‘&’ operators taking precedence.

      • num_chunks (default = 2): integer between 1 and 99 giving the number of chunks to split FASTQ files from each sample into for parallel processing

      • generate_splitbams (default = false): boolean value indicating whether to generate separate BAM files for each antibody target

      • min_oligos (default = 2): integer giving the minimum count of deduplicated antibody oligo reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the proportion and max_size criteria

      • proportion (default = 0.8): float giving the minimum proportion of deduplicated antibody oligo reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the min_oligos and max_size criteria

      • max_size (default = 10000): integer giving the maximum count of deduplicated genomic DNA reads in a cluster for that cluster to be assigned to the corresponding antibody target; this criterion is intersected (AND) with the min_oligos and proportion criteria

      • merge_samples (default = false): boolean indicating whether to merge cluster files and target-specific BAM and bigWig files across samples

      • binsize (default = false): integer specifying bigWig binsize; set to false to skip bigWig generation. Only relevant if generate_splitbams is true.

      • bigwig_normalization (default = "None"): normalization strategy for calculating coverage from reads; passed to the --normalizeUsing argument for the bamCoverage command from the deepTools suite. As of version 3.5.2, deepTools bamCoverage supports RPKM, CPM, BPM, RPGC, or None. Only relevant if bigWig generation is requested (i.e., generate_splitbams is true and binsize is not false).

      • effective_genome_size (default = null): integer specifying effective genome size (see deepTools documentation for a definition). If null, effective genome size is computed as the number of unmasked sequences in the Bowtie 2 index, after selecting for chromosomes specified in the chromosome name map file and excluding regions specified by the mask file. Only relevant if bigWig generation is requested using normalization strategy RPGC (i.e., generate_splitbams is true, binsize is not false, and bigwig_normalization is RPGC).

      • email (default = null): email to send error notifications to if errors are encountered during the pipeline. If null, no emails are sent.

    • Additional notes

      • null values can be specified explicitly (e.g., barcode_format: null) or implicitly (e.g., barcode_format: ).

      • For keys barcode_format, mask, path_chrom_map, and email, an empty string "" is treated the same as null.
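
As a rough illustration of how the three assignment thresholds combine, the sketch below applies min_oligos, proportion, and max_size to a single cluster. The function name assign_target, the input layout, the choice of the most abundant target as the candidate, and the inclusive comparisons are assumptions made for this example; the pipeline's own implementation may differ (for instance, it also uses labels such as 'ambiguous' and 'uncertain' that are not modeled here).

```python
from typing import Dict, Optional


def assign_target(
    oligo_counts: Dict[str, int],
    num_dna_reads: int,
    min_oligos: int = 2,
    proportion: float = 0.8,
    max_size: int = 10000,
) -> Optional[str]:
    """Return the antibody target a cluster is assigned to, or None.

    oligo_counts maps a target name to its count of deduplicated antibody
    oligo reads in the cluster (hypothetical input layout); num_dna_reads is
    the cluster's count of deduplicated genomic DNA reads.
    """
    total = sum(oligo_counts.values())
    if total == 0:
        return None
    # Assumption: the candidate is the target with the most oligo reads.
    target, count = max(oligo_counts.items(), key=lambda kv: kv[1])
    # All three criteria must hold at once (logical AND).
    passes = (
        count >= min_oligos
        and count / total >= proportion
        and num_dna_reads <= max_size
    )
    return target if passes else None


print(assign_target({"AB1-A1": 9, "AB2-A2": 1}, num_dna_reads=50))
print(assign_target({"AB1-A1": 3, "AB2-A2": 2}, num_dna_reads=50))
```

With the default thresholds, the first cluster passes all three checks, while the second fails only the proportion check (3 of 5 oligo reads is below 0.8) and is therefore left unassigned.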

  2. samples.json: Samples file - JSON file with the paths of FASTQ files (read1, read2) to process.

    • Required? Yes.

    • config.yaml key to specify the path to this file: samples

    • This can be prepared using fastq2json.py --fastq_dir <path_to_directory_of_FASTQs> or manually formatted as follows:

      {
         "sample1": {
           "R1": [
             "<path_to_data>/sample1_run1_R1.fastq.gz",
             "<path_to_data>/sample1_run2_R1.fastq.gz"
           ],
           "R2": [
             "<path_to_data>/sample1_run1_R2.fastq.gz",
             "<path_to_data>/sample1_run2_R2.fastq.gz"
           ]
         },
         "sample2": {
           "R1": [
             "<path_to_data>/sample2_R1.fastq.gz"
           ],
           "R2": [
             "<path_to_data>/sample2_R2.fastq.gz"
           ]
         },
         ...
      }
      
    • Data assumptions:

      • FASTQ files are gzip-compressed.

      • Read names do not contain two consecutive colons (::). This is required because the pipeline adds :: to the end of read names before adding barcode information; the string :: is used as a delimiter in the pipeline to separate the original read name from the identified barcode.

    • If there are multiple FASTQ files per read orientation per sample (as shown for sample1 in the example above), the pipeline will concatenate them and process them together as the same sample.

    • Each sample is processed independently, generating independent BAM files and statistics for quality assessment (barcode identification efficiency, cluster statistics, cluster size distributions, splitbam statistics). For ease of comparison, all samples are overlaid together in quality assessment plots.

    • The provided sample read files under the data/ folder were simulated via a Google Colab notebook. The genomic DNA reads correspond to ChIP-seq peaks on chromosome 19 (mm10) for transcription factors MYC (simulated as corresponding to Antibody ID BEAD_AB1-A1) and TCF12 (simulated as corresponding to Antibody ID BEAD_AB2-A2).

    • Sample names (the keys of the samples JSON file) cannot contain any periods (.). This is enforced to simplify wildcard pattern matching in the Snakefile and to allow the use of periods to delimit tags in a barcode string.
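
The constraints on the samples file can be checked before launching the pipeline. The sketch below is one possible pre-flight check and is not part of the pipeline: the function name check_samples and the example paths are hypothetical, it assumes every sample needs both R1 and R2 entries, and it uses the .gz file extension only as a proxy for gzip compression.

```python
import json
from typing import Dict, List


def check_samples(samples: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return a list of human-readable problems found in a samples mapping."""
    problems = []
    for name, reads in samples.items():
        # Sample names must not contain periods.
        if "." in name:
            problems.append(f"{name}: sample name contains a period")
        for key in ("R1", "R2"):
            paths = reads.get(key, [])
            if not paths:
                problems.append(f"{name}: no {key} files listed")
            for path in paths:
                # Extension check only; the file contents are not inspected.
                if not path.endswith(".gz"):
                    problems.append(f"{name}: {path} does not end in .gz")
    return problems


example = json.loads("""
{
  "sample1": {
    "R1": ["data/sample1_run1_R1.fastq.gz", "data/sample1_run2_R1.fastq.gz"],
    "R2": ["data/sample1_run1_R2.fastq.gz", "data/sample1_run2_R2.fastq.gz"]
  },
  "sample.2": {
    "R1": ["data/sample2_R1.fastq"],
    "R2": []
  }
}
""")
for problem in check_samples(example):
    print(problem)
```

Parsing the text with json.loads also catches syntax problems such as trailing commas, which strict JSON does not allow.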

  3. config.txt: Barcode config file - text file containing the sequences of split-pool tags and the expected split-pool barcode structure.

    • Required? Yes.

    • config.yaml key to specify the path to this file: barcode_config

    • Used by: scripts/java/BarcodeIdentification_v1.2.0.jar (Snakefile rule barcode_id), scripts/python/fastq_to_bam.py (Snakefile rule fastq_to_bam), and scripts/python/barcode_identification_efficiency.py (Snakefile rule barcode_identification_efficiency).

      • This file is also parsed in the Snakefile itself to determine the length of the barcode (i.e., the number of rounds of barcoding) and, if generate_splitbams is set to true in config.yaml, the set of antibody targets for which to generate individual de-multiplexed BAM files (and bigWig files, if requested).

    • Format: SPRITE configuration file (see our SPRITE GitHub Wiki or Nature Protocols paper for details).

      • Blank lines and lines starting with # are ignored.

      • An example barcoding configuration file is annotated below:

        # Barcoding layout for read 1 and read 2
        # - Y represents a terminal tag
        # - ODD, EVEN, and DPM indicate their respective tags
        # - SPACER accounts for the 7-nt sticky ends that allow ligation between tags
        READ1 = DPM
        READ2 = Y|SPACER|ODD|SPACER|EVEN|SPACER|ODD|SPACER|EVEN|SPACER|ODD
        
        # DPM tag sequences formatted as tab-delimited lines
        # 1. Tag category: DPM
        # 2. Tag name: must contain "DPM", such as "DPM<xxx>"; must NOT contain "BEAD"
        #    - Can only contain alphanumeric characters, underscores, and hyphens,
        #      i.e., must match the regular expression "[a-zA-Z0-9_\-]+"
        # 3. Tag sequence (see resources/dpm96.fasta)
        # 4. Tag error tolerance: acceptable Hamming distance between
        #    expected tag sequence (column 3) and tag sequence in the read
        DPM	DPMBot6_1-A1	TGGGTGTT	0
        DPM	DPMBot6_2-A2	TGACATGT	0
        ...
        
        # Antibody ID sequences formatted as tab-delimited lines
        # - Identical format as for DPM tag sequences, except that Tag name (column 2)
        #   must start with "BEAD_".
        # - Tag sequences must match resources/bpm.fasta
        DPM	BEAD_AB1-A1	GGAACAGTT	0
        DPM	BEAD_AB2-A2	CGCCGAATT	0
        ...
        
        # Split-pool tag sequences: same 4-column tab-delimited format as the
        #   DPM and Antibody ID section above, except that
        #   Tag category (column 1) is now ODD, EVEN, or Y.
        #   Tag name must NOT contain "BEAD" or "DPM".
        EVEN	EvenBot_1-A1	ATACTGCGGCTGACG	2
        EVEN	EvenBot_2-A2	GTAGGTTCTGGAATC	2
        ...
        ODD	OddBot_1-A1	TTCGTGGAATCTAGC	2
        ODD	OddBot_2-A2	CCTGTGCGTTAGAGT	2
        ...
        Y	NYStgBot_1-A1	TATTATGGT	0
        Y	NYStgBot_2-A2	TAGCTACCTT	0
        ...
        
    • Notes regarding the entries in example_config.txt

      • Names: Each name ends with #-Well (for example, 4-A4) where the # gives the row-major index of the tag in a 96-well plate, and Well denotes the corresponding row and column.

      • Because only the first 2 antibody IDs are included in the example dataset, the other antibody ID rows are commented out. This prevents generation of empty (0 byte) placeholder files for the other 94 antibody IDs.

      • Sequences

        • The design of DPM tags allows for 9 bp of unique sequence, but only 8 bp are used in the published SPRITE tag set (in bottom tags, the 9th bp is currently a constant 'T'). example_config.txt therefore only includes the unique 8 bp sequences.

        • The design of EVEN and ODD tags allows for 17 bp of unique sequence, but only 16 bp are used in the published SPRITE tag set (in bottom tags, the 17th bp is currently a constant 'T'). example_config.txt further trims the 1st unique bp from the bottom tag, leaving only the middle 15 bp unique bottom sequence.

        • The design of Y (terminal) tags allows for 9-12 bp of unique sequence.
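
To make the file layout concrete, the sketch below reads a barcode config in the format described above: it skips blank and # lines, collects the READ1/READ2 layouts, and checks tag names against the stated regular expression. It is an illustrative reader only, not the parser used by the Snakefile or the Java tool; the names parse_barcode_config and Tag are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Allowed tag-name characters, as stated in the annotated example.
TAG_NAME = re.compile(r"[a-zA-Z0-9_\-]+")


class Tag(NamedTuple):
    category: str
    name: str
    sequence: str
    tolerance: int


def parse_barcode_config(text: str) -> Tuple[Dict[str, List[str]], List[Tag]]:
    """Return (read layouts, tag entries) from barcode config text."""
    layout: Dict[str, List[str]] = {}
    tags: List[Tag] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if "=" in line and "\t" not in line:
            # Layout line such as "READ2 = Y|SPACER|ODD"
            read, spec = (part.strip() for part in line.split("=", 1))
            layout[read] = spec.split("|")
            continue
        # Tag line: category, name, sequence, error tolerance
        category, name, sequence, tolerance = line.split("\t")
        if not TAG_NAME.fullmatch(name):
            raise ValueError(f"invalid tag name: {name!r}")
        tags.append(Tag(category, name, sequence, int(tolerance)))
    return layout, tags


example = (
    "# layout\n"
    "READ1 = DPM\n"
    "READ2 = Y|SPACER|ODD|SPACER|EVEN\n"
    "\n"
    "DPM\tDPMBot6_1-A1\tTGGGTGTT\t0\n"
    "DPM\tBEAD_AB1-A1\tGGAACAGTT\t0\n"
    "EVEN\tEvenBot_1-A1\tATACTGCGGCTGACG\t2\n"
)
layout, tags = parse_barcode_config(example)
# Count layout entries that are actual tags rather than spacers.
rounds = [t for read in ("READ1", "READ2") for t in layout[read] if t != "SPACER"]
antibody_ids = [t.name for t in tags if t.name.startswith("BEAD_")]
print(len(rounds), antibody_ids)
```

Counting non-SPACER layout entries is just one plausible way to derive a barcode length; how the Snakefile computes it is not specified here.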

  4. format.txt: Barcode format file - tab-delimited text file indicating which split-pool barcode tags are valid in which round of split-pool barcoding (i.e., at which positions in the barcoding string).

    • Required? No, but highly recommended.

    • config.yaml key to specify the path to this file: barcode_format

    • Used by: scripts/python/split_bpm_dpm.py (Snakefile rule split_bpm_dpm)

    • Column 1 indicates the zero-indexed position of the barcode string where a tag can be found.

      • Terminal barcode tags (Y) are position 0; tags from the second-to-last round of barcoding are position 1; etc. A value of -1 in the position column indicates that the barcode tag was not used in the experiment.

    • Column 2 indicates the name of the tag. This must be the same as the name of the tag in the barcode config file. If the same tag is used in multiple barcoding rounds, then it should appear multiple times in column 2, but with different values in column 1 indicating which rounds it is used in.
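
The position/name pairs can be thought of as a lookup from barcode position to the set of tags allowed there. The sketch below shows that idea under stated assumptions: the function names parse_format and barcode_is_valid are hypothetical, a barcode is represented as a list of tag names indexed by position, and only the first two columns are read. It is not the logic of split_bpm_dpm.py.

```python
from collections import defaultdict
from typing import Dict, List, Set


def parse_format(text: str) -> Dict[int, Set[str]]:
    """Map each zero-indexed barcode position to its allowed tag names."""
    allowed: Dict[int, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        position, name = line.split("\t")[:2]
        if int(position) >= 0:  # -1 marks a tag not used in the experiment
            allowed[int(position)].add(name)
    return dict(allowed)


def barcode_is_valid(tags: List[str], allowed: Dict[int, Set[str]]) -> bool:
    """True if every tag is allowed at the position where it appears."""
    return all(tag in allowed.get(pos, set()) for pos, tag in enumerate(tags))


example = (
    "0\tNYStgBot_1-A1\n"
    "1\tOddBot_1-A1\n"
    "2\tEvenBot_1-A1\n"
    "3\tOddBot_1-A1\n"  # same tag listed again for another round
    "-1\tOddBot_2-A2\n"  # tag not used in this experiment
)
allowed = parse_format(example)
good = ["NYStgBot_1-A1", "OddBot_1-A1", "EvenBot_1-A1", "OddBot_1-A1"]
bad = ["NYStgBot_1-A1", "OddBot_2-A2", "EvenBot_1-A1", "OddBot_1-A1"]
print(barcode_is_valid(good, allowed), barcode_is_valid(bad, allowed))
```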

  5. chrom_map.txt: Chromosome names map - tab-delimited text file specifying which chromosomes from the Bowtie 2 index to keep and how to rename them (if at all).

    • Required? No, but necessary if using a blacklist mask that uses different chromosome names than used in the Bowtie 2 index.

    • config.yaml key to specify the path to this file: path_chrom_map

    • Used by: scripts/python/rename_and_filter_chr.py (Snakefile rule rename_and_filter_chr, rule merge_mask, and rule effective_genome_size)

    • Column 1 specifies chromosomes (following naming convention used in the index) to keep.

      • The order of chromosomes provided here is maintained in the SAM/BAM file header, and consequently specifies the coordinate sorting order at the reference sequence level.

    • Column 2 specifies new chromosome names for the corresponding chromosomes in column 1.

    • The provided chrom-map.txt in this repository contains examples for retaining only canonical human or mouse chromosomes (i.e., excluding alternate loci, unlocalized sequences, and unplaced sequences) and renaming them to UCSC chromosome names (i.e., chr1, chr2, …, chrX, chrY, chrM) as needed. The header of the provided file also includes more detailed documentation about the specific format requirements, such as allowed characters.
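
The effect of a chromosome name map (keep only listed chromosomes, rename them, and follow the order given in the file) can be sketched as follows. This is an illustration rather than the code in rename_and_filter_chr.py: the function names are hypothetical, and it assumes that lines starting with # are comments and that a missing second column means the name is kept unchanged.

```python
from typing import Dict, List


def parse_chrom_map(text: str) -> Dict[str, str]:
    """Return an ordered mapping of index chromosome name -> output name."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # assumption: '#' lines are comments
        fields = line.split("\t")
        old = fields[0]
        # Assumption: an absent or empty column 2 keeps the original name.
        new = fields[1] if len(fields) > 1 and fields[1] else old
        mapping[old] = new
    return mapping


def rename_and_filter(index_names: List[str], mapping: Dict[str, str]) -> List[str]:
    """Keep only mapped chromosomes, renamed, in the order given by the map."""
    present = set(index_names)
    return [new for old, new in mapping.items() if old in present]


chrom_map = parse_chrom_map("# Ensembl to UCSC\n1\tchr1\nMT\tchrM\n2\tchr2\n")
index_names = ["1", "2", "MT", "GL000008.2"]
print(rename_and_filter(index_names, chrom_map))
```

Because Python dictionaries preserve insertion order, the output follows the order of lines in the map rather than the order of sequences in the index, mirroring the header-ordering behavior described above.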

Resource Files

These files are located under <input_directory>/resources/.

  1. bpm.fasta: FASTA file containing the sequences of Antibody IDs

    • Required? Yes.

    • config.yaml key to specify the path to this file: cutadapt_oligos

    • Used by: cutadapt (Snakefile rule cutadapt_oligo)

    • Each sequence should be preceded by ^ to anchor the sequence during cutadapt trimming (see Snakefile rule cutadapt_oligo).

  2. dpm96.fasta: FASTA file containing the sequences of DPM tags

    • Required? Yes.

    • config.yaml key to specify the path to this file: cutadapt_dpm

    • Used by: cutadapt (Snakefile rule cutadapt_dpm)

    • Each of these sequences is 10 nt long, consisting of a 9 nt DPM_Bottom sequence as originally designed for SPRITE (technically, only the first 8 nt are unique, and the 9th nucleotide is always a T), plus a T that is complementary to a 3’ A added to a genomic DNA sequence via dA-tailing.

  3. blacklist_hg38.bed, blacklist_mm10.bed: blacklisted genomic regions for ChIP-seq data

    • Required? No, but highly recommended.

    • config.yaml key to specify the path to this file: mask

    • Used by: Snakefile rule merge_mask, whose output is used by rule repeat_mask and rule effective_genome_size

    • For human genome release hg38, we use ENCFF356LFX from ENCODE. For mouse genome release mm10, we use mm10-blacklist.v2.bed.gz. These BED files use UCSC chromosome names (e.g., chr1, chr2, …). The pipeline performs chromosome name remapping (if specified) before this step.

      • Reference paper: Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354. doi:10.1038/s41598-019-45839-z

      • Example code used to download them into the resources/ directory:

        wget -O - https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz |
            zcat |
            sort -V -k1,3 > "resources/blacklist_hg38.bed"
        
        wget -O - https://github.com/Boyle-Lab/Blacklist/raw/master/lists/mm10-blacklist.v2.bed.gz |
            zcat |
            sort -V -k1,3 > "resources/blacklist_mm10.bed"
        
  4. index_mm10/*.bt2, index_hg38/*.bt2: Bowtie 2 genome index

    • Required? Yes.

    • config.yaml key to specify the path to the index: bowtie2_index

    • Used by: Snakefile rule bowtie2_align and rule effective_genome_size

    • If you do not have an existing Bowtie 2 index, you can download pre-built indices from the Bowtie 2 developers:

      # for human primary assembly hg38
      mkdir -p resources/index_hg38
      wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
      unzip -j -d resources/index_hg38 GRCh38_noalt_as.zip \*.bt2
      
      # for mouse primary assembly mm10
      mkdir -p resources/index_mm10
      wget https://genome-idx.s3.amazonaws.com/bt/mm10.zip
      unzip -j -d resources/index_mm10 mm10.zip \*.bt2
      

      This will create a set of files under resources/index_hg38 or resources/index_mm10. If we want to use the mm10 genome assembly, for example, the code above will populate resources/index_mm10 with the following files: mm10.1.bt2, mm10.2.bt2, mm10.3.bt2, mm10.4.bt2, mm10.rev.1.bt2, mm10.rev.2.bt2. The path prefix to this index (as accepted by the bowtie2 -x <bt2-idx> argument) is therefore resources/index_mm10/mm10, which is set in the configuration file, config.yaml.

      Note that the pre-built indices linked above use UCSC chromosome names (chr1, chr2, …, chrX, chrY, chrM). If your alignment indices use different chromosome names (e.g., Ensembl chromosome names are 1, 2, …, X, Y, MT), update chrom-map.txt such that chromosome names in BAM files are converted to UCSC chromosome names. You can check the names of the reference sequences used to build the index by using the command bowtie2-inspect -n <bt2-idx>.

Linting and formatting

Linting results

Using workflow specific profile workflow/profiles/default for setting default command line arguments.
Using barcode config: config/example_config.txt
Using samples file: config/example_samples.json
Using 7 tags
Using cutadapt sequence file -g file:resources/dpm96.fasta
Using cutadapt sequence file -g file:resources/dpm96.fasta
Using bead UMI length: 8
Email (email) not specified in config.yaml. Will not send email on error.
Using barcode format file: config/example_format.txt
Using output directory: results
Using temporary directory: /tmp
Splitting FASTQ files into 2 chunks for parallel processing
Will create new conda environment from envs/chipdip.yaml
Masking reads that align to regions in: resources/blacklist_mm10.bed
Will generate BAM files for individual targets using:
	min_oligos: 2
	proportion: 0.8
	max_size: 10000
Will generate bigWig files for individual targets using normalization strategy: None
Detected the following targets in the barcode config file: ['AB2-A2', 'AB1-A1']
  Adding 'ambiguous', 'none', 'uncertain', and 'filtered' to the list of targets.
Lints for snakefile /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile:
    * Absolute path "/tmp" in line 142:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Absolute path "/"{input.bam}" in line 912:
      Do not define absolute paths inside of the workflow, since this renders
      your workflow irreproducible on other machines. Use path relative to the
      working directory instead, or make the path configurable via a config
      file.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#configuration
    * Environment variable TMPDIR used but not asserted with envvars directive in line 140.:
      Asserting existence of environment variables with the envvars directive
      ensures proper error messages if the user fails to invoke a workflow with
      all required environment variables defined. Further, it allows snakemake
      to pass them on in case of distributed execution.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#environment-variables
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes
    * Path composition with '+' in line 301:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 38:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 90:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".
    * Path composition with '+' in line 301:
      This becomes quickly unreadable. Usually, it is better to endure some
      redundancy against having a more readable workflow. Hence, just repeat
      common prefixes. If path composition is unavoidable, use pathlib or
      (python >= 3.6) string formatting with f"...".

Lints for snakefile /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk:
    * Mixed rules and functions in same snakefile.:
      Small one-liner functions used only once should be defined as lambda
      expressions. Other functions should be collected in a common module, e.g.
      'rules/common.smk'. This makes the workflow steps more readable.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/modularization.html#includes

Lints for rule count_fastq_gz (line 175, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule count_bam (line 197, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule pipeline_counts (line 211, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/pipeline_counts.smk):
    * Shell command directly uses variable pipeline_counts from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Param dir_counts is a prefix of input or output file but hardcoded:
      If this is meant to represent a file path prefix, it will fail when
      running workflow in environments without a shared filesystem. Instead,
      provide a function that infers the appropriate prefix from the input or
      output file, e.g.: lambda w, input: os.path.splitext(input[0])[0]
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
      https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#tutorial-input-functions

Lints for rule clean (line 544, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the
      workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable DIR_OUT from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule log_config (line 561, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In
      distributed environments, this means that errors are harder to discover.
      In local environments, output of concurrent jobs will be mixed and become
      unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

Lints for rule validate (line 569, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * Shell command directly uses variable bowtie2_index from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules
    * Shell command directly uses variable validate from outside of the rule:
      It is recommended to pass all files as input and output, and non-file
      parameters via the params directive. Otherwise, provenance tracking is
      less accurate.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules

Lints for rule split_fastq (line 597, /tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile):
    * Shell command directly uses variable split_fastq from outside of the rule:

... (truncated)

Formatting results

[DEBUG] 
[DEBUG] In file "/tmp/tmp7u5zujzq/GuttmanLab-chipdip-pipeline-a5b0ddd/workflow/Snakefile":  Formatted content is different from original
[INFO] 1 file(s) would be changed 😬

snakefmt version: 0.11.2