seankmartin/atn-sub-lfp-workflow

Working with snakemake for analysis of SUB LFP

Overview

Latest release: 23.02.01, Last update: 2023-03-10

Linting: passed, Formatting: failed

Deployment

Step 1: Install Snakemake and Snakedeploy

Snakemake and Snakedeploy are best installed via the Mamba package manager (a drop-in replacement for Conda). If you have neither Conda nor Mamba, it is recommended to install Miniforge. More details regarding Mamba can be found in the Mamba documentation.

When using Mamba, run

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all following commands, ensure that this environment is activated via

conda activate snakemake
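To confirm that the environment is set up correctly, you can check that both tools are available on the command line, for example via

snakemake --version
snakedeploy --help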

Step 2: Deploy workflow

With Snakemake and Snakedeploy installed, the workflow can be deployed as follows. First, create an appropriate project working directory on your system and enter it:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

In all following steps, we will assume that you are inside that directory. Then run

snakedeploy deploy-workflow https://github.com/seankmartin/atn-sub-lfp-workflow . --tag 23.02.01

Snakedeploy will create two folders, workflow and config. The former contains the deployment of the chosen workflow as a Snakemake module, the latter contains configuration files which will be modified in the next step in order to configure the workflow to your needs.
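For orientation, the deployed project should look roughly like the sketch below; the configuration file names are taken from the configuration section further down, and the exact contents may differ between releases:

path/to/project-workdir/
├── workflow/
│   └── Snakefile
└── config/
    ├── README.md
    ├── config.yaml
    ├── simuran_params.yml
    ├── openfield_recordings.yml
    └── tmaze_recordings.yml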

Step 3: Configure workflow

To configure the workflow, adapt config/config.yaml to your needs following the instructions below.

Step 4: Run workflow

The deployment method is controlled using the --software-deployment-method (short --sdm) argument.

To run the workflow with automatic deployment of all required software via conda/mamba, use

snakemake --cores all --sdm conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder and execute the workflow module that has been defined by the deployment in step 2.
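Before launching the full run, it can be helpful to preview the jobs that would be executed with a dry run, e.g.

snakemake --cores all --sdm conda -n

The -n (--dry-run) flag is a standard Snakemake option and does not execute any jobs.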

For further options such as cluster and cloud execution, see the docs.

Step 5: Generate report

After finalizing your data analysis, you can automatically generate an interactive visual HTML report for inspecting results, together with parameters and code, in the browser using

snakemake --report report.zip
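The resulting archive can be unpacked and viewed locally; it typically contains a self-contained HTML page, e.g.

unzip report.zip -d report

and then open the extracted HTML file (usually report.html) in your browser.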

Configuration

The following section is imported from the workflow’s config/README.md.

Configuration

The main config file for path setup is config.yaml, and simuran_params.yml holds the analysis parameters. If you have the raw Axona data, change the data_directory and ca1_directory parameters in config.yaml to the paths containing the downloaded SUB and CA1 data. Otherwise, create a folder called results in the parent directory to this file, and place the information downloaded from our open data publication there. The other config files are unlikely to require modification.
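For example, if you are using the downloaded data rather than the raw Axona recordings, the results folder could be created from the project working directory (assuming here that the parent directory referred to above is the project root containing workflow and config):

mkdir -p results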

Possible Error

If you get an error, try updating workflow/Snakefile to use path=workflow/Snakefile instead of path=workflow\Snakefile (i.e. a forward slash rather than a backslash).

Main config files

config.yaml

This file contains the following variables; in particular, items 1 and 2 will likely need to be modified (an illustrative sketch of the file follows the list):

  1. data_directory: The directory where the SUB data is stored.
  2. ca1_directory: The directory where the CA1 data is stored.
  3. simuran_config: The path to the simuran config file (simuran_params.yml).
  4. openfield_filter: The filter to use for openfield recordings (openfield_recordings.yml).
  5. tmaze_filter: The filter to use for tmaze recordings (tmaze_recordings.yml).
  6. overwrite_nwb: Whether to overwrite the NWB files if they already exist (False).
  7. sleep_only: Whether to only process sleep recordings (False).
  8. overwrite_sleep: Whether to overwrite the sleep analysis files if they already exist (False).
  9. except_nwb_errors: Whether to ignore NWB errors (True).
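As an illustrative sketch only (the directory paths are placeholders, the other values restate the defaults listed above, and the deployed config/config.yaml is the authoritative version):

data_directory: "/path/to/downloaded/SUB/data"
ca1_directory: "/path/to/downloaded/CA1/data"
simuran_config: "config/simuran_params.yml"
openfield_filter: "config/openfield_recordings.yml"
tmaze_filter: "config/tmaze_recordings.yml"
overwrite_nwb: False
sleep_only: False
overwrite_sleep: False
except_nwb_errors: True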

simuran_params.yml

This file contains individual parameters for each analysis, such as the frequency band to treat as theta (e.g. 6-12 Hz); a short illustrative fragment follows the list:

  1. cfg_base_dir: The base directory to use for data referred to by relative paths in the config files.
  2. do_spectrogram_plot: Whether to plot the spectrogram (False).
  3. plot_psd: Whether to plot the power spectrum (True).
  4. image_format: The format to use for images (png).
  5. loader: The name of the loader to use (neurochat).
  6. loader_kwargs: The keyword arguments to pass to the loader.
  7. clean_method: The method to use to clean the LFP signals. By default, for non-cannulated rats, it z-score normalises the signals and then picks the bipolar electrode signals from these if they do not deviate from the average by more than a standard deviation. For cannulated rats, it proceeds similarly but uses all clean signals, not just those on the bipolar electrodes.
  8. clean_kwargs: The keyword arguments to pass to the clean method for non-cannulated rats.
  9. can_clean_kwargs: The keyword arguments to pass to the clean method for cannulated rats.
  10. z_score_threshold: The z-score threshold to use for the LFP cleaning.
  11. fmin: The minimum frequency to consider for filtering.
  12. fmax: The maximum frequency to consider for filtering.
  13. filter_kwargs: The keyword arguments to pass to the filter method.
  14. theta_min, theta_max: The minimum and maximum frequencies to consider for theta.
  15. delta_min, delta_max: The minimum and maximum frequencies to consider for delta.
  16. low_gamma_min, low_gamma_max: The minimum and maximum frequencies to consider for low_gamma.
  17. high_gamma_min, high_gamma_max: The minimum and maximum frequencies to consider for high_gamma.
  18. beta_min, beta_max: The minimum and maximum frequencies to consider for beta.
  19. psd_scale: The scale to use for the power spectrum (decibels or volts).
  20. number_of_shuffles_sta: The number of shuffles of time to use for the STA analysis.
  21. num_spike_shuffles: The number of shuffles of spikes to use for the STA analysis.
  22. max_psd_freq, max_fooof_freq: The maximum frequency to consider for the power spectrum and fooof analysis.
  23. speed_theta_samples_per_second: How many speed theta samples per second to use, as this data is binned.
  24. max_speed: The maximum speed to consider for the speed theta analysis, in cm/s.
  25. tmaze_minf, tmaze_maxf: The minimum and maximum frequencies to consider for the tmaze analysis.
  26. tmaze_winsec: The window size to use for the tmaze LFP analysis, in seconds.
  27. max_lfp_lengths: How to split up the LFP signal during tmaze analysis. Defaults give 1 second windows.
  28. tmaze_egf: Whether to use eeg or egf (the higher rate signal) in the tmaze analysis.
  29. spindles_use_avg: Whether to run spindle analysis on the average signal or on all signals.
  30. use_first_two_for_ripples: Whether to use the first two signals for ripple analysis or all signals.
  31. lfp_ripple_rate: The rate of the high-frequency LFP signal to use for ripple analysis; this can be a downsample of the full egf rate.
  32. min_sleep_length: The minimum length of sleep to consider for sleep analysis, in seconds.
  33. only_kay_detect: Whether to only use Kay's algorithm for sleep detection.
  34. except_nwb_errors: Whether to ignore NWB errors (True).
  35. sleep_join_tol: The maximum amount of movement time allowed between two sleep epochs for them to be joined, in seconds (0.0).
  36. sleep_max_interval_size: The maximum allowed duration of a sleep epoch, in seconds, before it is split for efficiency (300).
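For illustration, a fragment of simuran_params.yml covering a few of the settings above might look as follows; the key names come from the list, the theta band matches the 6-12 Hz example given earlier, and the remaining values restate the defaults noted in parentheses above:

# frequency band treated as theta
theta_min: 6
theta_max: 12
# plotting options
do_spectrogram_plot: False
plot_psd: True
image_format: png
# power spectrum scale (decibels or volts)
psd_scale: decibels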

Additional config files

tmaze_recordings.yml

This lists how to obtain the tmaze recordings, with 8 control rats and 6 lesion rats considered.

openfield_recordings.yml

This lists the names of the rats whose openfield recordings are used, with 6 control rats and 5 lesion rats considered.

Linting and formatting

Linting results

None

Formatting results

[DEBUG] In file "/tmp/tmpdyibimgy/seankmartin-atn-sub-lfp-workflow-d177bfe/workflow/rules/plot_data.smk":  Formatted content is different from original
[DEBUG] In file "/tmp/tmpdyibimgy/seankmartin-atn-sub-lfp-workflow-d177bfe/workflow/Snakefile":  Formatted content is different from original
[DEBUG] In file "/tmp/tmpdyibimgy/seankmartin-atn-sub-lfp-workflow-d177bfe/workflow/rules/analyse_data.smk":  Formatted content is different from original
[DEBUG] In file "/tmp/tmpdyibimgy/seankmartin-atn-sub-lfp-workflow-d177bfe/workflow/rules/process_data.smk":  Formatted content is different from original
[INFO] 4 file(s) would be changed 😬

snakefmt version: 0.8.2