spats_tool

The spats_tool command can be used to perform most of the standard SPATS tasks. To use it, first designate a directory for your experiment. In general, use a new directory for each new data set; for organizational purposes, it’s best not to keep many other files in that directory. The directory needs to be on the same filesystem as your data files.

As an example, we’ll create a folder for a new “testing” experiment:

$ mkdir testing_2017_08_10
$ cd testing_2017_08_10
$ spats_tool init
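Because the experiment directory must be on the same filesystem as the data files, it may be worth checking this before running spats_tool init. Below is a minimal Python sketch, not part of spats_tool. It compares the device IDs reported by os.stat(). The function name same_filesystem is hypothetical.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same filesystem.

    Hypothetical helper, not part of spats_tool.
    """
    # st_dev identifies the device holding each path; equal IDs mean the same filesystem
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, when run from the experiment directory, same_filesystem(".", "/path/to/data/experiment_data_R1.fastq.gz") should return True if the layout is acceptable.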

The tool works using a spats.config file, a text file that provides the information required by the various tool functions. Running spats_tool init creates a default config file; be sure to edit it after it’s created. Most of its settings specify the locations of the target and input files. Here’s a sample:

[spats]
preseq = my_preseq_file.fsa  # optional
target = my_target.fa
r1 = /path/to/data/experiment_data_R1.fastq.gz
r2 = /path/to/data/experiment_data_R2.fastq.gz
cotrans = False

[metadata]
...

The preseq parameter is needed only by the pre tool. The target, r1, and r2 parameters are required by most tools. If this is a cotrans experiment, set cotrans = True and provide the target, R1, and R2 paths as usual. You can use .gz files for r1 and r2. They are decompressed on the fly, and the decompressed copies are cleaned up after the run completes.
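The following minimal Python sketch illustrates these requirements by checking a spats.config for the keys a given tool needs. It assumes the file can be read with Python's configparser, which this documentation does not confirm. The REQUIRED_KEYS table and the missing_keys and is_cotrans functions are hypothetical. The placeholder [metadata] section is left out of any test input because its "..." line would not parse.

```python
import configparser

# Hypothetical table built from the description above: preseq only for "pre";
# target, r1, and r2 for the other tools. The real requirements may differ.
REQUIRED_KEYS = {
    "pre": ["preseq"],
    "reads": ["target", "r1", "r2"],
    "run": ["target", "r1", "r2"],
}


def _parse(config_text: str) -> configparser.ConfigParser:
    # Treat " # ..." after a value as a comment, as in the sample config
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(config_text)
    return parser


def missing_keys(config_text: str, tool: str) -> list:
    """Return the [spats] keys that `tool` needs but the config leaves unset."""
    parser = _parse(config_text)
    section = parser["spats"] if parser.has_section("spats") else {}
    return [key for key in REQUIRED_KEYS.get(tool, []) if not section.get(key)]


def is_cotrans(config_text: str) -> bool:
    """Read the cotrans flag ("True"/"False"), defaulting to False when absent."""
    return _parse(config_text).getboolean("spats", "cotrans", fallback=False)
```

For example, missing_keys(open("spats.config").read(), "run") returns an empty list when target, r1, and r2 are all set.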

To configure the SPATS run or the reads analysis (see below), add parameters to spats.config, named according to the run.Run documentation; for example, minimum_target_match_length = 12.
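Values in a config file arrive as strings, so a setting such as minimum_target_match_length = 12 has to be converted before it can be used as a number. Below is a minimal sketch of one way to do that: it converts each value to the type of the attribute's current value. HypotheticalRunSettings is a stand-in for run.Run, and its defaults and the example_flag/example_ratio settings are invented for illustration. This is not a description of how spats_tool actually applies settings.

```python
class HypotheticalRunSettings:
    """Stand-in for run.Run; real attribute names, defaults, and types may differ."""

    def __init__(self) -> None:
        self.minimum_target_match_length = 10  # hypothetical default value
        self.example_flag = False               # hypothetical boolean setting
        self.example_ratio = 0.5                # hypothetical float setting


def apply_settings(target: object, settings: dict) -> None:
    """Set each string-valued setting on `target`, converted to its current type."""
    for name, raw in settings.items():
        if not hasattr(target, name):
            raise AttributeError(f"unknown setting: {name}")
        current = getattr(target, name)
        if isinstance(current, bool):  # check bool first: bool is a subclass of int
            value = raw.strip().lower() in ("1", "true", "yes", "on")
        elif isinstance(current, int):
            value = int(raw)
        elif isinstance(current, float):
            value = float(raw)
        else:
            value = raw
        setattr(target, name, value)
```

Converting by the existing value's type keeps the sketch free of a separate schema. The bool check matters because bool("False") is True in Python.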

The spats_tool command must be run from the experiment directory you created, which contains the spats.config file.

spats_tool pre

The first command you may wish to run is pre, which takes the information from a *.fsa (ABIF) file and creates a plot.

$ spats_tool pre
:pre-sequencing data processed to pre.spats
:pre complete @ 0.02s

This extracts the data and stores it in pre.spats, in a format from which it can easily be inspected and plotted.
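Before running pre, you can quickly confirm that the preseq file really is an ABIF file. An ABIF file begins with the 4-byte signature "ABIF" followed by a 16-bit big-endian version number. This minimal sketch checks only that header and does not parse the directory or trace data that spats_tool pre reads. The function names abif_version and read_abif_version are hypothetical.

```python
import struct
from typing import Optional


def abif_version(header: bytes) -> Optional[int]:
    """Return the ABIF version from the first 6 bytes of a file, or None if not ABIF."""
    # ABIF files begin with the 4-byte signature b"ABIF" ...
    if len(header) < 6 or header[:4] != b"ABIF":
        return None
    # ... followed by a big-endian 16-bit version number
    (version,) = struct.unpack(">H", header[4:6])
    return version


def read_abif_version(path: str) -> Optional[int]:
    """Read just enough of `path` (e.g. my_preseq_file.fsa) to check its header."""
    with open(path, "rb") as handle:
        return abif_version(handle.read(6))
```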

spats_tool reads

The reads command analyzes the experimental data and creates a reads.spats file, which can be used with the visualization tool to analyze the quality of the data.

$ spats_tool reads
:** removing previous reads.spats
:using native reads
Lookup table: 1076 R1 entries, 121 R2 entries.
Lookup table: 1076 R1 entries, 121 R2 entries.
Processing pairs...
Created 8 workers
^^^^^^^^^.v........vvvvvvvvxxxxxxxx
Aggregating data...
Successfully processed 3640 properly paired fragments:
  ...
:tags processed to reads.spats
:reads complete @ 59.38s

spats_tool run

The basic command is run, which performs the SPATS run to compute site reactivities:

$ spats_tool run
:using native cotrans processor
:decompressing /projects/b1044/.../EJS_6_F_10mM_NaF_Rep1_GCCAAT_R1.fastq.gz
:decompress R1 @ 41.51s
:decompressing /projects/b1044/.../EJS_6_F_10mM_NaF_Rep1_GCCAAT_R2.fastq.gz
:decompress R1 @ 93.62s
:wrote output to run.spats
:run complete @ 134.99s
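The run output above shows the R1 and R2 .gz files being decompressed before processing. This minimal Python sketch shows the same decompress-then-clean-up pattern as a context manager. It is not spats_tool's implementation: the name decompressed is hypothetical, and writing the copy to the system temporary directory is an assumption, since spats_tool may place it elsewhere.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


@contextlib.contextmanager
def decompressed(path: str):
    """Yield a path to an uncompressed copy of `path`, deleting the copy afterwards.

    Files that do not end in .gz are yielded unchanged.
    """
    if not path.endswith(".gz"):
        yield path
        return
    fd, tmp_path = tempfile.mkstemp(suffix=".fastq")
    try:
        with os.fdopen(fd, "wb") as out, gzip.open(path, "rb") as src:
            # Stream in chunks so the whole file never has to fit in memory
            shutil.copyfileobj(src, out)
        yield tmp_path
    finally:
        os.remove(tmp_path)  # clean up even if processing raised an error
```

Usage would look like `with decompressed(r1_path) as r1_plain:` followed by whatever processing needs a plain FASTQ file. Here r1_path is a hypothetical variable holding the r1 setting.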

All spats_tool commands work in the experiment directory and append to the spats.log file there; for this run, the log entry looks like:

2017/08/10 13:22 : run, 134.99s
  - ** removing previous run.spats
  - using native cotrans processor
  - decompressing /projects/b1044/.../EJS_6_F_10mM_NaF_Rep1_GCCAAT_R1.fastq.gz
  - decompress R1 @ 47.76s
  - decompressing /projects/b1044/.../EJS_6_F_10mM_NaF_Rep1_GCCAAT_R2.fastq.gz
  - decompress R1 @ 93.00s
  - wrote output to run.spats
  - run complete @ 134.99s
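Because every command is logged with a timestamp and elapsed time, spats.log can serve as an analysis history. This minimal Python sketch parses entries shaped like the excerpt above. It assumes every entry follows that header-plus-indented-details layout, which is inferred from this one example. The names HEADER and parse_log are hypothetical.

```python
import re
from datetime import datetime

# Header lines look like "2017/08/10 13:22 : run, 134.99s";
# detail lines are indented and start with "- " (layout inferred from one example)
HEADER = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}) : (\w+), ([\d.]+)s$")


def parse_log(text: str) -> list:
    """Return (timestamp, command, seconds, details) tuples, one per logged command."""
    entries = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M")
            entries.append((stamp, match.group(2), float(match.group(3)), []))
        elif line.lstrip().startswith("- ") and entries:
            # Attach each indented detail line to the most recent command
            entries[-1][3].append(line.lstrip()[2:])
    return entries
```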

As the output and log indicate, the results of the run are written to run.spats, an SQLite database file that spats_tool can use to dump results and create plots. Because every tool appends to spats.log, you keep a record of all analyses performed, including date/time stamps.
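Since run.spats is an SQLite file, Python's standard sqlite3 module can open it directly. The table layout is not documented here, so this minimal sketch only opens the file read-only and lists its tables. spats_tool dump remains the intended way to extract results. The names open_readonly and list_tables are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an SQLite file (such as run.spats) read-only so inspection cannot modify it."""
    # as_uri() percent-encodes the absolute path; mode=ro forbids writes
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the table names stored in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For example, run `conn = open_readonly("run.spats")` from the experiment directory, then `print(list_tables(conn))` and `conn.close()`.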

spats_tool dump

The dump command reads the raw data and writes it to CSV. It requires a dump type; the options are:

  • spats_tool dump reads: dumps the tags data for the reads analysis to reads.csv
  • spats_tool dump run: dumps the treated/untreated counts, beta, theta, and rho values from the run analysis to CSV files named for the corresponding targets.
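For further analysis outside spats_tool, the dumped CSV files can be loaded with Python's csv module. This minimal sketch reads any CSV with a header row and converts numeric-looking cells to floats. It does not assume specific column names, because the exact headers spats_tool writes are not listed here. The function name load_numeric_csv is hypothetical.

```python
import csv
from typing import Iterable


def load_numeric_csv(lines: Iterable[str]) -> list:
    """Parse CSV text with a header row into dicts, converting numeric cells to float."""
    rows = []
    for record in csv.DictReader(lines):
        converted = {}
        for key, value in record.items():
            try:
                converted[key] = float(value)
            except (TypeError, ValueError):
                # Keep non-numeric cells (labels, blanks, missing fields) unchanged
                converted[key] = value
        rows.append(converted)
    return rows
```

For example, use `with open("my_target.csv", newline="") as handle: rows = load_numeric_csv(handle)`, where my_target.csv stands in for whichever per-target file the dump produced.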