Detailed Options
-
spats_shape_seq.run_spats(target_path, r1_path, r2_path, output_path, cotrans=False)
Convenience function for a common-case SPATS run that doesn’t require any non-default configuration.
Parameters: - target_path – path to the targets FASTA file
- r1_path – path to the R1 input data FASTQ file
- r2_path – path to the R2 input data FASTQ file
- output_path – path to write the resulting reactivities
- cotrans – pass True for cotrans-style experiments.
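As a minimal sketch of the call described above, the helper below wraps run_spats. The helper name and the file names in the commented call are hypothetical placeholders; only the run_spats signature comes from this page. The import is deferred so the helper can be defined even when the package is not installed.

```python
def run_default_spats(target_path, r1_path, r2_path, output_path, cotrans=False):
    """Run a default-configuration SPATS analysis (illustrative wrapper)."""
    # Deferred import: defining this helper does not require the package.
    from spats_shape_seq import run_spats

    run_spats(target_path, r1_path, r2_path, output_path, cotrans=cotrans)


# Example call; the paths below are hypothetical placeholders:
# run_default_spats("targets.fa", "R1.fastq", "R2.fastq", "reactivities.out")
```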
-
class spats_shape_seq.spats.Spats(cotrans=False)
The main SPATS driver.
Parameters: cotrans – pass True for cotrans-style experiments.
-
addTarget(name, seq, rowid=-1)
-
addTargets(*target_paths)
Used to add one or more target files for processing. Can be called multiple times to add more targets. Inputs are expected to be in FASTA format with one or more targets per path. Must be called before processing.
Parameters: target_paths – one or more filesystem paths to target files.
-
compare_results(other_spats, verbose=False)
Used to compare the results of the current run against another SPATS instance. Must be run after running process_pair_data(), or after loading the data (load()) from a previously-run session.
Parameters: - other_spats – Spats instance to compare.
- verbose – set to True for detailed output of mismatched sites.
Returns: (match_count, total): match_count is the number of sites matched, total is the total number of sites.
-
compute_profiles()
Computes beta/theta/c reactivity values after pair data have been processed.
Returns: a profiles.Profiles object, which contains the reactivities for all targets.
-
counters
Returns the underlying counters.Counters object, which contains information about site and tag counts.
-
load(input_path)
Loads SPATS state from a file.
Parameters: input_path – the path of a previously saved SPATS session.
-
loadTargets(pair_db)
-
merge(input_path)
Merges SPATS state from a file with existing state.
Parameters: input_path – the path of a previously saved SPATS session.
-
merge_targets(pair_db)
-
process_pair(pair)
Used to process a single pair.Pair. Typically only used for debugging or analysis of specific cases.
Parameters: pair – a pair.Pair to process.
-
process_pair_data(data_r1_path, data_r2_path, force_mask=None)
Used to read and process a pair of FASTQ data files.
Note that this parses the pair data into an in-memory SQLite database, which on most modern systems will be fine except for the largest input sets. If you hit memory issues, create a disk-based SQLite DB via db.PairDB and then use process_pair_db().
Note that this may be called multiple times to process more than one set of data files before computing profiles.
Parameters: - data_r1_path – path to R1 fragments
- data_r2_path – path to matching R2 fragments.
-
process_pair_db(pair_db, batch_size=65536)
Processes pair data provided by a db.PairDB.
Note that this may be called multiple times to process more than one set of inputs before computing profiles.
Parameters: pair_db – a db.PairDB of pairs to process.
-
reset_processor()
-
store(output_path)
Saves the state of the SPATS run for later processing.
Parameters: output_path – the path for writing the output. Recommended file extension is .spats
-
targets
-
validate_results(data_r1_path, data_r2_path, algorithm='find_partial', verbose=False)
Used to validate the results of the current run against a different algorithm. Must be run after running process_pair_data(), or after loading the data (load()) from a previously-run session.
Parameters: - data_r1_path – path to R1 fragments
- data_r2_path – path to matching R2 fragments.
- algorithm – Generally the default is correct, but you can select a particular algorithm for data validation (see run.Run.algorithm).
- verbose – set to True for detailed output of mismatched sites.
Returns: True if results validate, False otherwise.
-
write_reactivities(output_path)
Convenience function used to write the reactivities to an output file. Must be called after compute_profiles().
Parameters: output_path – the path for writing the output.
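The call order implied by the Spats methods above (add targets, process pair data, compute profiles, then write reactivities) can be sketched as follows. The helper name, its session_path argument, and any file names are hypothetical; the method names and signatures are the ones listed on this page. The import is deferred so the helper can be defined without the package installed.

```python
def compute_and_write_reactivities(target_path, r1_path, r2_path, output_path,
                                   session_path=None, cotrans=False):
    """Run the documented Spats steps in order (illustrative sketch)."""
    # Deferred import: defining this helper does not require the package.
    from spats_shape_seq.spats import Spats

    spats = Spats(cotrans=cotrans)
    spats.addTargets(target_path)              # must be called before processing
    spats.process_pair_data(r1_path, r2_path)  # may be repeated for more inputs
    profiles = spats.compute_profiles()        # required before writing
    spats.write_reactivities(output_path)
    if session_path is not None:
        spats.store(session_path)              # e.g. a path ending in .spats
    return profiles
```

A session saved with store() can later be reloaded with load() and checked with compare_results() or validate_results().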
-
-
class spats_shape_seq.run.Run
Encapsulates the inputs/config required for a Spats run.
-
adapter_b = None
Default AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-
adapter_t = None
Default AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
-
algorithm = None
Default find_partial, set to lookup to use the lookup optimization.
-
allow_indeterminate = None
Default False, set to True to allow indeterminate nucleotides to be processed as matches. For example, with this set to True, the sequence ACNT in a pair will be considered a match (no error) to ACGT in the target. See also allowed_adapter_errors and allowed_target_errors.
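The ACNT/ACGT example above can be illustrated with a small standalone comparison. This is only a sketch of the idea, not the library's matching code; the function name count_mismatches is hypothetical.

```python
def count_mismatches(read, target, allow_indeterminate=False):
    """Count positions where read and target differ (equal lengths assumed)."""
    if len(read) != len(target):
        raise ValueError("read and target must have the same length")
    mismatches = 0
    for read_base, target_base in zip(read, target):
        if read_base == target_base:
            continue
        # With allow_indeterminate, an N in the read is not counted as an error.
        if allow_indeterminate and read_base == "N":
            continue
        mismatches += 1
    return mismatches


print(count_mismatches("ACNT", "ACGT"))                            # N counts as an error
print(count_mismatches("ACNT", "ACGT", allow_indeterminate=True))  # N is accepted
```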
-
allow_multiple_rt_starts = None
Default False, in which case the right edge must match the edge of the target. Set to True to allow other possibilities for the right edge.
-
allow_negative_values = None
Default False, set to True to allow beta, theta, and rho values to be negative (otherwise, negative values are set to 0.0).
-
allowed_adapter_errors = None
Default 0, increase to allow the indicated number of errors when performing adapter trimming.
-
allowed_dumbbell_errors = None
Default 0, increase to allow the indicated number of errors when performing dumbbell trimming.
-
allowed_target_errors = None
Default 0, increase to allow the indicated number of errors (mutations / indels) when matching to the target. WARNING: the ambiguity of the match increases exponentially with this number of mutations/indels; it’s recommended to not set this higher than 2.
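To give a rough feel for the warning above, the sketch below counts how many distinct sequences lie within a given number of substitutions of a fixed-length sequence. It considers substitutions only (indels would add more candidates) and is a back-of-the-envelope illustration, not a description of how the library searches; the function name is hypothetical.

```python
from math import comb


def substitution_neighborhood_size(length, max_errors):
    """Number of sequences within max_errors substitutions of a length-n sequence.

    Choose k positions to change (comb(length, k)) and, for each, one of the
    3 other nucleotides (3 ** k); sum over k = 0..max_errors.
    """
    return sum(comb(length, k) * 3 ** k for k in range(max_errors + 1))


for errors in range(4):
    print(errors, substitution_neighborhood_size(10, errors))
```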
-
apply_config_restrictions()
-
collapse_left_prefixes = None
Default False, set to True to treat any read with a 5’ prefix as starting at site zero. Setting this will force the usage of the count_left_prefixes option.
-
collapse_only_prefixes = None
Default None, which means collapse all prefixes. Set to a list of comma-separated strings to only collapse prefixes that appear in the list. Specifying these will force the collapse_left_prefixes and count_left_prefixes options to be True.
-
compute_z_reactivity = None
Default False, set to True to also compute reactivity based upon z-scores.
-
config_dict()
-
config_string()
-
cotrans = None
Default False, set to True to run as a cotrans experiment. Pass a single target instead of a generated targets file.
-
cotrans_linker = None
Default CTGACTCGGGCACCAAGGAC, change as necessary for cotrans experiments.
-
cotrans_minimum_length = None
Default 20, set to adjust the minimum number of bp to use from the cotrans target.
-
count_edge_mutations = None
Defaults to None. If set to stop_and_mut, will count mutations that are at the site like any other mutation. If set to stop_only, will count the stop but no mutation. In the default behavior, neither stops nor edge mutations are counted.
-
count_left_prefixes = None
Default False, set to True to count and report information on pairs that align left of the 5’ end. The count for each different prefix encountered will be reported. Setting this will force the usage of the find_partial algorithm.
-
count_mutations = None
Defaults to False. If set to True, will count both stops and mutations, and incorporate the mutation information into the reactivity profile computations. Note that setting this will force allowed_target_errors to be 1.
-
count_only_full_reads = None
Default False, set to True to only count reads with no stops.
-
debug = None
Default False, set to True to output detailed run information.
-
dumbbell = None
Default None, set to a string sequence for analyses which use a dumbbell sequence (on the front of R2).
-
generate_channel_reads = None
Default False, set to True to generate R1/R2 outputs for all matching reads, separated by channel (with handles stripped).
-
generate_sam = None
Default None, set to a string path to generate SAM outputs for the spats run.
-
handle_indels = None
Default False, set to True to look for indels (insertions/deletions) and incorporate their counts into the reactivity profile computations. Requires using the find_partial algorithm. Note that looking for indels while processing will be at least an order of magnitude slower, but could give more accurate reactivities.
-
ignore_stops_with_mismatched_overlap = None
Defaults to True. When R1 and R2 overlap and have a mismatch on their overlap, the default behavior is to throw away the pair. Set this to False to have the stop and any mutations counted.
-
indel_gap_extend_cost = None
Defaults to 1, set to the value to penalize the extension of indel (insertion or deletion) gaps in the Smith-Waterman alignment algorithm. Only applies when handle_indels is True.
-
indel_gap_open_cost = None
Defaults to 5, set to the value to penalize the initiation of indel (insertion or deletion) gaps in the Smith-Waterman alignment algorithm. Only applies when handle_indels is True.
-
indel_match_value = None
Defaults to 3, set to the value to reward matching characters in the Smith-Waterman alignment algorithm. Only applies when handle_indels is True.
-
indel_mismatch_cost = None
Defaults to 2, set to the value to penalize mismatching characters in the Smith-Waterman alignment algorithm. Only applies when handle_indels is True.
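To show how the four indel scoring values interact, here is a generic Smith-Waterman local-alignment scorer with affine gaps, using the documented defaults (match 3, mismatch 2, gap open 5, gap extend 1). It is a textbook sketch rather than the library's implementation. In particular, it assumes that a gap of length k costs gap_open + (k - 1) * gap_extend, which may differ from the convention SPATS uses.

```python
def local_alignment_score(a, b, match=3, mismatch=2, gap_open=5, gap_extend=1):
    """Best Smith-Waterman local alignment score with affine gap costs."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # best[i][j]:  best local alignment score ending at a[:i], b[:j]
    # gap_a[i][j]: best score ending with a gap in a (b[j-1] is unpaired)
    # gap_b[i][j]: best score ending with a gap in b (a[i-1] is unpaired)
    best = [[0] * cols for _ in range(rows)]
    gap_a = [[neg_inf] * cols for _ in range(rows)]
    gap_b = [[neg_inf] * cols for _ in range(rows)]
    top = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either open a new gap or extend an existing one.
            gap_a[i][j] = max(best[i][j - 1] - gap_open, gap_a[i][j - 1] - gap_extend)
            gap_b[i][j] = max(best[i - 1][j] - gap_open, gap_b[i - 1][j] - gap_extend)
            step = match if a[i - 1] == b[j - 1] else -mismatch
            diagonal = best[i - 1][j - 1] + step
            # Local alignment: scores never drop below zero.
            best[i][j] = max(0, diagonal, gap_a[i][j], gap_b[i][j])
            top = max(top, best[i][j])
    return top


print(local_alignment_score("ACGTACGT", "ACGTACGT"))  # identical sequences
print(local_alignment_score("ACGTACGT", "ACGAACGT"))  # one substitution
print(local_alignment_score("ACGTACGT", "ACGACGT"))   # one deleted base
```

Raising indel_gap_open_cost relative to indel_mismatch_cost makes the aligner prefer explaining a difference as a substitution rather than as an indel.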
-
load_from_config(config_dict)
-
log = None
Defaults to sys.stdout, set to a file-like object to gather debugging info.
-
masks = None
Default [ 'RRRY', 'YYYR' ], treated mask is first.
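The default masks are written with IUPAC nucleotide codes, where R stands for a purine (A or G) and Y for a pyrimidine (C or T). The sketch below shows how a four-base handle could be classified against them. The helper names are hypothetical, and the label "untreated" for the second mask is an assumption (this page only states that the treated mask comes first). It is not the library's own matching code.

```python
# IUPAC ambiguity codes used by the default masks.
IUPAC = {"R": "AG", "Y": "CT"}


def matches_mask(handle, mask):
    """True if every base of handle is allowed by the corresponding mask code."""
    if len(handle) != len(mask):
        return False
    # Codes outside IUPAC are treated as literal bases.
    return all(base in IUPAC.get(code, code) for base, code in zip(handle, mask))


def classify_handle(handle, masks=("RRRY", "YYYR")):
    """Return 'treated', 'untreated' (assumed label), or None if neither matches."""
    for label, mask in zip(("treated", "untreated"), masks):
        if matches_mask(handle, mask):
            return label
    return None


print(classify_handle("AAGC"))
print(classify_handle("CTTA"))
print(classify_handle("ACGT"))
```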
-
minimum_adapter_len = None
Defaults to 0, set higher to require a minimal amount of adapter in order to do trimming. Generally not necessary since a positive match in the target is required before trimming.
-
minimum_tag_match_length = None
Default 8, set to adjust the minimum length for matching tags for the reads analyzer.
-
minimum_target_match_length = None
Defaults to 10. Runs will potentially be faster if you set it higher, but may miss some pairs that have only shorter matching subsequences. You can set it lower, but then there’s some chance pairs will match the wrong place (in which case they will have too many errors and be discarded), and it will allow shorter sequences at the end (which end up being adapter-trimmed) to be accepted. You might want to analyze your targets (xref target.Target.longest_target_self_matches()) to determine an appropriate value.
-
mutations_require_quality_score = None
Defaults to None. If set to a phred-score integer value (0 - 42), and count_mutations is True, then this will require the quality score on any mutation to be greater than or equal to the indicated phred score to be counted.
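As an illustration of the threshold described above, the sketch below converts a FASTQ quality character to a Phred score and applies a greater-than-or-equal test. It assumes the common Phred+33 encoding; the function names are hypothetical and this is not the library's code.

```python
def phred_from_fastq_char(quality_char, offset=33):
    """Convert one FASTQ quality character to a Phred score (Phred+33 assumed)."""
    return ord(quality_char) - offset


def mutation_is_counted(quality_char, minimum_phred=None):
    """Apply the documented rule: count the mutation if score >= threshold."""
    if minimum_phred is None:  # default: no quality requirement
        return True
    return phred_from_fastq_char(quality_char) >= minimum_phred


print(mutation_is_counted("I", minimum_phred=30))  # 'I' encodes Phred 40
print(mutation_is_counted("5", minimum_phred=30))  # '5' encodes Phred 20
```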
-
num_workers = None
Default None, which auto-detects the number of available cores (using multiprocessing.cpu_count()) and creates that many workers. Set to an integer to force an explicit number. Set to 1 to disable multiprocessing (sometimes useful for debugging). Only used when bulk processing input data from within spats.Spats (spats.Spats.process_pair_data() or spats.Spats.process_pair_db()).
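The default resolution described above amounts to the following. The helper name resolve_num_workers is hypothetical, and the rejection of values below 1 is an added assumption; the sketch only mirrors the documented rule using the standard-library call named in the text.

```python
import multiprocessing


def resolve_num_workers(num_workers=None):
    """Return the worker count implied by a num_workers setting."""
    if num_workers is None:
        # Auto-detect: one worker per available core.
        return multiprocessing.cpu_count()
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return int(num_workers)


print(resolve_num_workers())   # machine-dependent
print(resolve_num_workers(1))  # 1 disables multiprocessing
```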
-
pair_length = None
Default None, in which case the pair length is detected from input data. Otherwise, can be set explicitly.
-
quiet = None
Default False, set to True to suppress output messages.
-
regions_of_interest = None
Default None, specify a list of pairs like (min, max) where each pair specifies a minimum and maximum nucleotide index indicating a region in which to watch for activity. If a read has a stop and/or mutation within this region, it will be tagged as interesting. Only meaningful when using the reads tool and the find_partial algorithm.
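The (min, max) region test described above can be sketched as a simple interval check. Whether the library treats the bounds as inclusive is not stated on this page; the sketch assumes both ends are inclusive, and the function name is hypothetical.

```python
def in_region_of_interest(site, regions_of_interest):
    """True if site falls inside any (min, max) region (bounds assumed inclusive)."""
    return any(low <= site <= high for low, high in regions_of_interest)


regions = [(10, 20), (45, 50)]  # example regions, chosen arbitrarily
print(in_region_of_interest(15, regions))
print(in_region_of_interest(30, regions))
```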
-
result_set_name = None
Defaults to "default", set to a string to choose the name for this result set. Also used for resuming processing, xref resume_processing. Result sets can be compared using db.PairDB.differing_results().
-
resume_processing = None
Default False, set to True to resume processing (if there’s been a previous run using writeback_results and the same result_set_name).
-
rt_primers = None
Default None, which means allow RT starts from either anywhere (when allow_multiple_rt_starts is True) or from the right (3’) edge of the target only (when it is False). Set to a list of comma-separated strings to restrict where RT starts can happen in the target. Specifying these will force the allow_multiple_rt_starts option to be True. Note: each primer in the list must unambiguously match a single unique region of the target.
-
single_target_linker = None
Default None, if set to a sequence, then this will be required as a prefix to R1. Internally, this sets allow_multiple_rt_starts to True, and rt_primers to this value.
-
skip_database = None
Default True, set to False to parse to a database and only process unique counts. Typically straight parsing is faster, but on some datasets it can be faster to determine and only process unique counts.
-
validate_config()
-
writeback_results = None
Default False, set to True to write the results back to the input database for further analysis or incremental/resumable processing. xref resume_processing.
-
-
class spats_shape_seq.reads.ReadsData(db_path)
Encapsulates the data for reads analysis.
Parameters: db_path – the path for the reads data
-
pair_db
Access the underlying db.PairDB.
-
parse(target_path, r1_paths, r2_paths, sample_size=100000, show_progress_every=1000000)
Used to parse target and r1/r2 reads data for reads processing.
Parameters: - target_path – Path to targets file, must be in FASTA format.
- r1_paths – List of paths to R1 reads files, must be in FASTQ format.
- r2_paths – List of paths to R2 reads files, must be in FASTQ format.
- sample_size – the number of samples to use for analysis. Samples will be (more or less) uniformly randomly selected from the population of pairs.
- show_progress_every – by default, shows a ‘.’ for every 1 million pairs parsed. Set to 0 to disable output.
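A minimal sketch of the parse step, using only the ReadsData constructor and parse() signature listed above. The helper name and any paths passed to it are hypothetical, and the import is deferred so the helper can be defined without the package installed.

```python
def parse_reads_sample(db_path, target_path, r1_paths, r2_paths, sample_size=100000):
    """Create a ReadsData at db_path and parse a sample of pairs into it."""
    # Deferred import: defining this helper does not require the package.
    from spats_shape_seq.reads import ReadsData

    data = ReadsData(db_path)
    data.parse(target_path, r1_paths, r2_paths,
               sample_size=sample_size,
               show_progress_every=0)  # 0 disables the progress dots
    return data
```

Note that r1_paths and r2_paths are lists, so several FASTQ files can be sampled in one call.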
-
-
class spats_shape_seq.reads.ReadsAnalyzer(reads_data, cotrans=False)
Performs the analysis/tagging required for the reads analysis.
Parameters: - reads_data – the ReadsData with the input data.
- cotrans – pass True for cotrans-style experiments.
-
addTagPlugin(tag, handler)
-
addTagTarget(name, tag)
Processes the tags in the input data for analysis.
-
tag_counts()
Returns a dictionary of { tag: count } of tags analyzed.