pdm_utils.classes package

Submodules

pdm_utils.classes.alchemyhandler module

class pdm_utils.classes.alchemyhandler.AlchemyHandler(database=None, username=None, password=None, dialect='mysql', driver=None)

Bases: object

property URI

Returns a SQLAlchemy URI string from stored credentials.

ask_credentials()

Ask for username and password input to store in AlchemyHandler.

ask_database()

Ask for database input to store in AlchemyHandler.

build_all()

Create and store all relevant SQLAlchemy objects.

build_engine()

Create and store SQLAlchemy Engine object.

build_graph()

Create and store a SQLAlchemy MetaData-related NetworkX Graph object.

build_mapper()

Create and store SQLAlchemy automapper Base object.

build_metadata()

Create and store SQLAlchemy MetaData object.

build_session()

Create and store SQLAlchemy Session object.

clear()

Clear properties tied to MySQL credentials/database.

connect(ask_database=False, login_attempts=5, pipeline=False)

Ask for input to connect to MySQL and MySQL databases.

Parameters
  • ask_database (Boolean) – Toggle whether to connect to a database.

  • login_attempts (int) – Set number of total login attempts.

construct_engine_string(dialect='mysql', driver='pymysql', username='', password='', database='')

Construct a SQLAlchemy engine URL.

Parameters
  • dialect (str) – Type of SQL database.

  • driver (str) – Name of the Python DBAPI used to connect.

  • username (str) – Username to login to SQL database.

  • password (str) – Password to login to SQL database.

  • database (str) – Name of the database to connect to.

Returns

URI string to create SQLAlchemy engine.

Return type

str
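SQLAlchemy database URLs generally take the form `dialect+driver://username:password@host/database`. The hypothetical `build_engine_url()` below is a minimal sketch of how the documented parameters could be combined into such a string. It assumes a `localhost` host and URL-escapes the password with `urllib.parse.quote_plus`. Neither detail is confirmed for `construct_engine_string()` itself.

```python
from urllib.parse import quote_plus


def build_engine_url(dialect="mysql", driver="pymysql", username="",
                     password="", database="", host="localhost"):
    """Assemble a SQLAlchemy-style URL from separate credential parts.

    Hypothetical helper: the host default and the password escaping are
    assumptions made for this example.
    """
    # Escape characters such as '@' or '/' that would otherwise break the URL.
    safe_password = quote_plus(password)
    return f"{dialect}+{driver}://{username}:{safe_password}@{host}/{database}"


url = build_engine_url(username="root", password="p@ss", database="Actino_Draft")
print(url)  # mysql+pymysql://root:p%40ss@localhost/Actino_Draft
```

A string built this way is the kind of value that `sqlalchemy.create_engine()` accepts.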

property database

Returns the AlchemyHandler’s set database.

Returns

Returns a copy of the database attribute.

Return type

str

property databases

Returns a copy of the databases available to the current credentials.

Returns

Returns the AlchemyHandler’s available databases.

Return type

list[str]

property engine

Returns the AlchemyHandler’s stored engine object.

Returns

Returns the AlchemyHandler’s stored engine object.

Return type

Engine

extract_engine_credentials(engine)

Extract username, password, and/or database from a SQLAlchemy engine.

get_map(table)

Get SQLAlchemy ORM map object.

get_mysql_dbs()

Retrieve database names from MySQL.

Returns

List of database names.

Return type

list

property graph

Returns the AlchemyHandler’s stored graph object.

Returns

Returns the AlchemyHandler’s stored metadata graph object.

Return type

Graph

property login_attempts

Returns the AlchemyHandler’s total number of login attempts.

Returns

Returns the total number of login attempts.

Return type

int

property mapper

Returns the AlchemyHandler’s stored automapper object.

Returns

Returns the AlchemyHandler’s stored mapper object.

property metadata

Returns the AlchemyHandler’s stored metadata object.

Returns

Returns the AlchemyHandler’s stored metadata object.

Return type

MetaData

property password

Returns the AlchemyHandler’s set password.

Returns

Returns a copy of the password attribute.

Return type

str

property session

Returns the AlchemyHandler’s stored session object.

property username

Returns the AlchemyHandler’s set username.

Returns

Returns a copy of the username attribute.

Return type

str

validate_database()

Validate access to database using stored SQL credentials.

exception pdm_utils.classes.alchemyhandler.MySQLDatabaseError

Bases: Exception

exception pdm_utils.classes.alchemyhandler.SQLCredentialsError

Bases: Exception

exception pdm_utils.classes.alchemyhandler.SQLiteDatabaseError

Bases: Exception

pdm_utils.classes.aragornhandler module

class pdm_utils.classes.aragornhandler.AragornHandler(identifier, sequence)

Bases: object

parse_determinate_trnas()

Searches out_str for matches to a regular expression for Aragorn tRNAs of determinate isotype.

parse_indeterminate_trnas()

Searches out_str for matches to a regular expression for Aragorn tRNAs of indeterminate isotype.

parse_tmrnas()

Searches out_str for matches to a regular expression for Aragorn tmRNAs.

parse_trnas()

Calls two helper methods to parse determinate and indeterminate tRNAs.

read_output()

Reads the Aragorn output file and joins the lines into a single string, which it stores in out_str.

run_aragorn(c=False, d=True, m=False, t=True)

Set up the Aragorn command, then run it. Default arguments assume a linear sequence to be scanned on both strands for tRNAs only (no tmRNAs).

Parameters
  • c (bool) – Treat the sequence as circular.

  • d (bool) – Search both strands of DNA.

  • m (bool) – Search for tmRNAs.

  • t (bool) – Search for tRNAs.
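To show how the four booleans could map onto a command line, here is a sketch built around a hypothetical `build_aragorn_args()` helper. The flag spellings (`-c`/`-l` for circular or linear topology, `-d` for both strands, `-m` for tmRNAs, `-t` for tRNAs, `-o` for the output file) are assumed from Aragorn's usage text. The actual command that `run_aragorn()` assembles may differ.

```python
def build_aragorn_args(fasta_path, out_path, c=False, d=True, m=False, t=True):
    """Translate run_aragorn()-style booleans into an Aragorn argument list.

    Hypothetical helper; flag names are assumptions based on Aragorn's usage text.
    """
    args = ["aragorn"]
    # Topology: circular (-c) or linear (-l).
    args.append("-c" if c else "-l")
    if d:
        args.append("-d")  # search both strands
    if m:
        args.append("-m")  # search for tmRNAs
    if t:
        args.append("-t")  # search for tRNAs
    # Write results to out_path; the input FASTA file goes last.
    args.extend(["-o", out_path, fasta_path])
    return args


print(build_aragorn_args("phage.fasta", "aragorn_out.txt"))
```

This sketch only builds the list. If `aragorn` is on `PATH`, passing the list to `subprocess.run(args, check=True)` would execute it.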

write_fasta()

Writes the search sequence to an input file in FASTA format.

pdm_utils.classes.bundle module

Represents a structure to directly compare data between two or more genomes.

class pdm_utils.classes.bundle.Bundle

Bases: object

check_for_errors()

Check evaluation lists of all objects contained in the Bundle and determine how many errors there are.

check_genome_dict(key, expect=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if a genome is present in the genome dictionary.

Parameters
  • key (str) – The value to evaluate as a key in the genome dictionary.

  • expect (bool) – Indicates whether the key is expected to be a valid key in the genome dictionary.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.
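The check_* methods appear to share one pattern. Each compares an observed outcome with `expect`, selects the `success` or `fail` status, and records an Evaluation built from `id`, `definition`, `result`, and `status`. The sketch below reproduces that pattern with a hypothetical `MiniBundle` stand-in. The result wording is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Mirrors the documented Evaluation(id, definition, result, status) fields."""
    id: str = ""
    definition: str = ""
    result: str = ""
    status: str = ""


@dataclass
class MiniBundle:
    """Hypothetical stand-in for Bundle with only the pieces needed here."""
    genome_dict: dict = field(default_factory=dict)
    evaluations: list = field(default_factory=list)

    def check_genome_dict(self, key, expect=True, eval_id=None,
                          success="correct", fail="error", eval_def=None):
        present = key in self.genome_dict
        result = f"The {key} genome is {'' if present else 'not '}present."
        # The check passes when presence matches the expectation.
        status = success if present == expect else fail
        self.evaluations.append(Evaluation(eval_id, eval_def, result, status))


bundle = MiniBundle(genome_dict={"import": object()})
bundle.check_genome_dict("import", eval_id="BNDL_001")
print(bundle.evaluations[0].status)  # correct
```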

check_genome_pair_dict(key, expect=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if a genome_pair is present in the genome_pair dictionary.

Parameters
  • key – same as for check_genome_dict().

  • expect – same as for check_genome_dict().

  • eval_id – same as for check_genome_dict().

  • success – same as for check_genome_dict().

  • fail – same as for check_genome_dict().

  • eval_def – same as for check_genome_dict().

check_statements(execute_result, execute_msg, eval_id=None, success='correct', fail='error', eval_def=None)

Check if MySQL statements were successfully executed.

Parameters
  • execute_result (int) – Indication of whether the MySQL statements were successfully executed.

  • execute_msg (str) – Description of MySQL statement execution result.

  • eval_id – same as for check_genome_dict().

  • success – same as for check_genome_dict().

  • fail – same as for check_genome_dict().

  • eval_def – same as for check_genome_dict().

check_ticket(eval_id=None, success='correct', fail='error', eval_def=None)

Check for whether a Ticket object is present.

Parameters
  • eval_id – same as for check_genome_dict().

  • success – same as for check_genome_dict().

  • fail – same as for check_genome_dict().

  • eval_def – same as for check_genome_dict().

get_evaluations()

Get all evaluations for all objects stored in the Bundle.

Returns

Dictionary of evaluation lists for each feature.

Return type

dict

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

set_genome_pair(genome_pair, key1, key2)

Pair two genomes and add to the paired genome dictionary.

Parameters
  • genome_pair (GenomePair) – An empty GenomePair object to store paired genomes.

  • key1 (str) – A valid key in the Bundle object’s ‘genome_dict’ that indicates the first genome to be paired.

  • key2 (str) – A valid key in the Bundle object’s ‘genome_dict’ that indicates the second genome to be paired.

pdm_utils.classes.cds module

Represents a collection of data about CDS features that are commonly used to maintain and update SEA-PHAGES phage genomics data.

class pdm_utils.classes.cds.Cds

Bases: object

Class to hold data about a CDS feature.

check_amino_acids(check_set={}, eval_id=None, success='correct', fail='error', eval_def=None)

Check whether all amino acids in the translation are valid.

Parameters
  • check_set (set) – Set of valid amino acids.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the attribute value is valid.

Parameters
  • attribute (str) – Name of the CDS object attribute to evaluate.

  • check_set (set) – Set of reference ids.

  • expect (bool) – Indicates whether the attribute value is expected to be present in the check set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.

check_compatible_gene_and_locus_tag(eval_id=None, success='correct', fail='error', eval_def=None)

Check if gene and locus_tag attributes contain identical numbers.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_description_field(attribute='product', eval_id=None, success='correct', fail='error', eval_def=None)

Check if there are CDS descriptions in unexpected fields.

Evaluates whether the indicated attribute is empty or generic, and other fields contain non-generic data.

Parameters
  • attribute (str) – Indicates the reference attribute for the evaluation (‘product’, ‘function’, ‘note’).

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_gene_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Check if the gene qualifier contains an integer.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_generic_data(attribute=None, eval_id=None, success='correct', fail='error', eval_def=None)

Check if the indicated attribute contains generic data.

Parameters
  • attribute (str) – Indicates the attribute for the evaluation (‘product’, ‘function’, ‘note’).

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_locus_tag_structure(check_value=None, only_typo=False, prefix_set={}, case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if the locus_tag is structured correctly.

Parameters
  • check_value (str) – Indicates the genome id that is expected to be present. If None, the ‘genome_id’ attribute is used.

  • only_typo (bool) – Indicates if only the genome id spelling should be evaluated.

  • prefix_set (set) – Indicates valid common prefixes, if a prefix is expected.

  • case (bool) – Indicates whether the locus_tag is expected to be capitalized.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_magnitude(attribute, expect, ref_value, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the magnitude of a numerical attribute is valid.

Parameters
  • attribute – same as for check_attribute().

  • expect (str) – Comparison symbol indicating direction of magnitude (>, =, <).

  • ref_value (int, float, datetime) – Numerical value for comparison.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
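The `expect` symbol determines the comparison between the attribute value and `ref_value`. One straightforward way to dispatch on the documented symbols (>, =, <) is a lookup table of functions from the standard `operator` module. The `magnitude_status()` name below is hypothetical, and the sketch is not the class's actual code.

```python
import operator

# Map the documented comparison symbols to comparison functions.
COMPARATORS = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def magnitude_status(value, expect, ref_value, success="correct", fail="error"):
    """Return success if `value <expect> ref_value` holds, otherwise fail."""
    if expect not in COMPARATORS:
        raise ValueError(f"Unsupported comparison symbol: {expect!r}")
    return success if COMPARATORS[expect](value, ref_value) else fail


print(magnitude_status(150, ">", 0))  # correct
```

The same functions also work for `datetime` values, which the documentation lists as a valid `ref_value` type.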

check_orientation(format='fr_short', case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if orientation is set appropriately.

Relies on the reformat_strand function to manage orientation data.

Parameters
  • format (str) – Indicates how orientation data should be formatted.

  • case (bool) – Indicates whether the orientation data should be cased.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_translation(eval_id=None, success='correct', fail='error', eval_def=None)

Check that the current and expected translations match.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

create_seqfeature(type, start, stop, strand)

get_begin_end()

Get feature coordinates in transcription begin-end format.

Returns

(Begin, End) Start and stop coordinates ordered by which coordinate indicates the transcriptional beginning and end of the feature.

Return type

tuple
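For a feature on the forward strand, transcription begins at `start`. On the reverse strand, it begins at `stop`. The sketch below captures that ordering rule and assumes orientation is stored as "F" or "R". The `begin_end()` helper is hypothetical, and raising on other values is this example's choice.

```python
def begin_end(start, stop, orientation):
    """Order coordinates by transcriptional begin and end.

    Hypothetical helper; assumes orientation is "F" (forward) or "R" (reverse).
    """
    if orientation == "F":
        return (start, stop)
    if orientation == "R":
        return (stop, start)
    raise ValueError(f"Unknown orientation: {orientation!r}")


print(begin_end(100, 400, "R"))  # (400, 100)
```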

get_qualifiers(type)

Helper function that uses CDS data to populate the qualifiers SeqFeature attribute.

Returns

qualifiers (dict) – A dictionary formatted like Biopython’s SeqFeature qualifiers attribute.

reformat_start_and_stop(new_format)

Convert start and stop coordinates to new coordinate format. This also updates the coordinate format attribute to reflect change.

Relies on the reformat_coordinates function.

Parameters

new_format (str) – Indicates how coordinates should be formatted.
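A common coordinate conversion is between 0-based half-open intervals (as used by Biopython) and 1-based closed intervals (as shown in GenBank flat files). Only the start coordinate changes. The format labels "0_half_open" and "1_closed" and the `convert_coordinates()` helper are assumptions for this sketch, not confirmed names from `reformat_coordinates`.

```python
def convert_coordinates(start, stop, old_format, new_format):
    """Convert between 0-based half-open and 1-based closed coordinates.

    A 0-based half-open [start, stop) interval covers positions start..stop-1
    counted from 0. The same bases in 1-based closed form are [start+1, stop].
    """
    if old_format == new_format:
        return start, stop
    if (old_format, new_format) == ("0_half_open", "1_closed"):
        return start + 1, stop
    if (old_format, new_format) == ("1_closed", "0_half_open"):
        return start - 1, stop
    raise ValueError(f"Unsupported conversion: {old_format} -> {new_format}")


print(convert_coordinates(0, 9, "0_half_open", "1_closed"))  # (1, 9)
```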

set_description(value)

Set the primary raw and processed description attributes.

Parameters

value (str) – Indicates which reference attributes are used to set the attributes (‘product’, ‘function’, ‘note’).

set_description_field(attr, description, delimiter=None, prefix_set=None)

Set a description attribute parsed from a description.

Parameters
  • attr (str) – Attribute to set the description.

  • description (str) – Description data to parse. Also passed to set_num().

  • delimiter (str) – Passed to set_num().

  • prefix_set (set) – Passed to set_num().

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

set_gene(value, delimiter=None, prefix_set=None)

Set the gene attribute.

Parameters
  • value (str) – Gene data to parse. Also passed to set_num().

  • delimiter (str) – Passed to set_num().

  • prefix_set (set) – Passed to set_num().

set_location_id()

Create a tuple of feature location data.

For start and stop coordinates of the feature, it doesn’t matter whether the feature is complex with a translational frameshift or not. Retrieving the “start” and “stop” boundary attributes returns the very beginning and end of the feature, disregarding the inner “join” coordinates. If only the feature transcription “end” coordinate is used, orientation information is required. If transcription “begin” and “end” coordinates are used instead of “start” and “stop” coordinates, no orientation information is required.

set_locus_tag(tag='', delimiter='_', check_value=None)

Set locus tag and parse the locus_tag feature number.

Parameters
  • tag (str) – Input locus_tag data.

  • delimiter (str) – Value used to split locus_tag data.

  • check_value (str) – Indicates genome name or other value that will be used to parse the locus_tag to identify the feature number. If no check_value is provided, the genome_id attribute is used.
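The following sketch parses a feature number from a locus tag, assuming SEA-PHAGES-style tags such as "SEA_TRIXIE_20". In this assumed format, the last delimited field holds the number and `check_value` must appear somewhere in the tag. The `parse_locus_tag_num()` helper is hypothetical and simpler than whatever parsing `set_locus_tag()` actually performs.

```python
def parse_locus_tag_num(tag, delimiter="_", check_value=None):
    """Return the trailing feature number of a locus tag, or None.

    Hypothetical helper: the last delimited field is assumed to be the
    number, and check_value (if given) must appear, ignoring case.
    """
    if check_value is not None and check_value.lower() not in tag.lower():
        return None
    last_field = tag.split(delimiter)[-1]
    return int(last_field) if last_field.isdigit() else None


print(parse_locus_tag_num("SEA_TRIXIE_20", check_value="Trixie"))  # 20
```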

set_name(value=None)

Set the feature name.

Ideally, the name of the CDS will be an integer. This information can be stored in multiple fields in the GenBank-formatted flat file. The name is derived from one of several qualifiers.

Parameters

value (str) – Indicates a value that should be used to directly set the name regardless of the ‘gene’ and ‘_locus_tag_num’ attributes.

set_nucleotide_length(seq=False, translation=False)

Set the length of the nucleotide sequence.

Nucleotide length can be computed several different ways, including from the difference of the start and stop coordinates, the length of the transcribed nucleotide sequence, or the length of the translation. For compound features, using either the nucleotide or translation sequence is the accurate way to determine the true length of the feature, but ‘length’ may mean different things in different contexts.

Parameters
  • seq (bool) – Use the nucleotide sequence from the ‘seq’ attribute to compute the length.

  • translation (bool) – Use the translation sequence from the ‘translation’ attribute to compute the length.

set_nucleotide_sequence(value=None, parent_genome_seq=None)

Set the nucleotide sequence of the feature.

This method can directly set the attribute from a supplied ‘value’, or it can retrieve the sequence from the parent genome using Biopython. In the latter case, it relies on a Biopython SeqFeature object for the sequence extraction method and coordinates. If this object was generated from a Biopython-parsed GenBank-formatted flat file, the coordinates are ‘0-based half-open’ by default, and the object contains coordinates for every part of the feature (e.g. if it is a compound feature) as well as fuzzy locations. As a result, the length of the retrieved sequence may not exactly match the length indicated by the ‘start’ and ‘stop’ coordinates. If the nucleotide sequence ‘value’ is provided, ‘parent_genome_seq’ does not affect the result.

Parameters
  • value (str or Seq) – Input nucleotide sequence.

  • parent_genome_seq (Seq) – Input parent genome nucleotide sequence.

set_num(attr, description, delimiter=None, prefix_set=None)

Set a number attribute from a description.

Parameters
  • attr (str) – Attribute to set the number.

  • description (str) – Description data from which to parse the number.

  • delimiter (str) – Value used to split the description data.

  • prefix_set (set) – Valid possible prefixes in the description.

set_orientation(value, format, case=False)

Sets orientation based on indicated format.

Relies on the reformat_strand function to manage orientation data.

Parameters
  • value (misc.) – Input orientation value.

  • format (str) – Indicates how the orientation data should be formatted.

  • case (bool) – Indicates whether the output orientation data should be cased.
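To illustrate the normalization that `reformat_strand` performs, the sketch below maps several common spellings of strand direction onto the short "f"/"r" form and applies `case` at the end. The accepted input spellings and the `reformat_orientation()` helper are assumptions for illustration. Apart from the default 'fr_short', the function's real format names are not documented here.

```python
# Assumed input spellings for each strand direction.
_FORWARD = {"f", "forward", "1", "+"}
_REVERSE = {"r", "reverse", "-1", "-"}


def reformat_orientation(value, case=False):
    """Normalize an orientation value to 'f'/'r' ('F'/'R' if case=True)."""
    text = str(value).strip().lower()
    if text in _FORWARD:
        result = "f"
    elif text in _REVERSE:
        result = "r"
    else:
        raise ValueError(f"Unrecognized orientation: {value!r}")
    return result.upper() if case else result


print(reformat_orientation(-1, case=True))  # R
```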

set_seqfeature(type='CDS')

Set the ‘seqfeature’ attribute.

The ‘seqfeature’ attribute stores a Biopython SeqFeature object, which contains methods valuable to extracting sequence data relevant to the feature.

set_translation(value=None, translate=False)

Set translation and its length.

The translation is coerced into a Biopython Seq object. If no input translation value is provided, the translation is generated from the parent genome nucleotide sequence. If an input translation value is provided, the ‘translate’ parameter has no impact.

Parameters
  • value (str or Seq) – Amino acid sequence

  • translate (bool) – Indicates whether the translation should be generated from the parent genome nucleotide sequence.

set_translation_table(value)

Set translation table integer.

Parameters

value (int) – Translation table that should be used to generate the translation.

translate_seq()

Translate the CDS nucleotide sequence.

Use Biopython to translate the nucleotide sequence. The method expects the nucleotide sequence to be a valid CDS sequence in which:

  1. it begins with a valid start codon,

  2. it ends with a stop codon,

  3. it contains only one stop codon,

  4. its length is divisible by 3,

  5. it translates non-standard start codons to methionine.

If these criteria are not met, an empty Seq object is returned.

Returns

Amino acid sequence

Return type

Seq
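Biopython's `Seq.translate(table=11, cds=True)` enforces rules like the ones listed above. It translates an alternative start codon as methionine and raises `Bio.Data.CodonTable.TranslationError` when a check fails. The stdlib-only sketch below mimics only the validation step (criteria 1 to 4). Its start-codon set is a common subset of translation table 11 chosen for illustration, and `is_valid_cds()` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# A common subset of table 11 start codons (the full table allows more).
START_CODONS = {"ATG", "GTG", "TTG"}


def is_valid_cds(seq):
    """Check CDS criteria 1-4 from translate_seq() on a plain string."""
    seq = seq.upper()
    # Length must be a multiple of 3 and hold at least a start and a stop codon.
    if len(seq) % 3 != 0 or len(seq) < 6:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # Only the final codon may be a stop codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


print(is_valid_cds("ATGAAATAA"))  # True
```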

pdm_utils.classes.cdspair module

Represents a structure to directly compare data between two or more CDS features.

class pdm_utils.classes.cdspair.CdsPair

Bases: object

compare_cds()

pdm_utils.classes.dbcomparesummary module

Represents a collection of data about how databases storing the same data differ from each other.

class pdm_utils.classes.dbcomparesummary.DbCompareSummary(matched_genomes_list)

Bases: object

compute_gbk_gnm_summary(gnm)

Check errors within GenBank genome.

compute_matched_genomes_summary(gnms, gnm_mysql, gnm_pdb, gnm_gbk)

Check errors within matched genomes.

compute_mysql_gbk_summary(gnms)

Check differences between MySQL and GenBank genomes.

compute_mysql_gnm_summary(gnm)

Check errors within MySQL genome.

compute_mysql_pdb_summary(gnms)

Check differences between MySQL and PhagesDB genomes.

compute_pdb_gbk_summary(gnms)

Check differences between PhagesDB and GenBank genomes.

compute_pdb_gnm_summary(gnm)

Check errors within PhagesDB genome.

compute_summary(gnm_mysql, gnm_pdb, gnm_gbk)

pdm_utils.classes.evaluation module

Represents a structure to contain results of an evaluation.

class pdm_utils.classes.evaluation.Evaluation(id='', definition='', result='', status='')

Bases: object

pdm_utils.classes.fileio module

class pdm_utils.classes.fileio.FeatureTableParser(filehandle)

Bases: object

Class to act as a generator for reading (five-column) feature tables and retrieving Biopython SeqRecord objects.

next()

pdm_utils.classes.fileio.convert_tbl_data_to_record(tbl_data)

Converts string lines from a five-column feature table to a SeqRecord.

Parameters

tbl_data (list) – A list of the lines of data from a feature table file.

Returns

Returns a Biopython SeqRecord object loaded with given data.

Return type

SeqRecord

pdm_utils.classes.fileio.feature_data_to_seqfeature(coordinates, feature_type)

Converts coordinates received from feature table parsing to a SeqFeature.

Parameters
  • coordinates (list[tuple(int, int)]) – Start and end positions for the feature.

  • feature_type (str) – Label of the parsed feature.

Returns

Returns a Biopython SeqFeature loaded with given coordinates.

Return type

SeqFeature

pdm_utils.classes.fileio.parse_tbl_data_type(data_line)

Parses a five-column table data line and returns its structure type.

Parameters

data_line (str) – Line of data from a five-column feature table.

Returns

Returns the type of the data line within the table.

Return type

str
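In an NCBI five-column feature table, each line has one of a few shapes. A header line begins with ">Feature", and a feature line carries start, stop, and feature key in columns 1 to 3. A continuation line holds only the start and stop of another interval, and a qualifier line leaves the first three columns empty and fills columns 4 and 5. The hypothetical `classify_tbl_line()` sketch below distinguishes these shapes. Its return labels are invented and may not match those of `parse_tbl_data_type()`.

```python
def classify_tbl_line(line):
    """Classify one tab-separated line of a five-column feature table.

    Hypothetical labels: 'header', 'feature', 'coordinates', 'qualifier'.
    """
    if line.startswith(">Feature"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "" and len(fields) >= 4:
        return "qualifier"    # columns 4-5: qualifier key and optional value
    if len(fields) >= 3 and fields[2]:
        return "feature"      # columns 1-3: start, stop, feature key
    if len(fields) == 2:
        return "coordinates"  # extra interval of a multi-part feature
    raise ValueError(f"Unrecognized line: {line!r}")


print(classify_tbl_line("1\t1200\tCDS"))  # feature
```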

pdm_utils.classes.filter module

Object to provide a formatted filtering query for retrieving data from a SQL database.

class pdm_utils.classes.filter.Filter(alchemist=None, key=None)

Bases: object

add(filter_string)

Add MySQL WHERE filter(s) to the Filter object.

Parameters

filter_string (str) – Formatted MySQL WHERE clause.

and_(filter)

Add an AND conditional to the Filter object.

Parameters

filter (str) – Formatted MySQL WHERE clause.

build_values(where=None, column=None, raw_bytes=False, limit=8000)

Queries for values from stored WHERE clauses and Filter key.

Parameters
  • where (list) – MySQL WHERE clause-related SQLAlchemy object(s).

  • column (str) – SQLAlchemy Column object or object name.

  • limit (int) – SQLAlchemy IN clause query length limiter.

Returns

Distinct values fetched from given and innate constraints.

Return type

list
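The `limit` parameter bounds the length of an IN clause. One common way to respect such a bound is to split the values into consecutive chunks, query each chunk (for example with SQLAlchemy's `column.in_(chunk)`), and merge the distinct results. This approach is assumed here rather than taken from the Filter source, and `chunk_values()` is a hypothetical name.

```python
def chunk_values(values, limit=8000):
    """Split values into consecutive chunks of at most `limit` items.

    Each chunk would back one IN (...) clause, keeping queries below the limit.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return [values[i:i + limit] for i in range(0, len(values), limit)]


print(chunk_values(list(range(10)), limit=4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```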

build_where_clauses()

Builds BinaryExpression objects from stored Filter object filters.

Returns

A list of SQLAlchemy WHERE conditionals.

Return type

list

check()

Check that the Filter object contains valid essential objects. The Filter object requires a SQLAlchemy Engine and Column as well as a NetworkX Graph.

connect(alchemist=None)

Connect Filter object to a database with an AlchemyHandler.

Parameters

alchemist (AlchemyHandler) – An AlchemyHandler object.

property connected

copy()

Returns a copy of a Filter object.

copy_filters()

Returns a copy of a Filter object’s filter dictionary.

property engine

property filters

get_column(raw_column)

Converts a column input, string or Column, to a Column.

Parameters

raw_column (str) – SQLAlchemy Column object or object name.

get_columns(raw_columns)

Converts a column input list, string or Column, to a list of Columns.

Parameters

raw_columns (list[str]) – SQLAlchemy Column objects or object names.

Returns

Returns SQLAlchemy Columns

Return type

list[Column]

property graph

group(raw_column, raw_bytes=False, filter=False)

Queries and separates Filter object’s values based on a Column.

Parameters

raw_column (str) – SQLAlchemy Column object or object name.

hits()

Gets the number of a Filter object’s values.

property key

Connect Filter object to a database with an existing AlchemyHandler.

Parameters

alchemist (AlchemyHandler) – An AlchemyHandler object.

property mapper

mass_transpose(raw_columns, raw_bytes=False, filter=False)

Queries for sets of distinct values, using self.transpose().

Parameters

raw_columns (list) – SQLAlchemy Column object(s).

Returns

Distinct values fetched from given and innate constraints.

Return type

dict

new_or_()

Create a new OR conditional block in the Filter object.
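Filters appear to be organized as blocks of AND conditionals that are joined by OR. Each and_() call extends the current block, and new_or_() starts a new one. The hypothetical `WhereBuilder` class below models that structure with plain strings. It is an illustration of the grouping only; the Filter object itself builds SQLAlchemy expressions instead.

```python
class WhereBuilder:
    """Hypothetical model of AND blocks joined by OR."""

    def __init__(self):
        self.blocks = [[]]  # each inner list is one block of AND conditionals

    def and_(self, clause):
        self.blocks[-1].append(clause)

    def new_or_(self):
        # Start a new block only if the current one already has clauses.
        if self.blocks[-1]:
            self.blocks.append([])

    def render(self):
        parts = [" AND ".join(block) for block in self.blocks if block]
        if len(parts) == 1:
            return parts[0]
        return " OR ".join(f"({part})" for part in parts)


w = WhereBuilder()
w.and_("phage.Cluster = 'A'")
w.and_("phage.Subcluster = 'A2'")
w.new_or_()
w.and_("phage.Cluster = 'B'")
print(w.render())
# (phage.Cluster = 'A' AND phage.Subcluster = 'A2') OR (phage.Cluster = 'B')
```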

property or_index

parenthesize()

Condense current filters into an isolated clause.

print_results()

Prints the Filter object’s values in a formatted way.

query(table_map)

Queries for ORM object instances conditioned on Filter values.

Parameters

table_map (DeclarativeMeta) – SQLAlchemy ORM map object.

Returns

List of mapped object instances.

Return type

list

refresh()

Re-queries for the Filter’s values.

remove(filter)

Remove an AND filter from the current block of AND conditionals.

Parameters

filter (str) – Formatted MySQL WHERE clause.

reset()

Resets all filters, values, and Filter state conditions.

reset_filters()

Resets all filters and relevant Filter state conditions.

retrieve(raw_columns, raw_bytes=False, filter=False)

Queries for distinct data for each value in the Filter object.

Parameters

raw_columns (list[str]) – SQLAlchemy Column object(s).

Returns

Distinct values for each Filter value.

Return type

dict{dict}

select(raw_columns, return_dict=True)

Queries for data conditioned on the values in the Filter object.

Parameters
  • raw_columns (list[str]) – SQLAlchemy Column object(s).

  • return_dict (Boolean) – Toggle whether to return data as a dictionary.

Returns

SELECT data conditioned on the values in the Filter object.

Return type

dict or list[RowProxy]

property session

sort(raw_columns)

Re-queries for the Filter’s values, applying an ORDER BY clause.

Parameters

raw_columns – SQLAlchemy Column object(s) or object name(s).

transpose(raw_column, return_dict=False, set_values=False, raw_bytes=False, filter=False)

Queries for distinct values from stored values and a MySQL Column.

Parameters
  • raw_column (str) – SQLAlchemy Column object or object name.

  • return_dict (Boolean) – Toggle whether to return data as a dictionary.

  • set_values (Boolean) – Toggle whether to replace Filter key and values.

Returns

Distinct values fetched from given and innate constraints.

Return type

list or dict

update()

Queries using the Filter’s key and its stored BinaryExpressions.

property updated

property values

property values_valid

pdm_utils.classes.genome module

Represents a collection of data about a genome that are commonly used to maintain and update SEA-PHAGES phage genomics data.

class pdm_utils.classes.genome.Genome

Bases: object

Class to hold data about a phage genome.

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the attribute value is valid.

Parameters
  • attribute (str) – Name of the Genome object attribute to evaluate.

  • check_set (set) – Set of reference ids.

  • expect (bool) – Indicates whether the attribute value is expected to be present in the check set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.

check_authors(check_set={}, expect=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check author list.

Evaluates whether at least one author in the list of authors is present in a set of reference authors.

Parameters
  • check_set (set) – Set of reference authors.

  • expect (bool) – Indicates whether at least one author in the list of authors is expected to be present in the check set.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
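The core test behind check_authors() is "does any listed author appear in the reference set?", compared with `expect`. The sketch below assumes the authors are available as a list of name strings matched exactly. The real attribute format, and the `authors_status()` name, are assumptions.

```python
def authors_status(authors, check_set, expect=True, success="correct", fail="error"):
    """Pass when 'at least one author is in check_set' matches expect.

    Assumes authors is a list of name strings compared exactly.
    """
    found = any(author in check_set for author in authors)
    return success if found == expect else fail


print(authors_status(["Hatfull,G.F.", "Russell,D.A."], {"Hatfull,G.F."}))  # correct
```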

check_cds_end_orient_ids(eval_id=None, success='correct', fail='error', eval_def=None)

Check if there are any duplicate transcription end-orientation coordinates.

Duplicated transcription end-orientation coordinates may represent unintentional duplicate CDS features with slightly different start coordinates.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
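Keying each CDS on its transcription end and orientation, and ignoring its begin coordinate, makes two features that share a stop codon but differ in start collide. The sketch below counts those keys with `collections.Counter`. The `(begin, end, orientation)` tuple layout and the `duplicate_ids()` name are hypothetical.

```python
from collections import Counter


def duplicate_ids(features):
    """Return (end, orientation) ids shared by more than one feature.

    features: iterable of (begin, end, orientation) tuples (hypothetical layout).
    """
    counts = Counter((end, orient) for _begin, end, orient in features)
    return {key for key, n in counts.items() if n > 1}


cds = [(100, 400, "F"), (130, 400, "F"), (900, 500, "R")]
print(duplicate_ids(cds))  # {(400, 'F')}
```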

check_cds_start_end_ids(eval_id=None, success='correct', fail='error', eval_def=None)

Check if there are any duplicate start-end coordinates.

Duplicated start-end coordinates may represent unintentional duplicate CDS features.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_cluster_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Check whether the cluster attribute is structured appropriately.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_compatible_cluster_and_subcluster(eval_id=None, success='correct', fail='error', eval_def=None)

Check compatibility of cluster and subcluster attributes.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
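One plausible compatibility rule is that a subcluster such as "A2" or "BK1" should consist of its cluster's letters followed by a number. The sketch below encodes that rule with a regular expression and treats an empty subcluster as compatible. Both the rule and the `compatible_cluster_subcluster()` name are assumptions, not the method's confirmed logic.

```python
import re


def compatible_cluster_subcluster(cluster, subcluster):
    """Return True if subcluster is empty or equals cluster letters plus digits."""
    if not subcluster:
        return True
    match = re.fullmatch(r"([A-Z]+)(\d+)", subcluster)
    return match is not None and match.group(1) == cluster


print(compatible_cluster_subcluster("A", "A2"))  # True
```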

check_feature_coordinates(use_cds=False, use_trna=False, use_tmrna=False, other=None, strand=False, eval_id=None, success='correct', fail='error', eval_def=None)

Identify nested, duplicated, or partially-duplicated features.

Parameters
  • use_cds (bool) – Indicates whether ids for CDS features should be generated.

  • use_trna (bool) – Indicates whether ids for tRNA features should be generated.

  • use_tmrna (bool) – Indicates whether ids for tmRNA features should be generated.

  • other (list) – List of features that should be included.

  • strand (bool) – Indicates if feature orientation should be included.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_magnitude(attribute, expect, ref_value, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the magnitude of a numerical attribute is valid.

Parameters
  • attribute – same as for check_attribute().

  • expect (str) – Comparison symbol indicating direction of magnitude (>, =, <).

  • ref_value (int, float, datetime) – Numerical value for comparison.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
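The comparison that check_magnitude() describes can be sketched on its own. This is a minimal, hypothetical stand-in, not the pdm_utils implementation. It compares a bare value instead of reading a Genome attribute, and it returns the status string rather than recording an Evaluation object.

```python
import operator
from datetime import date

# Hypothetical mapping from the documented symbols (>, =, <) to comparisons.
COMPARATORS = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def check_magnitude_sketch(value, expect, ref_value, success="correct", fail="error"):
    """Return `success` if `value <expect> ref_value` holds, otherwise `fail`."""
    if expect not in COMPARATORS:
        raise ValueError(f"Unsupported comparison symbol: {expect!r}")
    return success if COMPARATORS[expect](value, ref_value) else fail


# Works for ints, floats, and date/datetime values alike.
print(check_magnitude_sketch(50_000, ">", 0))  # correct
print(check_magnitude_sketch(date(2020, 1, 1), "<", date(2019, 1, 1)))  # error
```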

check_nucleotides(check_set={}, eval_id=None, success='correct', fail='error', eval_def=None)

Check if all nucleotides in the sequence are expected.

Parameters
  • check_set (set) – Set of reference nucleotides.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
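The nucleotide check reduces to a set difference. The sketch below is hypothetical and only illustrates the idea. It compares characters case-sensitively, and whether pdm_utils normalizes case first is not stated here. It also returns the offending characters so the result is easy to inspect.

```python
def check_nucleotides_sketch(seq, check_set, success="correct", fail="error"):
    """Return (status, sorted unexpected characters) for a nucleotide sequence."""
    # str() lets this accept either a plain string or a Biopython Seq.
    unexpected = sorted(set(str(seq)) - set(check_set))
    status = success if not unexpected else fail
    return status, unexpected


print(check_nucleotides_sketch("ATGCATGC", {"A", "C", "G", "T"}))  # ('correct', [])
print(check_nucleotides_sketch("ATGCNNRT", {"A", "C", "G", "T"}))  # ('error', ['N', 'R'])
```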

check_subcluster_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Check whether the subcluster attribute is structured appropriately.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

clear_locus_tags()

Resets locus_tags to empty string.

compare_two_attributes(attribute1, attribute2, expect_same=False, eval_id=None, success='correct', fail='error', eval_def=None)

Determine if two attributes are the same.

Parameters
  • attribute1 (str) – First attribute to compare.

  • attribute2 (str) – Second attribute to compare.

  • expect_same (bool) – Indicates whether the two attribute values are expected to be the same.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

parse_description()

Retrieve the name and host_genus from the ‘description’ attribute.

parse_organism()

Retrieve the name and host_genus from the ‘organism’ attribute.

parse_source()

Retrieve the name and host_genus from the ‘source’ attribute.

set_accession(value, format='empty_string')

Set the accession.

The Accession field in the MySQL database defaults to ‘’. Some flat-file accessions include a version number suffix, which is discarded.

Parameters
  • value (str) – GenBank accession number.

  • format (misc) – Indicates the format of the data if it is not a valid accession. Default is ‘’.
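The version-stripping step can be sketched as a standalone function. Both strip_accession_version and its empty_value argument are hypothetical. The real method takes a format keyword instead and stores the result on the Genome object. The accession numbers shown are made-up examples.

```python
def strip_accession_version(value, empty_value=""):
    """Return a GenBank accession without its trailing '.N' version suffix."""
    if value is None:
        return empty_value
    value = str(value).strip()
    if not value:
        return empty_value
    # 'MN123456.1' -> 'MN123456'; accessions without a suffix are unchanged.
    return value.split(".")[0]


print(strip_accession_version("MN123456.1"))  # MN123456
print(strip_accession_version("MN123456"))    # MN123456
```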

set_annotation_author(value)

Convert annotation_author to integer value if possible.

Parameters

value (str, int) – Numeric value.

set_cds_descriptions(value)

Set each CDS processed description as indicated.

Parameters

value (str) – Name of the description field.

set_cds_features(value)

Set and tally the CDS features.

Parameters

value (list) – list of Cds objects.

set_cds_id_list()

Creates lists of CDS feature identifiers.

The first identifier is derived from the start and end coordinates. The second identifier is derived from the transcription end coordinate and orientation.
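The two identifier lists can be illustrated with a small stand-in class. The sketch rests on three assumptions, none of which comes from the pdm_utils source: start is always the left (lower) coordinate, stop is the right, and orientation is "F" or "R". Under those assumptions, transcription ends at stop on the forward strand and at start on the reverse strand. CdsCoords and cds_id_lists are hypothetical names.

```python
from typing import NamedTuple


class CdsCoords(NamedTuple):
    """Hypothetical minimal stand-in for a Cds object."""
    start: int        # assumed left (lower) coordinate
    stop: int         # assumed right (higher) coordinate
    orientation: str  # assumed "F" (forward) or "R" (reverse)


def cds_id_lists(features):
    """Build (start, stop) ids and (transcription end, orientation) ids."""
    start_end_ids = [(f.start, f.stop) for f in features]
    end_orient_ids = [
        # Forward genes end at the right coordinate, reverse genes at the left.
        (f.stop if f.orientation == "F" else f.start, f.orientation)
        for f in features
    ]
    return start_end_ids, end_orient_ids


cds = [CdsCoords(100, 400, "F"), CdsCoords(500, 900, "R")]
print(cds_id_lists(cds))  # ([(100, 400), (500, 900)], [(400, 'F'), (500, 'R')])
```

Two annotations of the same gene that differ only in their start codon, such as CdsCoords(100, 400, "F") and CdsCoords(130, 400, "F"), get different first identifiers but share the second one. This is presumably why both lists are kept.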

set_cluster(value)

Set the cluster and modify singleton if needed.

Parameters

value (str) – Cluster designation of the genome.

set_date(value, format='empty_datetime_obj')

Set the date attribute.

Parameters
  • value (misc) – Date

  • format (str) – Indicates the format if the value is empty.

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

set_feature_genome_ids(use_cds=False, use_trna=False, use_tmrna=False, use_source=False, value=None)

Sets the genome_id of each feature.

Parameters
  • use_cds (bool) – Indicates whether genome_id for CDS features should be set.

  • use_trna (bool) – Indicates whether genome_id for tRNA features should be set.

  • use_tmrna (bool) – Indicates whether genome_id for tmRNA features should be set.

  • use_source (bool) – Indicates whether genome_id for source features should be set.

  • value (str) – Genome identifier.

set_feature_ids(use_type=False, use_cds=False, use_trna=False, use_tmrna=False, use_source=False)

Sets the id of each feature.

Lists of features can be added to this method. The method assumes that all elements in all lists contain ‘id’, ‘start’, and ‘stop’ attributes. This feature attribute is processed within the Genome object, rather than within the feature itself, because the method sorts all features and generates systematic IDs based on feature order in the genome.

Parameters
  • use_type (bool) – Indicates whether the type of object should be added to the feature id.

  • use_cds (bool) – Indicates whether ids for CDS features should be generated.

  • use_trna (bool) – Indicates whether ids for tRNA features should be generated.

  • use_tmrna (bool) – Indicates whether ids for tmRNA features should be generated.

  • use_source (bool) – Indicates whether ids for source features should be generated.
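The sort-then-number idea described above can be sketched as follows. The Feature class, the assign_feature_ids function, and the ‘<genome>[_<type>]_<n>’ id format are all hypothetical illustrations. The identifiers pdm_utils actually generates may be formatted differently.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in exposing the attributes the method relies on."""
    start: int
    stop: int
    type: str
    id: str = ""


def assign_feature_ids(genome_id, *feature_lists, use_type=False):
    """Sort features from all lists by position and number them in order."""
    features = [feature for group in feature_lists for feature in group]
    features.sort(key=lambda f: (f.start, f.stop))
    for index, feature in enumerate(features, start=1):
        # The '<genome>[_<type>]_<n>' format is illustrative only.
        prefix = f"{genome_id}_{feature.type}" if use_type else genome_id
        feature.id = f"{prefix}_{index}"
    return features


cds = [Feature(500, 900, "CDS"), Feature(10, 300, "CDS")]
trna = [Feature(350, 420, "tRNA")]
print([f.id for f in assign_feature_ids("PhageA", cds, trna)])
# ['PhageA_1', 'PhageA_2', 'PhageA_3']
```

Sorting the combined list is what makes the numbering depend on genome order rather than on the order of the input lists.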

set_filename(filepath)

Set the filename. Discard the path and file extension.

Parameters

filepath (Path) – Path to the file.

set_host_genus(value=None, attribute=None, format='empty_string')

Set the host_genus from a value parsed from the indicated attribute.

The input data is split into multiple parts, and the first word is used to set host_genus.

Parameters
  • value (str) – the host genus of the phage genome

  • attribute (str) – the name of the genome attribute from which the host_genus attribute will be set

  • format (str) – the default format if the input is an empty/null value.
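The "first word" rule can be sketched in a few lines. The parse_host_genus helper and its empty_value argument are hypothetical stand-ins for the method's format handling. Any further normalization that pdm_utils applies to the parsed word is not shown.

```python
def parse_host_genus(value, empty_value=""):
    """Return the first whitespace-delimited word of `value`."""
    if value is None:
        return empty_value
    words = str(value).split()
    return words[0] if words else empty_value


print(parse_host_genus("Mycobacterium smegmatis mc2 155"))  # Mycobacterium
```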

set_id(value=None, attribute=None)

Set the id from either an input value or an indicated attribute.

Parameters
  • value (str) – unique identifier for the genome.

  • attribute (str) – name of a genome object attribute that stores a unique identifier for the genome.

set_retrieve_record(value)

Convert retrieve_record to integer value if possible.

Parameters

value (str, int) – Numeric value.

set_sequence(value)

Set the nucleotide sequence and compute the length.

This method coerces sequences into a Biopython Seq object.

Parameters

value (str or Seq) – the genome’s nucleotide sequence.

set_source_features(value)

Set and tally the source features.

Parameters

value (list) – list of Source objects.

set_subcluster(value)

Set the subcluster.

Parameters

value (str) – Subcluster designation of the genome.

set_tmrna_features(value)

Set and tally the tmRNA features.

Parameters

value (list) – list of Tmrna objects.

set_trna_features(value)

Set and tally the tRNA features.

Parameters

value (list) – list of Trna objects.

set_unique_cds_end_orient_ids()

Identify CDS features that contain unique transcription end-orientation coordinates.

set_unique_cds_start_end_ids()

Identify CDS features that contain unique start-end coordinates.

tally_cds_descriptions()

Tally the non-generic CDS descriptions.

update_name_and_id(value)

Update the genome name and id in all locations in a Genome object.

Parameters

value (str) – Value used to update the Genome id and name.

pdm_utils.classes.genomepair module

Represents a structure to pair two Genome objects and perform comparisons between them to identify inconsistencies.

class pdm_utils.classes.genomepair.GenomePair

Bases: object

compare_attribute(attribute, expect_same=False, eval_id=None, success='correct', fail='error', eval_def=None)

Compare values of the specified attribute in each genome.

Parameters
  • attribute (str) – Name of the Genome object attribute to evaluate in each genome.

  • expect_same (bool) – Indicates whether the two attribute values are expected to be the same.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.
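The compare_attribute() logic can be sketched with plain objects. This assumes the named attribute is read from each of the two paired genomes. SimpleNamespace stands in for pdm_utils Genome objects, and compare_attribute_sketch is a hypothetical name. The genome values are made up.

```python
from types import SimpleNamespace


def compare_attribute_sketch(genome1, genome2, attribute, expect_same=False,
                             success="correct", fail="error"):
    """Compare one attribute across two genome-like objects."""
    same = getattr(genome1, attribute) == getattr(genome2, attribute)
    return success if same == expect_same else fail


# Made-up genomes; real code would use pdm_utils Genome objects.
gnm1 = SimpleNamespace(id="PhageA", length=50000)
gnm2 = SimpleNamespace(id="PhageA", length=49990)
print(compare_attribute_sketch(gnm1, gnm2, "id", expect_same=True))      # correct
print(compare_attribute_sketch(gnm1, gnm2, "length", expect_same=True))  # error
```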

compare_date(expect, eval_id=None, success='correct', fail='error', eval_def=None)

Compare the date of each genome.

Parameters
  • expect (str) – Indicates whether the first genome is expected to be “newer”, “equal”, or “older” than the second genome.

  • eval_id – same as for compare_attribute().

  • success – same as for compare_attribute().

  • fail – same as for compare_attribute().

  • eval_def – same as for compare_attribute().
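The three documented expectations map onto a simple date comparison. This sketch is hypothetical. It compares two date objects directly instead of reading them from a GenomePair, and it rejects any expectation other than the three listed above.

```python
from datetime import date

VALID_EXPECTATIONS = ("newer", "equal", "older")


def compare_date_sketch(date1, date2, expect, success="correct", fail="error"):
    """Check whether date1 is newer than, equal to, or older than date2."""
    if expect not in VALID_EXPECTATIONS:
        raise ValueError(f"expect must be one of {VALID_EXPECTATIONS}")
    if date1 > date2:
        actual = "newer"
    elif date1 == date2:
        actual = "equal"
    else:
        actual = "older"
    return success if actual == expect else fail


print(compare_date_sketch(date(2021, 5, 1), date(2020, 1, 1), "newer"))  # correct
```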

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

pdm_utils.classes.genometriad module

Represents a structure to group three Genome objects and perform comparisons between them to identify inconsistencies.

class pdm_utils.classes.genometriad.GenomeTriad

Bases: object

Stores three Genome objects.

compare_mysql_gbk_genomes(gnm_mysql, gnm_gbk, cdspair_mysql_gbk)
compare_mysql_phagesdb_genomes(gnm_mysql, gnm_pdb)
compare_phagesdb_gbk_genomes(gnm_pdb, gnm_gbk)
compute_total_genome_errors(gnm_gbk, gnm_pdb)

pdm_utils.classes.randomfieldupdatehandler module

class pdm_utils.classes.randomfieldupdatehandler.RandomFieldUpdateHandler(connection)

Bases: object

execute_ticket()

Check whether the ticket is valid. If it is not, return 0 to indicate that the ticket was not executed. If it is valid, ask the user to confirm the proposed update. If the user confirms, execute the ticket; otherwise, report that the ticket will be skipped and return 0. If an error occurs during execution, print an error message and return 0. If the ticket executes without issue, return 1 to indicate success.
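The control flow and 0/1 return codes described for execute_ticket() can be sketched as below. The is_valid flag, the run_update callable, the injectable ask function, and the accepted answers ("y"/"yes") are all hypothetical. They let the flow run without a database or an interactive prompt.

```python
def execute_ticket_sketch(is_valid, run_update, ask=input):
    """Return 1 if the update was executed, otherwise 0."""
    if not is_valid:
        return 0
    answer = ask("Proceed with this update? (yes/no): ").strip().lower()
    if answer not in ("y", "yes"):
        print("This ticket will be skipped.")
        return 0
    try:
        run_update()
    except Exception as err:  # any failure means the ticket was not executed
        print(f"Error executing ticket: {err}")
        return 0
    return 1


# Supplying `ask` as a lambda avoids interactive input in this example.
print(execute_ticket_sketch(True, lambda: None, ask=lambda prompt: "yes"))  # 1
```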

validate_field()

Validate the replacement field by checking whether it is one of the fields in the indicated table.

validate_key_name()

Validate the selection key by checking whether it is a field in the indicated table that is marked as any kind of key.

validate_key_value()

Validate the selection key’s value by querying the indicated table for the data associated with that key and value.

validate_table()

Validate the table by querying for the table’s description.

validate_ticket()

Run all four of the object’s built-in validation methods and check whether any of the ticket inputs are invalid. If any are invalid, reject the ticket; otherwise, accept it.
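Running every check and then rejecting the ticket if any one fails can be sketched as follows. The validator names and the zero-argument callables are hypothetical placeholders for the four validate_* methods listed above. Unlike a short-circuiting all(), the dict comprehension runs every check, so all invalid inputs are reported together.

```python
def validate_ticket_sketch(validators):
    """Run every validator and accept the ticket only if all of them pass."""
    # A dict comprehension runs all checks, so every invalid input is reported.
    results = {name: bool(check()) for name, check in validators.items()}
    return all(results.values()), results


accepted, results = validate_ticket_sketch({
    "table": lambda: True,
    "field": lambda: True,
    "key_name": lambda: False,
    "key_value": lambda: True,
})
print(accepted)  # False
print([name for name, ok in results.items() if not ok])  # ['key_name']
```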

pdm_utils.classes.source module

Represents a collection of data about a Source feature that is commonly used to maintain and update SEA-PHAGES phage genomics data.

class pdm_utils.classes.source.Source

Bases: object

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the attribute value is valid.

Parameters
  • attribute (str) – Name of the Source feature object attribute to evaluate.

  • check_set (set) – Set of reference ids.

  • expect (bool) – Indicates whether the attribute value is expected to be present in the check set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.
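The interplay between check_set and expect can be sketched in isolation. The check passes when the value's membership in check_set matches expect. This hypothetical helper returns the status string instead of recording an Evaluation, and the ids are made up.

```python
def check_attribute_sketch(value, check_set, expect=False,
                           success="correct", fail="error"):
    """Pass when membership of `value` in `check_set` matches `expect`."""
    present = value in check_set
    return success if present == expect else fail


existing_ids = {"PhageA", "PhageB"}
# A new genome id is expected NOT to be in the set of existing ids.
print(check_attribute_sketch("PhageC", existing_ids, expect=False))  # correct
print(check_attribute_sketch("PhageA", existing_ids, expect=False))  # error
```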

parse_host()

Retrieve the host_genus name from the ‘host’ field.

parse_lab_host()

Retrieve the host_genus name from the ‘lab_host’ field.

parse_organism()

Retrieve the phage and host_genus names from the ‘organism’ field.

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

pdm_utils.classes.ticket module

Represents a structure to contain directions for how to parse and import genomes into a MySQL database.

class pdm_utils.classes.ticket.ImportTicket

Bases: object

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the attribute value is valid.

Parameters
  • attribute (str) – Name of the ImportTicket object attribute to evaluate.

  • check_set (set) – Set of reference ids.

  • expect (bool) – Indicates whether the attribute value is expected to be present in the check set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Default status if the outcome is a success.

  • fail (str) – Default status if the outcome is not a success.

  • eval_def (str) – Description of the evaluation.

check_compatible_type_and_data_retain(eval_id=None, success='correct', fail='error', eval_def=None)

Check if the ticket type and data_retain are compatible.

If the ticket type is ‘add’, then the data_retain set is not expected to have any data.

Parameters
  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().
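The ‘add’ rule can be sketched as a single condition. Presumably the rule exists because an ‘add’ ticket introduces a genome that has no existing data to retain. The helper name is hypothetical. The ‘replace’ type and the ‘cluster’ field are used only as example values.

```python
def check_type_and_data_retain_sketch(ticket_type, data_retain,
                                      success="correct", fail="error"):
    """An 'add' ticket should not ask to retain any existing data."""
    if ticket_type == "add" and len(data_retain) > 0:
        return fail
    return success


print(check_type_and_data_retain_sketch("add", set()))            # correct
print(check_type_and_data_retain_sketch("add", {"cluster"}))      # error
print(check_type_and_data_retain_sketch("replace", {"cluster"}))  # correct
```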

check_eval_flags(expect=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the eval_flags attribute is valid.

Parameters
  • expect (bool) – Indicates whether the eval_flags attribute is expected to contain data.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_valid_data_source(ref_set_attr, check_set, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the values in the specified attribute are valid.

Parameters
  • ref_set_attr (str) – Name of the data_dict in the ticket to be evaluated (data_add, data_retain, data_retrieve, data_parse)

  • check_set (set) – Set of valid field names.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

set_description_field(value)

Set the description_field.

Parameters

value (str) – Value to be set as the description_field.

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to the evaluations list.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the outcome of the evaluation.

  • status (str) – Outcome of the evaluation.

set_eval_mode(value)

Set the eval_mode.

Parameters

value (str) – Value to be set as the eval_mode.

set_type(value)

Set the ticket type.

Parameters

value (str) – Value to be set as the type.

pdm_utils.classes.tmrna module

Represents a collection of data about a tmRNA feature that are commonly used to maintain and update SEA-PHAGES phage genomics data.

class pdm_utils.classes.tmrna.Tmrna

Bases: object

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Checks whether the indicated feature attribute is present in the given check_set. Uses expect to determine whether the presence (or absence) of the value is an error or correct.

Parameters
  • attribute (str) – The gene feature attribute to evaluate.

  • check_set (set) – Set of reference values.

  • expect (bool) – Whether the attribute’s value is expected to be in the reference set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_compatible_gene_and_locus_tag(eval_id=None, success='correct', fail='error', eval_def=None)

Check that the gene and locus_tag attributes contain identical numbers.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_gene_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Check that the gene qualifier contains an integer.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.
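The gene-structure check and the gene/locus_tag compatibility check can be sketched together. This assumes that "contains an integer" means the gene qualifier is entirely digits and that the locus_tag number is its trailing run of digits. Both are interpretations, not confirmed pdm_utils behavior. All function names are hypothetical.

```python
import re


def check_gene_structure_sketch(gene, success="correct", fail="error"):
    """The gene qualifier is expected to be an integer such as '12'."""
    return success if gene is not None and str(gene).isdigit() else fail


def trailing_number(text):
    """Return the integer at the end of `text`, or None if there is none."""
    match = re.search(r"(\d+)$", text or "")
    return int(match.group(1)) if match else None


def check_gene_and_locus_tag_sketch(gene, locus_tag, success="correct", fail="error"):
    """Pass when gene and locus_tag end in the same number."""
    number = trailing_number(gene)
    if number is not None and number == trailing_number(locus_tag):
        return success
    return fail


print(check_gene_structure_sketch("12"))                   # correct
print(check_gene_and_locus_tag_sketch("12", "PHAGEA_12"))  # correct
```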

check_locus_tag_structure(check_value=None, only_typo=False, prefix_set={}, case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if the locus_tag is structured correctly.

Parameters
  • check_value (str) – Indicates the genome id that is expected to be present. If None, the ‘genome_id’ parameter is used.

  • only_typo (bool) – Indicates if only the genome id spelling should be evaluated.

  • prefix_set (set) – Indicates valid common prefixes, if a prefix is expected.

  • case (bool) – Indicates whether the locus_tag is expected to be capitalized.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_magnitude(attribute, expect, ref_value, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the magnitude of a numerical attribute meets expectations.

Parameters
  • attribute (str) – The gene feature attribute to evaluate.

  • expect (str) – Symbol designating the direction of magnitude (>, =, <).

  • ref_value (int, float, datetime) – Numerical value for comparison.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_orientation(fmt='fr_short', case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the orientation is set appropriately.

Parameters
  • fmt (str) – Indicates how orientation should be formatted.

  • case (bool) – Indicates whether orientation data should be cased.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_orientation_correct(fmt='fr_short', case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the orientation agrees with the orientation predicted by Aragorn and/or tRNAscan-SE. If Aragorn/tRNAscan-SE report a forward orientation, they agree with the annotated orientation; if they report a reverse orientation, they indicate that the annotation is backwards.

Parameters
  • fmt (str) – Indicates how orientation should be formatted.

  • case (bool) – Indicates whether orientation data should be cased.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_parts(eval_id=None, success='correct', fail='error', eval_def=None)

Makes sure only one region exists for this tmRNA.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_peptide_tag_correct(eval_id=None, success='correct', fail='error', eval_def=None)

Checks whether the annotated peptide tag matches the Aragorn output.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_peptide_tag_valid(eval_id=None, success='correct', fail='error', eval_def=None)

Checks whether the annotated peptide tag contains any letters outside the protein alphabet.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.
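The alphabet check can be sketched with a set difference. This assumes the "protein alphabet" is the 20 standard one-letter amino acid codes and that any terminal stop symbol has already been removed from the tag. pdm_utils may use a different alphabet. The tag strings are illustrative.

```python
# Assumed alphabet: the 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_peptide_tag_valid_sketch(tag, success="correct", fail="error"):
    """Fail if the tag is empty or contains letters outside the alphabet."""
    invalid = set(tag.upper()) - STANDARD_AMINO_ACIDS
    return success if tag and not invalid else fail


print(check_peptide_tag_valid_sketch("AANDENYALAA"))  # correct
print(check_peptide_tag_valid_sketch("AANDBNYALAA"))  # error
```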

get_begin_end()

Accesses feature coordinates in transcription begin-end format.

Returns

(begin, end)
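Converting left/right coordinates into transcription begin-end order can be sketched as below. This assumes start is always the lower coordinate and orientation is "F" or "R". Under that assumption, a reverse-strand feature begins transcription at its stop coordinate. The function name is hypothetical.

```python
def get_begin_end_sketch(start, stop, orientation):
    """Return (begin, end) in the direction of transcription."""
    if orientation == "F":
        return start, stop
    if orientation == "R":
        # Reverse-strand features are transcribed from right to left.
        return stop, start
    raise ValueError(f"Unrecognized orientation: {orientation!r}")


print(get_begin_end_sketch(100, 180, "F"))  # (100, 180)
print(get_begin_end_sketch(100, 180, "R"))  # (180, 100)
```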

get_qualifiers()

Helper function that uses tmRNA data to populate the qualifiers attribute of seqfeature.

Returns

The qualifiers.

Return type

OrderedDict

parse_peptide_tag()

Parse the peptide_tag attribute out of the note field.

reformat_start_and_stop(fmt)

Convert existing start and stop coordinates to the indicated new format; also updates the coordinate format attribute to reflect any change.

Parameters

fmt (str) – The new desired coordinate format.

run_aragorn()

Uses an AragornHandler object to negotiate the flow of information between this object and Aragorn.

set_eval(eval_id, definition, result, status)

Constructs and adds an Evaluation object to this feature’s list of evaluations.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the evaluation outcome.

  • status (str) – Overall outcome of the evaluation.

set_gene(value, delimiter=None, prefix_set=None)

Set the gene attribute.

Parameters
  • value (str) – Gene data to parse. Also passed to set_num().

  • delimiter (str) – Passed to set_num().

  • prefix_set (set) – Passed to set_num().

set_location_id()

Create identifier tuples containing feature location data. This method considers only gene boundaries and ignores any multi-part elements of the gene.

set_locus_tag(tag='', delimiter='_', check_value=None)

Populate the locus_tag and parse the locus_tag number.

Parameters
  • tag (str) – Input locus_tag data.

  • delimiter (str) – Value used to split locus_tag data.

  • check_value (str) – Genome name or other value that will be used to parse the locus_tag to identify the feature number.

set_name(value=None)

Set the feature name. Ideally, the name of the feature will be an integer. This information can be stored in multiple fields in the GenBank-formatted flat file, and the name is derived from one of several qualifiers.

Parameters

value (str) – A value that should be used to directly set the name regardless of the ‘gene’ and ‘_locus_tag_num’ attributes.

set_nucleotide_length(use_seq=False)

Set the nucleotide length of this gene feature.

Parameters

use_seq (bool) – Whether to use the Seq feature to calculate the nucleotide length of this feature.

set_nucleotide_sequence(value=None, parent_genome_seq=None)

Set this feature’s nucleotide sequence.

Parameters
  • value (str or Seq) – Sequence.

  • parent_genome_seq (Seq) – Parent genome sequence.

Raises

ValueError

set_num(attr, description, delimiter=None, prefix_set=None)

Set a number attribute from a description.

Parameters
  • attr (str) – Attribute to set the number.

  • description (str) – Description data from which to parse the number.

  • delimiter (str) – Value used to split the description data.

  • prefix_set (set) – Valid possible prefixes in the description.

set_orientation(value, fmt, case=False)

Set the orientation based on the indicated format.

Parameters
  • value (int or str) – Orientation value.

  • fmt (str) – How orientation should be formatted.

  • case (bool) – Whether to capitalize the first letter of orientation.

set_seqfeature()

Create a SeqFeature object with which to populate the seqfeature attribute.

pdm_utils.classes.trna module

Represents a collection of data about a tRNA feature that are commonly used to maintain and update SEA-PHAGES phage genomics data.

class pdm_utils.classes.trna.Trna

Bases: object

check_amino_acid_correct(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the amino acid annotated for this tRNA agrees with the Aragorn and/or tRNAscan-SE prediction(s).

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_amino_acid_valid(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the amino acid annotated for this tRNA is in the set of amino acids that we have opted to allow in the MySQL database.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_anticodon_correct(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the annotated anticodon agrees with the prediction by Aragorn or tRNAscan-SE.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_anticodon_valid(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the anticodon conforms to the expected length (2-4) and alphabet (“a”, “c”, “g”, “t”) or is “nnn”.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.
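The anticodon rule (length 2-4 over “a”, “c”, “g”, “t”, or exactly “nnn”) can be sketched directly. Lowercasing the input first is an assumption of this sketch, not documented behavior. The function name is hypothetical.

```python
VALID_ANTICODON_BASES = set("acgt")


def check_anticodon_valid_sketch(anticodon, success="correct", fail="error"):
    """Accept 'nnn' or a 2-4 character string drawn from a, c, g, t."""
    anticodon = anticodon.lower()  # assumed normalization
    if anticodon == "nnn":
        return success
    if 2 <= len(anticodon) <= 4 and set(anticodon) <= VALID_ANTICODON_BASES:
        return success
    return fail


print(check_anticodon_valid_sketch("tgc"))  # correct
print(check_anticodon_valid_sketch("tgx"))  # error
```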

check_attribute(attribute, check_set, expect=False, eval_id=None, success='correct', fail='error', eval_def=None)

Checks whether the indicated feature attribute is present in the given check_set. Uses expect to determine whether the presence (or absence) of the value is an error or correct.

Parameters
  • attribute (str) – The gene feature attribute to evaluate.

  • check_set (set) – Set of reference values.

  • expect (bool) – Whether the attribute’s value is expected to be in the reference set.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_compatible_gene_and_locus_tag(eval_id=None, success='correct', fail='error', eval_def=None)

Check that the gene and locus_tag attributes contain identical numbers.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_coordinates(eval_id=None, success='correct', fail='error', eval_def=None)
Parameters
  • eval_id (str) – unique identifier for the evaluation

  • success (str) – status if the outcome is successful

  • fail (str) – status if the outcome is unsuccessful

  • eval_def (str) – description of the evaluation

check_gene_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Check that the gene qualifier contains an integer.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_length(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the tRNA is in the expected range of lengths. The average tRNA gene is 70-90 bp in length, but it is not uncommon to identify well-scoring tRNAs in the 60-100 bp range.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_locus_tag_structure(check_value=None, only_typo=False, prefix_set={}, case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check if the locus_tag is structured correctly.

Parameters
  • check_value (str) – Indicates the genome id that is expected to be present. If None, the ‘genome_id’ parameter is used.

  • only_typo (bool) – Indicates if only the genome id spelling should be evaluated.

  • prefix_set (set) – Indicates valid common prefixes, if a prefix is expected.

  • case (bool) – Indicates whether the locus_tag is expected to be capitalized.

  • eval_id – same as for check_attribute().

  • success – same as for check_attribute().

  • fail – same as for check_attribute().

  • eval_def – same as for check_attribute().

check_magnitude(attribute, expect, ref_value, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the magnitude of a numerical attribute meets expectations.

Parameters
  • attribute (str) – The gene feature attribute to evaluate.

  • expect (str) – Symbol designating the direction of magnitude (>, =, <).

  • ref_value (int, float, datetime) – Numerical value for comparison.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_note_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the note field is formatted properly.

GenBank does not enforce any standard for the note field, so a note is not required to exist.

SEA-PHAGES note fields should look like ‘tRNA-Xxx(nnn)’.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_orientation(fmt='fr_short', case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the orientation is set appropriately.

Parameters
  • fmt (str) – Indicates how orientation should be formatted.

  • case (bool) – Indicates whether orientation data should be cased.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_orientation_correct(fmt='fr_short', case=True, eval_id=None, success='correct', fail='error', eval_def=None)

Check that the orientation agrees with the orientation predicted by Aragorn and/or tRNAscan-SE. If Aragorn or tRNAscan-SE reports a forward orientation, the tool agrees with the annotated orientation; if it reports a reverse orientation, the tool considers the annotation backwards.

Parameters
  • fmt (str) – Indicates how the orientation should be formatted.

  • case (bool) – Indicates whether orientation data should be cased.

  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_product_structure(eval_id=None, success='correct', fail='error', eval_def=None)

Checks that the product field is formatted properly and that the annotated amino acid is valid.

GenBank requires every tRNA annotation that has a product field to annotate it as either ‘tRNA-Xxx’, where Xxx is one of the 20 standard amino acids, or ‘tRNA-OTHER’ for tRNAs that decode a non-standard amino acid (e.g. SeC, Pyl, fMet). SEA-PHAGES may also append the anticodon in parentheses, giving a product field such as ‘tRNA-Xxx(nnn)’.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.
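The product rules above can be sketched as a pattern match plus a membership test against the 20 standard amino acids. This is an illustrative assumption, not the pdm_utils code: PRODUCT_PATTERN, STANDARD_AMINO_ACIDS and product_is_valid() are hypothetical names, and the optional parenthesized anticodon is accepted after either form.

```python
import re

# The 20 standard amino acids as three-letter codes.
STANDARD_AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Hypothetical pattern: "tRNA-" + amino acid label + optional "(nnn)" anticodon.
PRODUCT_PATTERN = re.compile(r"tRNA-([A-Za-z]+)(?:\(([ACGTacgt]{3})\))?")


def product_is_valid(product):
    """Return True for 'tRNA-Xxx', 'tRNA-OTHER', or either with '(nnn)' appended."""
    match = PRODUCT_PATTERN.fullmatch(product.strip())
    if match is None:
        return False
    amino_acid = match.group(1)
    # Non-standard amino acids (e.g. SeC, Pyl) must be reported as OTHER.
    return amino_acid in STANDARD_AMINO_ACIDS or amino_acid == "OTHER"
```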

check_sources(eval_id=None, success='correct', fail='error', eval_def=None)

Check that running this tRNA’s DNA sequence through Aragorn and tRNAscan-SE successfully identifies a tRNA.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.

check_terminal_nucleotides(eval_id=None, success='correct', fail='warning', eval_def=None)

Checks that the tRNA ends with “CCA”, “CC”, or “C”.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • success (str) – Status if the outcome is successful.

  • fail (str) – Status if the outcome is unsuccessful.

  • eval_def (str) – Description of the evaluation.
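The terminal-nucleotide rule reduces to a suffix test. A minimal sketch, assuming the sequence is a str or a Biopython Seq (both convert with str()); has_expected_3prime_end() is a hypothetical helper, not a pdm_utils function.

```python
def has_expected_3prime_end(sequence):
    """Return True if the tRNA ends with "CCA", "CC" or "C" (case-insensitive)."""
    # str.endswith accepts a tuple of suffixes; "CC" is already covered by
    # "C" but is listed to mirror the documented rule.
    return str(sequence).upper().endswith(("CCA", "CC", "C"))
```

Note that an ending such as "CA" fails the check even though it shares a prefix with "CCA".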

create_seqfeature(type, start, stop, strand)

get_begin_end()

Accesses feature coordinates in transcription begin-end format.

Returns

(begin, end)
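Begin-end format orders coordinates in the direction of transcription rather than by position. The sketch below assumes start <= stop and an orientation stored as "F" or "R"; the helper name begin_end() and those orientation labels are assumptions, not confirmed pdm_utils details.

```python
def begin_end(start, stop, orientation):
    """Convert start/stop coordinates to transcription (begin, end) order.

    Assumes start <= stop and orientation is "F" (forward) or "R" (reverse).
    """
    if orientation == "F":
        # Forward features are transcribed from start to stop.
        return start, stop
    if orientation == "R":
        # Reverse features are transcribed from stop back to start.
        return stop, start
    raise ValueError(f"unrecognized orientation: {orientation!r}")
```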

get_qualifiers(type)

Helper function that uses tRNA data to populate the qualifiers attribute of seqfeature.

Returns

qualifiers

Return type

OrderedDict

parse_amino_acid()

Attempts to parse the amino_acid attribute from the product and note attributes.
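One plausible way to parse the amino acid is to search the product first and fall back to the note. This is a sketch under that assumed precedence; guess_amino_acid() and AMINO_ACID_PATTERN are hypothetical, and the real method may validate or normalize the result differently.

```python
import re

# Hypothetical pattern: the letters immediately following "tRNA-".
AMINO_ACID_PATTERN = re.compile(r"tRNA-([A-Za-z]+)")


def guess_amino_acid(product, note):
    """Return the amino acid label from product, else from note, else None."""
    for text in (product, note):
        match = AMINO_ACID_PATTERN.search(text or "")
        if match:
            return match.group(1)
    return None
```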

parse_anticodon()

Attempts to parse the anticodon attribute from the note attribute.
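Given the ‘tRNA-Xxx(nnn)’ note convention, the anticodon is the three-letter token inside the parentheses. A minimal sketch; anticodon_from_note() is hypothetical, and normalizing to lowercase is an assumption made here for illustration.

```python
import re

# Three nucleotides (DNA or RNA letters, either case) inside parentheses.
ANTICODON_PATTERN = re.compile(r"\(([ACGTUacgtu]{3})\)")


def anticodon_from_note(note):
    """Return the parenthesized anticodon from a note, lowercased, or None."""
    match = ANTICODON_PATTERN.search(note or "")
    return match.group(1).lower() if match else None
```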

reformat_start_and_stop(fmt)

Convert the existing start and stop coordinates to the indicated format, and update the coordinate format attribute to reflect any change.

Parameters
  • fmt (str) – The new desired coordinate format.
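A common pair of coordinate formats is 0-based half-open and 1-based closed; converting between them only shifts the start. The format labels "0_half_open" and "1_closed" below are assumptions about the values fmt accepts, and convert_coordinates() is a hypothetical helper.

```python
VALID_FORMATS = {"0_half_open", "1_closed"}


def convert_coordinates(start, stop, current, new):
    """Convert (start, stop) between 0-based half-open and 1-based closed."""
    if current not in VALID_FORMATS or new not in VALID_FORMATS:
        raise ValueError(f"formats must be in {sorted(VALID_FORMATS)}")
    if current == new:
        return start, stop
    if new == "1_closed":
        # [start, stop) counted from 0  ->  [start + 1, stop] counted from 1
        return start + 1, stop
    # [start, stop] counted from 1  ->  [start - 1, stop) counted from 0
    return start - 1, stop
```

The stop value is identical in both formats, so the feature length (stop - start in half-open, stop - start + 1 in closed) is preserved.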

run_aragorn()

Uses an AragornHandler object to negotiate the flow of information between this object and Aragorn.

run_trnascanse()

Uses a TRNAscanSEHandler object to negotiate the flow of information between this object and tRNAscan-SE.

set_amino_acid(value)

Sets the amino_acid attribute using the indicated value.

Parameters
  • value (str) – The amino acid to use.

Raises

ValueError

set_anticodon(value)

Sets the anticodon attribute using the indicated value.

Parameters
  • value (str) – The anticodon to use for this tRNA.

set_eval(eval_id, definition, result, status)

Constructs an Evaluation object and adds it to this feature’s list of evaluations.

Parameters
  • eval_id (str) – Unique identifier for the evaluation.

  • definition (str) – Description of the evaluation.

  • result (str) – Description of the evaluation outcome.

  • status (str) – Overall outcome of the evaluation.

set_gene(value, delimiter=None, prefix_set=None)

Set the gene attribute.

Parameters
  • value (str) – Gene data to parse. Also passed to set_num().

  • delimiter (str) – Passed to set_num().

  • prefix_set (set) – Passed to set_num().

set_location_id()

Create identifier tuples containing feature location data. This method considers only the gene boundaries and ignores any multi-part elements of the gene.

set_locus_tag(tag='', delimiter='_', check_value=None)

Populate the locus_tag and parse the locus_tag number.

Parameters
  • tag (str) – Input locus_tag data.

  • delimiter (str) – Value used to split locus_tag data.

  • check_value (str) – Genome name or other value that will be used to parse the locus_tag to identify the feature number.
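Parsing a feature number from a locus tag typically means splitting on the delimiter and taking a trailing numeric token. The sketch below assumes a tag like "SEA_TRIXIE_20" and that, when check_value is given, the number follows the element matching it; parse_locus_tag_num() is a hypothetical helper, not the pdm_utils logic.

```python
def parse_locus_tag_num(tag, delimiter="_", check_value=None):
    """Return the trailing number of a locus tag such as "SEA_TRIXIE_20", or ""."""
    parts = tag.split(delimiter)
    if check_value is not None:
        # Keep only what follows the element matching check_value (case-insensitive).
        upper_parts = [part.upper() for part in parts]
        key = check_value.upper()
        if key in upper_parts:
            parts = parts[upper_parts.index(key) + 1:]
    last = parts[-1] if parts else ""
    return last if last.isdigit() else ""
```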

set_name(value=None)

Set the feature name. Ideally, the name of the CDS will be an integer. This information can be stored in multiple fields in the GenBank-formatted flat file. The name is derived from one of several qualifiers.

Parameters
  • value (str) – Indicates a value that should be used to directly set the name regardless of the ‘gene’ and ‘_locus_tag_num’ attributes.

set_nucleotide_length(use_seq=False)

Set the nucleotide length of this gene feature.

Parameters
  • use_seq (bool) – Whether to use the Seq feature to calculate the nucleotide length of this feature.

set_nucleotide_sequence(value=None, parent_genome_seq=None)

Set this feature’s nucleotide sequence.

Parameters
  • value (str or Seq) – Sequence.

  • parent_genome_seq (Seq) – Parent genome sequence.

Raises

ValueError

set_num(attr, description, delimiter=None, prefix_set=None)

Set a number attribute from a description.

Parameters
  • attr (str) – Attribute to set the number.

  • description (str) – Description data from which to parse the number.

  • delimiter (str) – Value used to split the description data.

  • prefix_set (set) – Valid possible delimiters in the description.

set_orientation(value, fmt, case=False)

Set the orientation based on the indicated format.

Parameters
  • value (int or str) – Orientation value.

  • fmt (str) – How the orientation should be formatted.

  • case (bool) – Whether to capitalize the first letter of the orientation.
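Orientation formatting amounts to normalizing many input spellings to forward/reverse and then rendering in a target format. Only "fr_short" appears in the signatures above; its output here ("f"/"r") and the other format names ("fr_long", "numeric") are assumptions, and format_orientation() is a hypothetical helper.

```python
# Assumed output of each format as a (forward, reverse) pair.
_FORMATS = {
    "fr_short": ("f", "r"),
    "fr_long": ("forward", "reverse"),
    "numeric": (1, -1),
}

# Input spellings recognized as forward / reverse (compared lowercased).
_FORWARD = {"f", "forward", "1", "+"}
_REVERSE = {"r", "reverse", "-1", "-"}


def format_orientation(value, fmt, case=False):
    """Render an orientation value in the requested (assumed) format."""
    if fmt not in _FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    key = str(value).strip().lower()
    if key in _FORWARD:
        index = 0
    elif key in _REVERSE:
        index = 1
    else:
        raise ValueError(f"unrecognized orientation: {value!r}")
    result = _FORMATS[fmt][index]
    # `case` capitalizes the first letter; numeric output is left unchanged.
    if case and isinstance(result, str):
        result = result.capitalize()
    return result
```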

set_seqfeature(type=None)

Create a SeqFeature object with which to populate the seqfeature attribute.

set_structure(value)

Set the secondary structure string so downstream users can easily display the predicted fold of this tRNA.

Parameters
  • value (str) – The string to use as the secondary structure.

pdm_utils.classes.trnascansehandler module

class pdm_utils.classes.trnascansehandler.TRNAscanSEHandler(identifier, sequence)

Bases: object

parse_trnas()

Searches out_str for matches to a regular expression for tRNAscan-SE tRNAs.

read_output()

Reads the tRNAscan-SE output file and joins the lines into a single string, which it stores in out_str.

run_trnascanse(x=10)

Set up the tRNAscan-SE command, then run it.

Parameters
  • x (int) – Score cutoff for tRNAscan-SE.

write_fasta()

Writes the search sequence to the input file in FASTA format.
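A FASTA record is a ">" header line carrying the identifier followed by the sequence, conventionally wrapped at a fixed width. A minimal sketch assuming 60-column wrapping; to_fasta() and write_fasta_file() are hypothetical helpers, not the handler's actual methods.

```python
def to_fasta(identifier, sequence, width=60):
    """Return a single FASTA record with the sequence wrapped to `width` columns."""
    seq = str(sequence)
    lines = [f">{identifier}"]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def write_fasta_file(path, identifier, sequence):
    """Write one FASTA record to `path`, overwriting any existing file."""
    with open(path, "w") as handle:
        handle.write(to_fasta(identifier, sequence))
```

Keeping the formatting in a pure function makes it testable without touching the filesystem.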

Module contents