The bio-centric ORM
The pdm_utils bio-centric ORM is designed to process data stored in or retrieved from a MySQL database, GenBank-formatted flat files, PhagesDB, and similar sources. Data is parsed from GenBank-formatted flat files using Biopython and from PhagesDB using the PhagesDB API. As a result, some pdm_utils objects contain attributes that map directly to each of these different data structures.
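Because each GenBank feature type feeds a different object (the maps below cover CDS, tRNA, tmRNA and source features), a flat-file parser first has to sort a record's features by type. The sketch below shows that dispatch step under stated assumptions: `FEATURE_CLASS` and `group_features` are hypothetical names, not part of the pdm_utils API, and the record is a `SimpleNamespace` stand-in. In practice the record would be a Biopython `SeqRecord`, for example from `Bio.SeqIO.read(path, "genbank")`, whose `features` items each carry a `type` string.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical correspondence between GenBank feature types and the
# pdm_utils classes described in this section.
FEATURE_CLASS = {
    "CDS": "Cds",
    "tRNA": "Trna",
    "tmRNA": "Tmrna",
    "source": "Source",
}


def group_features(record) -> dict:
    """Group a record's features by the pdm_utils class they would populate.

    `record` only needs a .features list whose items have a .type attribute.
    Feature types not listed in FEATURE_CLASS (e.g. "gene") are skipped.
    """
    groups = defaultdict(list)
    for feature in record.features:
        class_name = FEATURE_CLASS.get(feature.type)
        if class_name is not None:
            groups[class_name].append(feature)
    return dict(groups)


# Usage with stand-in objects instead of a parsed flat file.
demo = SimpleNamespace(features=[
    SimpleNamespace(type="source"),
    SimpleNamespace(type="gene"),
    SimpleNamespace(type="CDS"),
    SimpleNamespace(type="CDS"),
    SimpleNamespace(type="tRNA"),
])
counts = {name: len(feats) for name, feats in group_features(demo).items()}
print(counts)  # {'Source': 1, 'Cds': 2, 'Trna': 1}
```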
Refer to the brief introductory tutorial on coding with the pdm_utils library.
Below is a map of how genome-level data is stored within these data structures:
| pdm_utils object.attribute | MySQL table.column | GenBank flat file Biopython object.attribute | PhagesDB API dictionary |
|---|---|---|---|
| Genome.id | phage.PhageID | SeqRecord.id OR user-selected | dictionary["phage_name"] |
| Genome.name | phage.Name | SeqRecord.name OR user-selected | dictionary["phage_name"] |
| Genome.seq | phage.Sequence | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.length | phage.Length | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.gc | phage.GC | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.cluster | phage.Cluster | N/A | dictionary["pcluster"]["cluster"] |
| Genome.subcluster | phage.Subcluster | N/A | dictionary["psubcluster"]["subcluster"] |
| Genome.accession | phage.Accession | SeqRecord.annotations["accessions"][0] | dictionary["genbank_accession"] |
| Genome.host_genus | phage.HostGenus | SeqRecord.annotations["source"] OR ["organism"] OR ["description"] OR user-selected | dictionary["isolation_host"]["genus"] |
| Genome.date | phage.DateLastModified | SeqRecord.annotations["date"] | N/A |
| Genome.annotation_status | phage.Status | N/A | N/A |
| Genome.retrieve_record | phage.RetrieveRecord | N/A | N/A |
| Genome.annotation_author | phage.AnnotationAuthor | N/A | N/A |
| Genome.description | N/A | SeqRecord.description | dictionary["fasta_file"] |
| Genome.source | N/A | SeqRecord.annotations["source"] | N/A |
| Genome.organism | N/A | SeqRecord.annotations["organism"] | N/A |
| Genome.authors | N/A | Reference.authors (from SeqRecord.annotations["references"]) | N/A |
| Genome.filename | N/A | N/A | dictionary["fasta_file"] |
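To make the GenBank and PhagesDB columns concrete, here is a minimal sketch that collects genome-level values from each source into plain dictionaries. The function names (`gc_percent`, `genome_from_seqrecord`, `genome_from_phagesdb`) are hypothetical, not pdm_utils functions; the record is a `SimpleNamespace` stand-in for a Biopython `SeqRecord`, and every value is invented. The sketch does not model the "user-selected" alternatives for `id`, `name` and `host_genus`, nor the download of the FASTA file referenced by `dictionary["fasta_file"]`, and the four-decimal GC rounding is an arbitrary choice.

```python
from types import SimpleNamespace


def gc_percent(seq: str) -> float:
    """GC content as a percentage, rounded to 4 decimal places."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return round(100 * (seq.count("G") + seq.count("C")) / len(seq), 4)


def genome_from_seqrecord(record) -> dict:
    """Genome-level values from a SeqRecord-like object (GenBank column)."""
    ann = record.annotations
    seq = str(record.seq)
    accessions = ann.get("accessions", [])
    return {
        "id": record.id,
        "name": record.name,
        "seq": seq,
        "length": len(seq),
        "gc": gc_percent(seq),
        "accession": accessions[0] if accessions else "",
        "date": ann.get("date", ""),
        "description": record.description,
        "source": ann.get("source", ""),
        "organism": ann.get("organism", ""),
        # One author string per Reference object in annotations["references"].
        "authors": [ref.authors for ref in ann.get("references", [])],
    }


def genome_from_phagesdb(data: dict) -> dict:
    """Genome-level values from a PhagesDB API dictionary (PhagesDB column)."""
    # `or {}` guards against a null cluster/subcluster/host entry (assumption).
    return {
        "id": data["phage_name"],
        "name": data["phage_name"],
        "cluster": (data.get("pcluster") or {}).get("cluster", ""),
        "subcluster": (data.get("psubcluster") or {}).get("subcluster", ""),
        "accession": data.get("genbank_accession", ""),
        "host_genus": (data.get("isolation_host") or {}).get("genus", ""),
        # seq, length, gc, description and filename derive from the FASTA
        # file referenced here; fetching and parsing it is not shown.
        "fasta_file": data.get("fasta_file", ""),
    }


# Usage with stand-in objects (all values are made up for illustration).
record = SimpleNamespace(
    id="ABC123.1", name="ABC123", seq="ATGCGCAT",
    description="Example phage, complete genome.",
    annotations={"accessions": ["ABC123"], "date": "01-JAN-2000",
                 "source": "Example phage", "organism": "Example phage",
                 "references": [SimpleNamespace(authors="Doe,J.")]},
)
gb = genome_from_seqrecord(record)
db = genome_from_phagesdb({"phage_name": "Example", "pcluster": {"cluster": "A"},
                           "psubcluster": None, "genbank_accession": "ABC123",
                           "isolation_host": {"genus": "Mycobacterium"},
                           "fasta_file": ""})
print(gb["length"], gb["gc"], db["cluster"], repr(db["subcluster"]))  # 8 50.0 A ''
```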
Below is a map of how CDS-level data is stored within these data structures:
| pdm_utils object.attribute | MySQL table.column | GenBank flat file Biopython object.attribute | PhagesDB API dictionary |
|---|---|---|---|
| Cds.id | gene.GeneID | N/A | N/A |
| Cds.name | gene.Name | SeqFeature qualifiers | N/A |
| Cds.genome_id | gene.PhageID | N/A | N/A |
| Cds.start | gene.Start | SeqFeature.location.start OR end | N/A |
| Cds.stop | gene.Stop | SeqFeature.location.start OR end | N/A |
| Cds.coordinate_format | N/A | N/A | N/A |
| Cds.orientation | gene.Orientation | SeqFeature.strand | N/A |
| Cds.parts | N/A | SeqFeature.location.parts | N/A |
| Cds.seq | N/A | N/A | N/A |
| Cds.length | gene.Length | SeqFeature.location.start AND end | N/A |
| Cds.translation | gene.Translation | SeqFeature.qualifiers["translation"][0] | N/A |
| Cds.translation_length | N/A | SeqFeature.qualifiers["translation"][0] | N/A |
| Cds.translation_table | N/A | SeqFeature.qualifiers["transl_table"][0] | N/A |
| Cds.locus_tag | gene.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Cds.description | gene.Notes | SeqFeature.qualifiers["product"] OR ["function"] OR ["note"] | N/A |
| Cds.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
| Cds.product | N/A | SeqFeature.qualifiers["product"][0] | N/A |
| Cds.function | N/A | SeqFeature.qualifiers["function"][0] | N/A |
| Cds.note | N/A | SeqFeature.qualifiers["note"][0] | N/A |
| Cds.seqfeature | N/A | SeqFeature | N/A |
| N/A | gene.DomainStatus | N/A | N/A |
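The sketch below reads the GenBank column of the CDS map for a single feature. It assumes a single-part location (compound locations are not handled), reads the strand from `feature.location.strand`, assumes `F`/`R` orientation codes, and assumes the description is the first non-empty qualifier among `product`, `function` and `note`. `first` and `cds_from_seqfeature` are hypothetical helpers, and the feature is a `SimpleNamespace` stand-in for a Biopython `SeqFeature`.

```python
from types import SimpleNamespace


def first(qualifiers: dict, key: str) -> str:
    """Equivalent of qualifiers[key][0], returning "" when the key is absent."""
    values = qualifiers.get(key) or [""]
    return values[0].strip()


def cds_from_seqfeature(feature) -> dict:
    """CDS-level values from a single-part SeqFeature-like object."""
    loc = feature.location
    q = feature.qualifiers
    # Biopython locations are 0-based and half-open, so stop - start is
    # the nucleotide span of the feature.
    start, stop = int(loc.start), int(loc.end)
    translation = first(q, "translation")
    transl_table = first(q, "transl_table")
    # Assumed precedence for the description: product, then function, then note.
    description = ""
    for key in ("product", "function", "note"):
        description = first(q, key)
        if description:
            break
    return {
        "start": start,
        "stop": stop,
        "orientation": {1: "F", -1: "R"}.get(loc.strand, ""),  # assumed codes
        "length": stop - start,
        "translation": translation,
        "translation_length": len(translation),
        "translation_table": int(transl_table) if transl_table else None,
        "locus_tag": first(q, "locus_tag"),
        "gene": first(q, "gene"),
        "description": description,
    }


feature = SimpleNamespace(
    location=SimpleNamespace(start=99, end=111, strand=-1),
    qualifiers={"locus_tag": ["EXAMPLE_1"], "translation": ["MKV"],
                "transl_table": ["11"], "function": ["terminase"],
                "note": ["hypothetical"]},
)
cds = cds_from_seqfeature(feature)
print(cds["orientation"], cds["length"], cds["description"])  # R 12 terminase
```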
Below is a map of how tRNA-level data is stored within these data structures:
| pdm_utils object.attribute | MySQL table.column | GenBank flat file Biopython object.attribute | PhagesDB API dictionary |
|---|---|---|---|
| Trna.id | trna.GeneID | N/A | N/A |
| Trna.name | trna.Name | SeqFeature qualifiers | N/A |
| Trna.genome_id | trna.PhageID | N/A | N/A |
| Trna.start | trna.Start | SeqFeature.location.start OR end | N/A |
| Trna.stop | trna.Stop | SeqFeature.location.start OR end | N/A |
| Trna.coordinate_format | N/A | N/A | N/A |
| Trna.orientation | trna.Orientation | SeqFeature.strand | N/A |
| Trna.parts | N/A | SeqFeature.location.parts | N/A |
| Trna.length | trna.Length | SeqFeature.location.start AND end | N/A |
| Trna.locus_tag | trna.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Trna.note | trna.Note | SeqFeature.qualifiers["note"][0] | N/A |
| Trna.seqfeature | N/A | SeqFeature | N/A |
| Trna.amino_acid | trna.AminoAcid | SeqFeature.qualifiers["product"][0] | N/A |
| Trna.anticodon | trna.Anticodon | SeqFeature.qualifiers["note"][0] | N/A |
| Trna.structure | trna.Structure | N/A | N/A |
| Trna.use | trna.Source | N/A | N/A |
| Trna.product | N/A | SeqFeature.qualifiers["product"][0] | N/A |
| Trna.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
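`Trna.amino_acid` and `Trna.anticodon` are both derived from free-text qualifiers, so some parsing is needed. The following sketch assumes a `product` such as `tRNA-Gly` and a `note` containing the anticodon in parentheses, such as `tRNA-Gly(tcc)`. These formats, and the names `first` and `trna_from_qualifiers`, are illustrative assumptions rather than documented pdm_utils behavior.

```python
import re

# Assumed qualifier formats (illustrative only):
#   /product="tRNA-Gly"      -> amino acid "Gly"
#   /note="tRNA-Gly(tcc)"    -> anticodon "tcc"
PRODUCT_PATTERN = re.compile(r"^tRNA-(\w+)$")
ANTICODON_PATTERN = re.compile(r"\(\s*([ACGTUacgtu]{3})\s*\)")


def first(qualifiers: dict, key: str) -> str:
    """Equivalent of qualifiers[key][0], returning "" when the key is absent."""
    values = qualifiers.get(key) or [""]
    return values[0].strip()


def trna_from_qualifiers(qualifiers: dict) -> dict:
    """tRNA-level values parsed from a feature's qualifiers dictionary."""
    product = first(qualifiers, "product")
    note = first(qualifiers, "note")
    aa_match = PRODUCT_PATTERN.match(product)
    ac_match = ANTICODON_PATTERN.search(note)
    return {
        "product": product,
        "amino_acid": aa_match.group(1) if aa_match else "",
        "note": note,
        "anticodon": ac_match.group(1).lower() if ac_match else "",
        "locus_tag": first(qualifiers, "locus_tag"),
        "gene": first(qualifiers, "gene"),
    }


trna = trna_from_qualifiers({
    "product": ["tRNA-Gly"],
    "note": ["tRNA-Gly(tcc)"],
    "locus_tag": ["EXAMPLE_10"],
})
print(trna["amino_acid"], trna["anticodon"])  # Gly tcc
```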
Below is a map of how tmRNA-level data is stored within these data structures:
| pdm_utils object.attribute | MySQL table.column | GenBank flat file Biopython object.attribute | PhagesDB API dictionary |
|---|---|---|---|
| Tmrna.id | tmrna.GeneID | N/A | N/A |
| Tmrna.name | tmrna.Name | SeqFeature qualifiers | N/A |
| Tmrna.genome_id | tmrna.PhageID | N/A | N/A |
| Tmrna.start | tmrna.Start | SeqFeature.location.start OR end | N/A |
| Tmrna.stop | tmrna.Stop | SeqFeature.location.start OR end | N/A |
| Tmrna.coordinate_format | N/A | N/A | N/A |
| Tmrna.orientation | tmrna.Orientation | SeqFeature.strand | N/A |
| Tmrna.parts | N/A | SeqFeature.location.parts | N/A |
| Tmrna.length | tmrna.Length | SeqFeature.location.start AND end | N/A |
| Tmrna.locus_tag | tmrna.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Tmrna.note | tmrna.Note | SeqFeature.qualifiers["note"][0] | N/A |
| Tmrna.seqfeature | N/A | SeqFeature | N/A |
| Tmrna.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
| Tmrna.peptide_tag | tmrna.PeptideTag | SeqFeature.qualifiers["note"][0] | N/A |
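`Tmrna.peptide_tag` is taken from the first `note` qualifier. The exact note format is not specified here, so this sketch assumes, purely for illustration, text of the form `tag peptide: ANDNYALAA*` and extracts the run of uppercase one-letter amino-acid codes that follows the label. `first` and `tmrna_from_qualifiers` are hypothetical names, and the example tag is invented.

```python
import re

# Assumed note format (illustrative only): "tag peptide: ANDNYALAA*".
# Only the label is matched case-insensitively; the tag itself must be
# uppercase one-letter amino-acid codes.
TAG_PATTERN = re.compile(r"(?i:tag peptide)\s*:?\s*([A-Z]+)")


def first(qualifiers: dict, key: str) -> str:
    """Equivalent of qualifiers[key][0], returning "" when the key is absent."""
    values = qualifiers.get(key) or [""]
    return values[0].strip()


def tmrna_from_qualifiers(qualifiers: dict) -> dict:
    """tmRNA-level values parsed from a feature's qualifiers dictionary."""
    note = first(qualifiers, "note")
    match = TAG_PATTERN.search(note)
    return {
        "note": note,
        "peptide_tag": match.group(1) if match else "",
        "locus_tag": first(qualifiers, "locus_tag"),
        "gene": first(qualifiers, "gene"),
    }


tmrna = tmrna_from_qualifiers({"note": ["tag peptide: ANDNYALAA*"],
                               "locus_tag": ["EXAMPLE_20"]})
print(tmrna["peptide_tag"])  # ANDNYALAA
```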
GenBank-formatted flat files contain a source feature. Although its data is not stored in the MySQL database, it is parsed and evaluated for quality when the genome is imported into the database. Below is a map of how Source-level data is stored within these data structures:
| pdm_utils object.attribute | MySQL table.column | GenBank flat file Biopython object.attribute | PhagesDB API dictionary |
|---|---|---|---|
| Source.id | N/A | N/A | N/A |
| Source.name | N/A | N/A | N/A |
| Source.seqfeature | N/A | SeqFeature | N/A |
| Source.start | N/A | SeqFeature.location.start OR end | N/A |
| Source.stop | N/A | SeqFeature.location.start OR end | N/A |
| Source.organism | N/A | SeqFeature.qualifiers["organism"][0] | N/A |
| Source.host | N/A | SeqFeature.qualifiers["host"][0] | N/A |
| Source.lab_host | N/A | SeqFeature.qualifiers["lab_host"][0] | N/A |
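Source data is evaluated for quality during import. As an illustration of what such an evaluation could look like, the sketch below collects the Source fields from a feature and then applies a hypothetical check (not necessarily the check pdm_utils performs): every populated `organism`, `host` or `lab_host` value should begin with the expected host genus. It assumes phage organism names follow a `<HostGenus> phage <Name>` pattern; `source_from_seqfeature` and `host_genus_mismatches` are hypothetical names, and the feature is a stand-in object.

```python
from types import SimpleNamespace


def first(qualifiers: dict, key: str) -> str:
    """Equivalent of qualifiers[key][0], returning "" when the key is absent."""
    values = qualifiers.get(key) or [""]
    return values[0].strip()


def source_from_seqfeature(feature) -> dict:
    """Source-level values from a source SeqFeature-like object."""
    q = feature.qualifiers
    return {
        "start": int(feature.location.start),
        "stop": int(feature.location.end),
        "organism": first(q, "organism"),
        "host": first(q, "host"),
        "lab_host": first(q, "lab_host"),
    }


def host_genus_mismatches(source: dict, expected_genus: str) -> list:
    """Hypothetical QC check: return the populated fields whose first word
    differs from the expected host genus. Empty fields are ignored."""
    return [
        field for field in ("organism", "host", "lab_host")
        if source[field] and source[field].split()[0] != expected_genus
    ]


src = source_from_seqfeature(SimpleNamespace(
    location=SimpleNamespace(start=0, end=50000),
    qualifiers={"organism": ["Mycobacterium phage Example"],
                "lab_host": ["Mycobacterium smegmatis"]},
))
print(host_genus_mismatches(src, "Mycobacterium"))  # []
print(host_genus_mismatches(src, "Gordonia"))  # ['organism', 'lab_host']
```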