The bio-centric ORM

The pdm_utils bio-centric ORM is designed to process data stored in, or retrieved from, a MySQL database, GenBank-formatted flat files, PhagesDB, and other sources. Data is parsed from GenBank-formatted flat files using BioPython and from PhagesDB using the PhagesDB API. As a result, some pdm_utils objects contain attributes that map directly to each of these data structures.

For an introduction to coding with the pdm_utils library, refer to the brief library tutorial.

Below is a map of how genome-level data is stored within these data structures:

| pdm_utils object.attribute | MySQL table.column | GenBank flat file BioPython object.attribute | PhagesDB API dictionary |
| --- | --- | --- | --- |
| Genome.id | phage.PhageID | SeqRecord.id OR user-selected | dictionary["phage_name"] |
| Genome.name | phage.Name | SeqRecord.name OR user-selected | dictionary["phage_name"] |
| Genome.seq | phage.Sequence | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.length | phage.Length | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.gc | phage.GC | SeqRecord.seq | dictionary["fasta_file"] |
| Genome.cluster | phage.Cluster | N/A | dictionary["pcluster"]["cluster"] |
| Genome.subcluster | phage.Subcluster | N/A | dictionary["psubcluster"]["subcluster"] |
| Genome.accession | phage.Accession | SeqRecord.annotations["accessions"][0] | dictionary["genbank_accession"] |
| Genome.host_genus | phage.HostGenus | SeqRecord.annotations["source"] OR ["organism"] OR ["description"] OR user-selected | dictionary["isolation_host"]["genus"] |
| Genome.date | phage.DateLastModified | SeqRecord.annotations["date"] | N/A |
| Genome.annotation_status | phage.Status | N/A | N/A |
| Genome.retrieve_record | phage.RetrieveRecord | N/A | N/A |
| Genome.annotation_author | phage.AnnotationAuthor | N/A | N/A |
| Genome.description | N/A | SeqRecord.description | dictionary["fasta_file"] |
| Genome.source | N/A | SeqRecord.annotations["source"] | N/A |
| Genome.organism | N/A | SeqRecord.annotations["organism"] | N/A |
| Genome.authors | N/A | Reference.authors (from SeqRecord.annotations["references"]) | N/A |
| Genome.filename | N/A | N/A | dictionary["fasta_file"] |
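As an illustration of the PhagesDB column of the genome-level map, the following is a minimal sketch of copying values from a PhagesDB API dictionary into genome attributes. It is not the pdm_utils implementation: `GenomeSketch`, `nested`, and `genome_from_phagesdb` are hypothetical names, the dataclass is a stand-in for the real Genome class, and the dictionary values are illustrative only. The sketch also assumes that a nested entry such as "pcluster" may be missing or null, and only sequence-independent attributes are covered.

```python
from dataclasses import dataclass


@dataclass
class GenomeSketch:
    """Hypothetical stand-in, not the pdm_utils Genome class."""
    id: str = ""
    name: str = ""
    cluster: str = ""
    subcluster: str = ""
    accession: str = ""
    host_genus: str = ""
    filename: str = ""


def nested(dictionary, outer, inner, default=""):
    """Read dictionary[outer][inner], tolerating a missing or null entry."""
    section = dictionary.get(outer) or {}
    return section.get(inner) or default


def genome_from_phagesdb(dictionary):
    """Copy the attributes listed in the PhagesDB column of the genome map."""
    return GenomeSketch(
        id=dictionary.get("phage_name", ""),
        name=dictionary.get("phage_name", ""),
        cluster=nested(dictionary, "pcluster", "cluster"),
        subcluster=nested(dictionary, "psubcluster", "subcluster"),
        accession=dictionary.get("genbank_accession", ""),
        host_genus=nested(dictionary, "isolation_host", "genus"),
        filename=dictionary.get("fasta_file", ""),
    )


# Illustrative data shaped like the keys named in the map above.
example = {
    "phage_name": "ExamplePhage",
    "pcluster": {"cluster": "A"},
    "psubcluster": {"subcluster": "A2"},
    "genbank_accession": "ABC123",
    "isolation_host": {"genus": "Mycobacterium"},
    "fasta_file": "example.fasta",
}
genome = genome_from_phagesdb(example)
print(genome.id, genome.cluster, genome.subcluster, genome.host_genus)
```

Both Genome.id and Genome.name are filled from the same "phage_name" key, as the map indicates.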

Below is a map of how CDS-level data is stored within these data structures:

| pdm_utils object.attribute | MySQL table.column | GenBank flat file BioPython object.attribute | PhagesDB API dictionary |
| --- | --- | --- | --- |
| Cds.id | gene.GeneID | N/A | N/A |
| Cds.name | gene.Name | SeqFeature qualifiers | N/A |
| Cds.genome_id | gene.PhageID | N/A | N/A |
| Cds.start | gene.Start | SeqFeature.location.start OR end | N/A |
| Cds.stop | gene.Stop | SeqFeature.location.start OR end | N/A |
| Cds.coordinate_format | N/A | N/A | N/A |
| Cds.orientation | gene.Orientation | SeqFeature.strand | N/A |
| Cds.parts | N/A | SeqFeature.location.parts | N/A |
| Cds.seq | N/A | N/A | N/A |
| Cds.length | gene.Length | SeqFeature.location.start AND end | N/A |
| Cds.translation | gene.Translation | SeqFeature.qualifiers["translation"][0] | N/A |
| Cds.translation_length | N/A | SeqFeature.qualifiers["translation"][0] | N/A |
| Cds.translation_table | N/A | SeqFeature.qualifiers["transl_table"][0] | N/A |
| Cds.locus_tag | gene.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Cds.description | gene.Notes | SeqFeature.qualifiers["product"] OR ["function"] OR ["note"] | N/A |
| Cds.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
| Cds.product | N/A | SeqFeature.qualifiers["product"][0] | N/A |
| Cds.function | N/A | SeqFeature.qualifiers["function"][0] | N/A |
| Cds.note | N/A | SeqFeature.qualifiers["note"][0] | N/A |
| Cds.seqfeature | N/A | SeqFeature | N/A |
| N/A | gene.DomainStatus | N/A | N/A |
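The `[0]` index in the qualifier entries of the CDS-level map reflects that BioPython stores each qualifier value as a list of strings. The following is a minimal sketch of reading those values, with a plain dictionary standing in for SeqFeature.qualifiers. The helper names `first_qualifier` and `choose_description` are hypothetical, the qualifier values are illustrative only, and the fallback order (product, then function, then note) is an assumption: the map states only that Cds.description comes from one of the three.

```python
def first_qualifier(qualifiers, key, default=""):
    """Return the first value stored under a qualifier key, or a default."""
    values = qualifiers.get(key)
    if not values:
        return default
    return values[0]


def choose_description(qualifiers, fields=("product", "function", "note")):
    """Return the first non-empty value among the candidate fields.

    The order of `fields` is an assumption made for this sketch.
    """
    for field in fields:
        value = first_qualifier(qualifiers, field)
        if value:
            return value
    return ""


# Plain dict standing in for SeqFeature.qualifiers (illustrative values only).
qualifiers = {
    "locus_tag": ["EXAMPLE_1"],
    "transl_table": ["11"],
    "translation": ["MKT"],
    "function": ["terminase"],
}

locus_tag = first_qualifier(qualifiers, "locus_tag")
translation = first_qualifier(qualifiers, "translation")
translation_length = len(translation)  # derived from the same qualifier
description = choose_description(qualifiers)
print(locus_tag, translation_length, description)
```

Returning a default for absent keys avoids a KeyError for features that lack an optional qualifier such as "gene".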

Below is a map of how tRNA-level data is stored within these data structures:

| pdm_utils object.attribute | MySQL table.column | GenBank flat file BioPython object.attribute | PhagesDB API dictionary |
| --- | --- | --- | --- |
| Trna.id | trna.GeneID | N/A | N/A |
| Trna.name | trna.Name | SeqFeature qualifiers | N/A |
| Trna.genome_id | trna.PhageID | N/A | N/A |
| Trna.start | trna.Start | SeqFeature.location.start OR end | N/A |
| Trna.stop | trna.Stop | SeqFeature.location.start OR end | N/A |
| Trna.coordinate_format | N/A | N/A | N/A |
| Trna.orientation | trna.Orientation | SeqFeature.strand | N/A |
| Trna.parts | N/A | SeqFeature.location.parts | N/A |
| Trna.length | trna.Length | SeqFeature.location.start AND end | N/A |
| Trna.locus_tag | trna.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Trna.note | trna.Note | SeqFeature.qualifiers["note"][0] | N/A |
| Trna.seqfeature | N/A | SeqFeature | N/A |
| Trna.amino_acid | trna.AminoAcid | SeqFeature.qualifiers["product"][0] | N/A |
| Trna.anticodon | trna.Anticodon | SeqFeature.qualifiers["note"][0] | N/A |
| Trna.structure | trna.Structure | N/A | N/A |
| Trna.use | trna.Source | N/A | N/A |
| Trna.product | N/A | SeqFeature.qualifiers["product"][0] | N/A |
| Trna.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
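The "start OR end" and "start AND end" entries in the feature-level maps can be read as follows: length depends on both location boundaries, while a start or stop coordinate takes one boundary or the other. The following is a minimal sketch of one possible interpretation, in which the choice depends on the strand. The `Location` dataclass is a hypothetical stand-in for a BioPython feature location, assuming 0-based, end-exclusive coordinates and a strand of 1 or -1. The function names are also hypothetical, and how pdm_utils actually assigns start and stop may differ (for example, depending on the coordinate_format attribute).

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical stand-in for a BioPython feature location.

    Coordinates are assumed to be 0-based and end-exclusive.
    """
    start: int
    end: int
    strand: int  # 1 for the forward strand, -1 for the reverse strand


def feature_length(location):
    """Length needs both boundaries ("start AND end")."""
    return location.end - location.start


def transcription_coordinates(location):
    """Return (start, stop) in the direction of transcription.

    On the reverse strand the boundaries swap roles ("start OR end").
    """
    if location.strand == -1:
        return location.end, location.start
    return location.start, location.end


forward = Location(start=99, end=175, strand=1)
reverse = Location(start=99, end=175, strand=-1)
print(feature_length(forward))             # 76
print(transcription_coordinates(forward))  # (99, 175)
print(transcription_coordinates(reverse))  # (175, 99)
```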

Below is a map of how tmRNA-level data is stored within these data structures:

| pdm_utils object.attribute | MySQL table.column | GenBank flat file BioPython object.attribute | PhagesDB API dictionary |
| --- | --- | --- | --- |
| Tmrna.id | tmrna.GeneID | N/A | N/A |
| Tmrna.name | tmrna.Name | SeqFeature qualifiers | N/A |
| Tmrna.genome_id | tmrna.PhageID | N/A | N/A |
| Tmrna.start | tmrna.Start | SeqFeature.location.start OR end | N/A |
| Tmrna.stop | tmrna.Stop | SeqFeature.location.start OR end | N/A |
| Tmrna.coordinate_format | N/A | N/A | N/A |
| Tmrna.orientation | tmrna.Orientation | SeqFeature.strand | N/A |
| Tmrna.parts | N/A | SeqFeature.location.parts | N/A |
| Tmrna.length | tmrna.Length | SeqFeature.location.start AND end | N/A |
| Tmrna.locus_tag | tmrna.LocusTag | SeqFeature.qualifiers["locus_tag"][0] | N/A |
| Tmrna.note | tmrna.Note | SeqFeature.qualifiers["note"][0] | N/A |
| Tmrna.seqfeature | N/A | SeqFeature | N/A |
| Tmrna.gene | N/A | SeqFeature.qualifiers["gene"][0] | N/A |
| Tmrna.peptide_tag | tmrna.PeptideTag | SeqFeature.qualifiers["note"][0] | N/A |

GenBank-formatted flat files contain a Source feature. Although this data is not stored within the MySQL database, it is parsed and evaluated for quality when the genome is imported into the database. Below is a map of how Source-level data is stored within these data structures:

| pdm_utils object.attribute | MySQL table.column | GenBank flat file BioPython object.attribute | PhagesDB API dictionary |
| --- | --- | --- | --- |
| Source.id | N/A | N/A | N/A |
| Source.name | N/A | N/A | N/A |
| Source.seqfeature | N/A | SeqFeature | N/A |
| Source.start | N/A | SeqFeature.location.start OR end | N/A |
| Source.stop | N/A | SeqFeature.location.start OR end | N/A |
| Source.organism | N/A | SeqFeature.qualifiers["organism"][0] | N/A |
| Source.host | N/A | SeqFeature.qualifiers["host"][0] | N/A |
| Source.lab_host | N/A | SeqFeature.qualifiers["lab_host"][0] | N/A |
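The quality evaluation applied to Source-level data is not detailed here. As one plausible example of such a check, the following sketch compares the host genus named in the source feature with the host genus expected for the genome. It is an illustration under stated assumptions, not the pdm_utils evaluation logic: the function names are hypothetical, a plain dictionary stands in for SeqFeature.qualifiers, the values are illustrative only, and the genus is assumed to be the first whitespace-delimited word of the host qualifier.

```python
def host_genus_from_source(qualifiers):
    """Extract the genus from the "host" qualifier.

    Assumes the genus is the first whitespace-delimited word.
    """
    values = qualifiers.get("host") or [""]
    words = values[0].split()
    return words[0] if words else ""


def host_matches(qualifiers, expected_genus):
    """Check whether the source feature's host genus matches the expected one."""
    genus = host_genus_from_source(qualifiers)
    return bool(genus) and genus.lower() == expected_genus.lower()


# Plain dict standing in for SeqFeature.qualifiers (illustrative values only).
source_qualifiers = {
    "organism": ["Mycobacterium phage Example"],
    "host": ["Mycobacterium smegmatis"],
}

print(host_genus_from_source(source_qualifiers))
print(host_matches(source_qualifiers, "Mycobacterium"))
print(host_matches(source_qualifiers, "Gordonia"))
```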