Front-end ORM

In contrast to the customized ‘back-end’ biology-centric ORM, an orthologous and distinct ‘front-end’ SQLAlchemy ORM is available through pdm_utils. This ORM is automatically generated from SQLAlchemy and consists of a collection of classes that have a direct relationship to the structure of the database, including its tables, columns, datatypes, and table relationships.

Classes in this ORM do not contain any special biology-related methods. However, in contrast to the bio-centric ORM, the ‘front-end’ ORM is completely agnostic to the schema version. This means that pdm_utils can be used to retrieve a database, convert the schema to any version in the schema history, and a perfectly consistent ORM is immediately available.

This ORM is automatically generated by SQLAlchemy and pdm_utils merely provides the lightweight, requisite, boilerplate code to connect this software to a specific phage database instance. As a result, pdm_utils serves as a thin wrapper for SQLAlchemy, and can get out of the way so that the end-user can solely rely on and leverage the powerful, highly-refined and well-supported SQLAlchemy toolkit to build novel pipelines.

Introduction to the SQLAlchemy ORM

The SQLAlchemy ORM is a powerful tool for modifying and exploring a database. Access to the ORM is available through the ‘mapper’ attribute of the pdm_utils AlchemyHandler object:

>>> from pdm_utils.classes import alchemyhandler
>>> alchemist = alchemyhandler.AlchemyHandler()
>>> alchemist.connect(ask_database=True)
>>> mapper = alchemist.mapper
>>> type(mapper)
<class 'sqlalchemy.ext.declarative.api.DeclarativeMeta'>

Get a list of classes representing all available tables in the database:

>>>  mapper.classes.keys()
['domain', 'gene', 'phage', 'pham', 'gene_domain', 'tmrna', 'trna', 'version']

Select the phage table class:

>>> Phage = mapper.classes["phage"]
>>> type(Phage)
<class 'sqlalchemy.ext.declarative.api.DeclarativeMeta'>

Create a new entry (e.g. “new_phage”) for the phage table, assigning values to each valid column:

>>> new_phage = Phage(PhageID="NewPhage", Name="NewPhage_Draft", HostGenus="Mycobacterium", Cluster="A", Subcluster="A2", Sequence="AAAA")
>>> type(new_phage)
<class 'sqlalchemy.ext.automap.phage'>
>>> new_phage.Cluster
'A'

The SQLAlchemy ‘session’ object uses these ORM classes to query for data, add new entries, update entries, and remove entries. A session object can be used to retrieve instances that are tracked by the session. A session can be created using the pdm_utils AlchemyHandler object:

>>> session = alchemist.session
>>> type(session)
<class 'sqlalchemy.orm.session.Session'>

Note

The session is a powerful object that manages access to the database. Please refer to SQLAlchemy documentation to learn more about what it does and how to properly implement it.

Query the database for phages in Cluster A:

>>> query = session.query(Phage).filter(Phage.Cluster=="A")
>>> phages = query.all()
>>> type(phages)
<class 'list'>
>>> len(phages)
643
>>> phage1 = phages[0]
>>> phage1.PhageID
'20ES'
>>> phage1.Subcluster
'A2'

SQLAlchemy automatically retrieves data from connected tables such as gene and trna:

>>> len(phage1.gene_collection)
96
>>> cds1 = phage1.gene_collection[0]
>>> cds1.GeneID
'20ES_CDS_1'
>>> cds1.Start
568

Now query the database for phages that are in Cluster A and that have CDS features annotated as “repressor” in the gene table:

>>> Gene = mapper.classes["gene"]
>>> query = session.query(Phage).join(Gene).filter(Phage.Cluster=="A").filter(Gene.Notes=="repressor".encode("utf-8"))
>>> phages = query.all()
>>> len(phages)
38

When done with the session, commit any changes:

>>> session.commit()

pdm_utils tools based on the SQLAlchemy ORM

Several tools in pdm_utils can leverage the SQLAlchemy ORM to provide additional functionalities.

For instance, SQLAlchemy ORM mapped objects can be retrieved from case-insensitive MySQL formatted inputs using the pdm_utils ‘cartography’ module:

>>> from pdm_utils.functions import cartography
>>> phage = cartography.get_map(alchemist.mapper, "PHAGE")

Python objects, ORM map instances, that reflect entries in the database can be retrieved using the session object, and conditionals to filter the results of the query can be formed using attributes of the retrieved ORM map.:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> trixie = alchemist.session.query(phage).filter(phage.PhageID == 'Trixie').scalar()

The retrieved instance is a reflection of a single entry tied to the primary_key from the mapped table specified, and has attributes that reflect the related columns of that table:

>>> trixie.PhageID
'Trixie'
>>> trixie.Cluster
'A'
>>> trixie.Length
53526

The dynamic querying from pdm_utils ‘querying’ module can be applied to SQLAlchemy ORM queries using the query() function, and SQLAlchemy base objects and conditionals can be incorporated from the querying module into ORM queries to generate ORM objects:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> subcluster_conditional = phage.Subcluster == 'A2'
>>> notes_conditional = querying.build_where_clause(alchemist.graph, "phage.Notes = 'antirepressor'")
>>> conditionals = [subcluster_conditional, notes_conditional]
>>> mapped_obj_instances = querying.query(alchemist.session, alchemist.graph, phage, where=conditionals)
>>> phage_instance = mapped_obj_instances[0]
>>> phage_instance.PhageID
'IronMan'

Additionally, the pdm_utils Filter object can be used to retrieve these mapped instances. The filter object can apply filters and retrieve a list of values that can be used to retrieve a similar set of mapped obj instances:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> db_filter.add("phage.Subcluster = 'A2' AND gene.Notes = 'antirepressor'")
>>> db_filter.update()
>>> mapped_obj_instances = db_filter.query(phage)
>>> phage_instance = mapped_obj_instances[0]
>>> phage_instance.PhageID
'IronMan'

The SQLAlchemy session tracks instances generated through these queries, and can be used to manually manage entries in the database:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> IronMan = alchemist.session.query(phage).filter(phage.PhageID == 'IronMan').scalar()
>>> IronMan.DateLastModified
datetime.datetime(2020, 3, 13, 0, 0)
>>> from datetime import datetime
>>> today = datetime(2020, 5, 22, 0, 0)
>>> IronMan.DateLastModified = today
>>> alchemist.session.commit()
>>> IronMan = alchemist.session.query(phage).filter(phage.PhageID == 'IronMan').scalar()
>>> IronMan.DateLastModified
datetime.datetime(2020, 5, 22, 0, 0)

Once references to instances have been acquired using the session, entries in the database can also be deleted:

>>> IronMan = alchemist.session.query(phage).filter(phage.PhageID == 'IronMan').scalar()
>>> alchemist.session.delete(IronMan)
>>> alchemist.session.query(phage).filter_by(PhageID='IronMan').count()
0

The SQLAlchemy map can also be used to instantiate new objects that are then added as entries to the database:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> Phabulous = phage(PhageID='Phabulous', Cluster='A', Subcluster='A2', Length=52342)
>>> alchemist.session.commit()
>>> alchemist.session.query(phage).filter_by(PhageID='Phabulous').count()
1

SQLAlchemy mapped instances generated from a session also have access to the data that the relevant entry has a relationship with:

>>> phage = cartography.get_map(alchemist.mapper, "phage")
>>> IronMan = alchemist.session.query(phage).filter_by(PhageID='IronMan').scalar()
>>> IronMan_genes = IronMan.gene_collection
>>> IronMan_gene1 = IronMan_genes[0]
>>> IronMan_gene1.PhageID
'IronMan'
>>> IronMan_gene1.Name
1
>>> IronMan_gene1.GeneID
IronMan_CDS_1

These instances retrieved from the relationship attributes of another mapped instance can likewise be updated or deleted with use of the SQLAlchemy session:

>>> IronMan = alchemist.session.query(phage).filter_by(PhageID='IronMan').scalar()
>>> IronMan_gene1 = IronMan.gene_collection[0]
>>> IronMan_gene1.PhamID
42415
>>> IronMan_gene1.PhamID = 54326
>>> alchemist.session.commit()

When all interaction with MySQL is complete, the DBAPI connections can be closed:

>>> engine.dispose()