Database management pipelines
Below is a description of the command line tools that the pdm_utils package contains to analyze and manipulate data within a MySQL database.
Some tools are bound to the most current schema version (“Yes” in the table below) because they need to know where to find specific types of data in the database. Other tools (“No”), or particular sub-tools of a tool (“Yes/No”), are schema-agnostic and can be used with any type of MySQL database.
Most tools work independently of SEA-PHAGES-specific assumptions or goals. Some tools are more oriented to the SEA-PHAGES program, either by providing (optional) interactions with PhagesDB or by encoding SEA-PHAGES-specific assumptions about the data. For instance, the import and compare pipelines implement a series of evaluations to ensure data quality; some of these evaluations are SEA-PHAGES-specific, while others are not.
| Tool | Description | Schema bound | SEA-PHAGES oriented |
| --- | --- | --- | --- |
| compare | Directly compare phage data between a database instance, PhagesDB, and GenBank | Yes | Yes |
| convert | Upgrade or downgrade a database instance to another schema version | No | No |
| export | Export data from a database | Yes/No | No |
| find_domains | Identify NCBI conserved domains in genes | Yes | No |
| freeze | Create a derivative database instance that is no longer routinely updated | Yes | No |
| get_db | Retrieve the most up-to-date version of the database | No | No |
| get_data | Retrieve new data that needs to be imported or updated from PhagesDB and GenBank | Yes | Yes |
| get_gb_records | Retrieve GenBank records associated with genomes in the database | Yes | No |
| import | Import new or replacement genome annotations | Yes | Yes |
| phamerate | Group phage genes into phamilies based on amino acid sequence similarity | Yes | No |
| push | Push an updated database to a public server | No | No |
| review | Review gene description data consistency of a database | Yes | No |
| revise | Revise inconsistent data and prepare for submission to GenBank for updating records | Yes | No |
| update | Update specific fields | No | No |
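As a quick way to read the table, the following minimal Python sketch encodes its "Schema bound" and "SEA-PHAGES oriented" columns and filters on them. The `Tool` class, the `TOOLS` list, and both helper functions are hypothetical names introduced only for illustration; they are not part of the pdm_utils API, and the values are copied from the table rather than queried from the package.

```python
from typing import List, NamedTuple


class Tool(NamedTuple):
    """One table row (hypothetical helper, not part of pdm_utils)."""
    name: str
    schema_bound: str          # "Yes", "No", or "Yes/No" (varies by sub-tool)
    sea_phages_oriented: bool


# Values transcribed from the table; an illustration, not a pdm_utils data structure.
TOOLS = [
    Tool("compare", "Yes", True),
    Tool("convert", "No", False),
    Tool("export", "Yes/No", False),
    Tool("find_domains", "Yes", False),
    Tool("freeze", "Yes", False),
    Tool("get_db", "No", False),
    Tool("get_data", "Yes", True),
    Tool("get_gb_records", "Yes", False),
    Tool("import", "Yes", True),
    Tool("phamerate", "Yes", False),
    Tool("push", "No", False),
    Tool("review", "Yes", False),
    Tool("revise", "Yes", False),
    Tool("update", "No", False),
]


def schema_agnostic(tools: List[Tool]) -> List[str]:
    """Names of tools listed as fully schema-agnostic ("No")."""
    return [t.name for t in tools if t.schema_bound == "No"]


def sea_phages_oriented(tools: List[Tool]) -> List[str]:
    """Names of tools listed as SEA-PHAGES oriented."""
    return [t.name for t in tools if t.sea_phages_oriented]


print(schema_agnostic(TOOLS))      # ['convert', 'get_db', 'push', 'update']
print(sea_phages_oriented(TOOLS))  # ['compare', 'get_data', 'import']
```

Note that export is excluded from the schema-agnostic list because only some of its sub-tools are independent of the schema version.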
The pdm_utils toolkit can be used to manage different database instances. However, some tools may be relevant only to the primary instance, Actino_Draft.