Database management pipelines

The pdm_utils package contains several command line tools for analyzing and manipulating data in a MySQL database. They are summarized in the table below.

Some tools are bound to the most current schema version (“Yes”), because they need to know where specific types of data are stored in the database. Other tools (“No”), and some sub-tools of the tools marked “Yes/No”, are schema-agnostic and can be used with any type of MySQL database.

Most tools provide functionality without respect to SEA-PHAGES-specific assumptions or goals. Some tools are more oriented to the SEA-PHAGES program, either by providing (optional) interactions with PhagesDB or by encoding SEA-PHAGES-specific assumptions about the data. For instance, a series of evaluations is implemented in the import and compare pipelines to ensure data quality. Some of these evaluations are SEA-PHAGES-specific, and others are not.

pdm_utils tools

==============  ===================================================================================  ============  ===================
Tool            Description                                                                          Schema bound  SEA-PHAGES oriented
==============  ===================================================================================  ============  ===================
compare         Directly compare phage data between a database instance, PhagesDB, and GenBank       Yes           Yes
convert         Upgrade or downgrade a database instance to another schema version                   No            No
export          Export data from a database                                                          Yes/No        No
find_domains    Identify NCBI conserved domains in genes                                             Yes           No
freeze          Create a derivative database instance that is no longer routinely updated            Yes           No
get_db          Retrieve the most up-to-date version of the database                                 No            No
get_data        Retrieve new data that needs to be imported or updated from PhagesDB and GenBank     Yes           Yes
get_gb_records  Retrieve GenBank records associated with genomes in the database                     Yes           No
import          Import new or replacement genome annotations                                         Yes           Yes
phamerate       Group phage genes into phamilies based on amino acid sequence similarity             Yes           No
push            Push an updated database to a public server                                          No            No
review          Review gene description data consistency of a database                               Yes           No
revise          Revise inconsistent data and prepare for submission to GenBank for updating records  Yes           No
update          Update specific fields                                                               No            No
==============  ===================================================================================  ============  ===================
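
When deciding which tools can be pointed at a database that does not follow the current schema, it may help to treat the table as data. The following is only an illustrative sketch: ToolInfo, TOOLS, and schema_agnostic_tools are hypothetical names made up for this example and are not part of the pdm_utils API. The values are copied from the table, and "yes/no" marks a tool for which only some sub-tools are schema-bound.

```python
from typing import Dict, List, NamedTuple


class ToolInfo(NamedTuple):
    """One row of the tool table (hypothetical helper, not a pdm_utils class)."""
    schema_bound: str   # "yes", "no", or "yes/no" (depends on the sub-tool)
    sea_phages: bool    # True if the tool is SEA-PHAGES oriented


# Values transcribed from the table; the dictionary itself is illustrative.
TOOLS: Dict[str, ToolInfo] = {
    "compare": ToolInfo("yes", True),
    "convert": ToolInfo("no", False),
    "export": ToolInfo("yes/no", False),
    "find_domains": ToolInfo("yes", False),
    "freeze": ToolInfo("yes", False),
    "get_db": ToolInfo("no", False),
    "get_data": ToolInfo("yes", True),
    "get_gb_records": ToolInfo("yes", False),
    "import": ToolInfo("yes", True),
    "phamerate": ToolInfo("yes", False),
    "push": ToolInfo("no", False),
    "review": ToolInfo("yes", False),
    "revise": ToolInfo("yes", False),
    "update": ToolInfo("no", False),
}


def schema_agnostic_tools(include_partial: bool = False) -> List[str]:
    """Return tool names that are not bound to the current schema.

    With include_partial=True, tools marked "yes/no" are also returned,
    since some of their sub-tools are schema-agnostic.
    """
    allowed = {"no", "yes/no"} if include_partial else {"no"}
    return sorted(name for name, info in TOOLS.items()
                  if info.schema_bound in allowed)


def sea_phages_tools() -> List[str]:
    """Return tool names that are SEA-PHAGES oriented."""
    return sorted(name for name, info in TOOLS.items() if info.sea_phages)
```

Under these assumptions, schema_agnostic_tools() returns convert, get_db, push, and update, and passing include_partial=True adds export. sea_phages_tools() returns compare, get_data, and import.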

The pdm_utils toolkit can be used to manage different database instances. However, some tools may be relevant only to the primary instance, Actino_Draft.