Import tickets
A structured database ticketing system is used to automate several steps of the database management process, including importing new data, updating the database, and keeping a record of changes. Most types of changes to the database require a unique ‘ticket’ in a CSV-formatted import table that provides instructions for implementing the change.
Ticket structure
Within the import table, each ticket is a single row of data that populates 11 columns.
type: there are currently two types of tickets that the import script implements:
‘add’ if a new phage genome needs to be added to the database.
‘replace’ if a phage genome currently in the database needs to be replaced with a new phage genome.
phage_id: the name of the new phage genome that the ticket addresses. Ensure that the phage name in the ticket is spelled exactly as it is in the flat file.
description_field: the field in the CDS feature annotations of the associated GenBank-formatted flat file that is expected to contain the gene descriptions (‘product’, ‘function’, or ‘note’).
eval_mode: the evaluation mode (‘draft’, ‘final’, ‘auto’, ‘misc’, or ‘custom’), which determines which QC checks are run and thus how the flat file is evaluated.
host_genus: the genus of the bacterial host that the phage infects.
cluster: the Cluster designation for the phage, which should be:
the assigned Cluster, if it is in a Cluster.
‘Singleton’ if the phage is not in a Cluster.
‘UNK’ if the Cluster has not yet been determined.
subcluster: the Subcluster designation for the phage, which should be:
the assigned Subcluster, if it is in a Subcluster.
‘none’ if the phage is not in a Subcluster, or if the Cluster has not yet been determined.
accession: the accession for the GenBank record, which should be:
the assigned accession, if available.
‘none’ if there is no accession.
annotation_author: indicates whether the end-user has control of the annotations, which should be:
‘1’ if the end-user has control of the annotations.
‘0’ if the end-user does not have control of the annotations.
retrieve_record: indicates whether the genome should be automatically updated when the associated GenBank record (if available) is updated. This should be:
‘1’ if new versions of the GenBank record should be retrieved.
‘0’ if new versions of the GenBank record should NOT be retrieved.
annotation_status: the stage of gene annotations, which should be:
‘draft’ if the annotations have been automatically generated.
‘final’ if the annotations have been manually generated.
‘unknown’ if the annotation strategy is not known.
Import tickets are generated automatically by pdm_utils get_data, but they can also be created manually, as in the following add and replace tickets:
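As a rough illustration of the field constraints above, the following Python sketch checks one fully specified ticket represented as a dictionary. The helper name validate_ticket and the dictionary representation are hypothetical and not part of pdm_utils; keyword options such as ‘retrieve’ or ‘retain’ are deliberately not handled here.

```python
# Allowed literal values, taken from the field descriptions above.
# Keyword options such as 'retrieve' or 'retain' are not handled here.
ALLOWED_VALUES = {
    "type": {"add", "replace"},
    "description_field": {"product", "function", "note"},
    "eval_mode": {"draft", "final", "auto", "misc", "custom"},
    "annotation_author": {"0", "1"},
    "retrieve_record": {"0", "1"},
    "annotation_status": {"draft", "final", "unknown"},
}


def validate_ticket(ticket):
    """Return a list of problems in one ticket (hypothetical helper)."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = ticket.get(field)
        if value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    # A Singleton, or a phage whose Cluster is 'UNK', has no Subcluster.
    if ticket.get("cluster") in {"Singleton", "UNK"} and ticket.get("subcluster") != "none":
        problems.append("subcluster: expected 'none'")
    return problems


trixie = {
    "type": "add", "phage_id": "Trixie", "description_field": "product",
    "eval_mode": "draft", "host_genus": "Mycobacterium", "cluster": "A",
    "subcluster": "A2", "accession": "none", "annotation_author": "1",
    "retrieve_record": "1", "annotation_status": "draft",
}
print(validate_ticket(trixie))  # prints []
```

Values are compared as strings (for example ‘1’ rather than 1) because they come from a CSV file.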
type | phage_id | description_field | eval_mode | host_genus | cluster | subcluster | accession | annotation_author | retrieve_record | annotation_status
add | Trixie | product | draft | Mycobacterium | A | A2 | none | 1 | 1 | draft
replace | Finch | function | final | Rhodococcus | Singleton | none | MG962366 | 1 | 1 | final
Note
The first row in the import table SHOULD be the column headers exactly as indicated above. Each subsequent row should represent a unique import ticket.
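Because the import table is a plain CSV file, it can be written and read with Python's standard csv module. The sketch below uses hypothetical helper names (write_import_table, read_import_table) that are not part of pdm_utils: it writes the two example tickets with the header row first, and rejects a table whose first row does not match the expected column headers.

```python
import csv
import io

# Column headers, in the order used in the ticket table above.
HEADERS = ["type", "phage_id", "description_field", "eval_mode",
           "host_genus", "cluster", "subcluster", "accession",
           "annotation_author", "retrieve_record", "annotation_status"]

# The two example tickets from the table above.
TICKETS = [
    ["add", "Trixie", "product", "draft", "Mycobacterium", "A", "A2",
     "none", "1", "1", "draft"],
    ["replace", "Finch", "function", "final", "Rhodococcus", "Singleton",
     "none", "MG962366", "1", "1", "final"],
]


def write_import_table(handle, tickets):
    """Write the header row first, then one row per ticket."""
    writer = csv.writer(handle)
    writer.writerow(HEADERS)
    writer.writerows(tickets)


def read_import_table(handle):
    """Read tickets back as dicts keyed by the header row."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != HEADERS:
        raise ValueError(f"unexpected header row: {reader.fieldnames}")
    return list(reader)


# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO(newline="")
write_import_table(buffer, TICKETS)
buffer.seek(0)
for ticket in read_import_table(buffer):
    print(ticket["type"], ticket["phage_id"])  # "add Trixie", then "replace Finch"
```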
Ticket field options
Some fields can be set to pre-defined keywords for automatic data acquisition:
‘retrieve’: for genomes that are also in PhagesDB, the data should be retrieved from PhagesDB.
‘retain’: for genomes that are being replaced, the data already present in the database should be retained.
‘parse’: for data that should be automatically parsed from the flat file.
Some fields can be set to the pre-defined keyword ‘none’ if they are not applicable for the ticket.
Since some field settings are commonly shared by all tickets, they can be omitted from the import table and set as an import command line argument instead.
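One way to picture the keyword options and command line settings is as a two-step lookup: omitted fields are filled from the command line defaults, and each keyword is then replaced by the value from its data source. This is a minimal sketch, not the pdm_utils implementation; resolve_ticket is a hypothetical name, and PhagesDB, the current database, and the flat file are stood in for by plain dictionaries keyed by ticket field.

```python
def resolve_ticket(ticket, defaults, phagesdb, database, flat_file):
    """Fill omitted fields from defaults, then swap keywords for data.

    phagesdb, database and flat_file are hypothetical stand-ins (plain
    dicts) for PhagesDB, the current database and the parsed flat file.
    """
    sources = {"retrieve": phagesdb, "retain": database, "parse": flat_file}
    merged = {**defaults, **ticket}  # values in the ticket take precedence
    # 'none' and literal values pass through unchanged.
    return {
        field: sources[value][field] if value in sources else value
        for field, value in merged.items()
    }


cli_defaults = {"description_field": "product", "eval_mode": "final"}
ticket = {"type": "add", "phage_id": "Trixie",
          "host_genus": "retrieve", "cluster": "retrieve", "accession": "none"}
phagesdb = {"host_genus": "Mycobacterium", "cluster": "A"}
print(resolve_ticket(ticket, cli_defaults, phagesdb, {}, {}))
```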
The table below indicates which of the above options can be used for each ticket field:
Ticket field | Supported options
type |
phage_id |
description_field | command
eval_mode | command
host_genus | retrieve, retain, parse
cluster | retrieve, retain
subcluster | retrieve, retain, none
accession | retrieve, retain, parse, none
annotation_author | retain
retrieve_record | retain
annotation_status |
Automatic ticket construction
A simplified ‘minimal’ ticket, in which several fields are populated automatically when import is run, can be used to add or replace genomes:
the ‘type’ and ‘phage_id’ fields must be specified manually in the import table.
the ‘description_field’ and ‘eval_mode’ field settings are determined from the default command line arguments (‘product’ and ‘final’, respectively).
for replace tickets, the ‘annotation_status’ is set to ‘final’ if the current genome is set to ‘draft’, otherwise it is set to ‘retain’.
The table below indicates default settings for each type of ticket:
type | phage_id | description_field | eval_mode | host_genus | cluster | subcluster | accession | annotation_author | retrieve_record | annotation_status
add | <manual> | <command> | <command> | retrieve | retrieve | retrieve | retrieve | 1 | 1 | draft
replace | <manual> | <command> | <command> | retain | retain | retain | retain | retain | retain | final or retain
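The default settings for minimal tickets can be expressed as a small function that expands a minimal ticket into a full one. This is a sketch under stated assumptions: expand_minimal_ticket and its current_status parameter (the annotation_status of the genome already in the database) are hypothetical, and the description_field and eval_mode parameters stand in for the import command line defaults (‘product’ and ‘final’).

```python
# Fields that a minimal replace ticket keeps from the current database entry.
RETAINED_FIELDS = ("host_genus", "cluster", "subcluster", "accession",
                   "annotation_author", "retrieve_record")


def expand_minimal_ticket(ticket_type, phage_id, current_status=None,
                          description_field="product", eval_mode="final"):
    """Expand a minimal ticket using the default settings described above.

    current_status is the annotation_status of the genome already in the
    database (hypothetical parameter; only used for replace tickets).
    """
    if ticket_type == "add":
        ticket = {"host_genus": "retrieve", "cluster": "retrieve",
                  "subcluster": "retrieve", "accession": "retrieve",
                  "annotation_author": "1", "retrieve_record": "1",
                  "annotation_status": "draft"}
    elif ticket_type == "replace":
        ticket = {field: "retain" for field in RETAINED_FIELDS}
        # Draft genomes are promoted to final; anything else is retained.
        ticket["annotation_status"] = (
            "final" if current_status == "draft" else "retain")
    else:
        raise ValueError(f"unsupported ticket type: {ticket_type!r}")
    # The two manual fields plus the two command line settings.
    ticket.update(type=ticket_type, phage_id=phage_id,
                  description_field=description_field, eval_mode=eval_mode)
    return ticket


print(expand_minimal_ticket("replace", "Finch", current_status="draft")["annotation_status"])  # prints final
```

Passing current_status explicitly keeps the sketch free of any database access; a real import would look this value up before deciding between ‘final’ and ‘retain’.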