get_data
Pipeline to gather new data to be imported into a MySQL database.
- pdm_utils.pipelines.get_data.check_record_date(record_list, accession_dict)
Check whether the GenBank record is new.
- pdm_utils.pipelines.get_data.compare_data(gnm_pair)
Compare data and create update tickets.
- pdm_utils.pipelines.get_data.compute_genbank_tallies(results)
Tally results from GenBank retrieval.
- pdm_utils.pipelines.get_data.convert_tickets_to_dict(list_of_tickets)
Convert a list of tickets to a list of dictionaries.
- pdm_utils.pipelines.get_data.create_accession_sets(genome_dict)
Generate sets of unique and non-unique accessions.
The input is a dictionary of pdm_utils genome objects.
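The idea of separating unique from non-unique accessions can be illustrated with a minimal sketch. This is not the pdm_utils implementation: the `Genome` stand-in class, the helper name `split_accessions`, the placeholder accession strings, and the choice to ignore empty accessions are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical stand-in for a pdm_utils genome object."""
    id: str
    accession: str = ""


def split_accessions(genome_dict):
    """Return (unique, duplicated) sets of accessions.

    Assumption: genomes with an empty accession are ignored.
    """
    counts = Counter(
        gnm.accession for gnm in genome_dict.values() if gnm.accession
    )
    unique = {acc for acc, n in counts.items() if n == 1}
    duplicated = {acc for acc, n in counts.items() if n > 1}
    return unique, duplicated


# Placeholder data: PhageB and PhageC deliberately share an accession.
genomes = {
    "PhageA": Genome("PhageA", "ACC0001"),
    "PhageB": Genome("PhageB", "ACC0002"),
    "PhageC": Genome("PhageC", "ACC0002"),
    "PhageD": Genome("PhageD"),  # no accession
}
unique, duplicated = split_accessions(genomes)
```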
- pdm_utils.pipelines.get_data.create_draft_ticket(name)
Create ImportTicket for draft genome.
- pdm_utils.pipelines.get_data.create_genbank_ticket(gnm)
Create ImportTicket for GenBank record.
- pdm_utils.pipelines.get_data.create_phagesdb_ticket(phage_id)
Create ImportTicket for PhagesDB genome.
- pdm_utils.pipelines.get_data.create_results_dict(gnm, genbank_date, result)
Create a dictionary of data summarizing NCBI retrieval status.
- pdm_utils.pipelines.get_data.create_ticket_table(tickets, output_folder)
Save tickets associated with files retrieved from GenBank.
- pdm_utils.pipelines.get_data.create_update_ticket(field, value, key_value)
Create update ticket.
- pdm_utils.pipelines.get_data.get_accessions_to_retrieve(summary_records, accession_dict)
Review GenBank summary to determine which records are new.
- pdm_utils.pipelines.get_data.get_draft_data(output_path, phage_id_set)
Run sub-pipeline to retrieve auto-annotated ‘draft’ genomes.
- pdm_utils.pipelines.get_data.get_final_data(output_folder, matched_genomes)
Run sub-pipeline to retrieve ‘final’ genomes from PhagesDB.
- pdm_utils.pipelines.get_data.get_genbank_data(output_folder, genome_dict, ncbi_cred_dict={}, genbank_results=False, force=False)
Run sub-pipeline to retrieve genomes from GenBank.
- pdm_utils.pipelines.get_data.get_matched_drafts(matched_genomes)
Generate a list of matched ‘draft’ genomes.
- pdm_utils.pipelines.get_data.get_update_data(output_folder, matched_genomes)
Run sub-pipeline to retrieve field updates from PhagesDB.
- pdm_utils.pipelines.get_data.main(unparsed_args_list)
Run the main get_data pipeline.
- pdm_utils.pipelines.get_data.match_genomes(dict1, dict2)
Match MySQL database genome data to PhagesDB genome data.
In both dictionaries, the key is the PhageID and the value is a pdm_utils genome object.
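Because both inputs are keyed by PhageID, matching amounts to comparing the two key sets. The sketch below shows that idea only. The helper name `match_by_phage_id`, its three-part return value, and the string stand-ins for genome objects are assumptions; the return type of `match_genomes` itself is not documented here.

```python
def match_by_phage_id(dict1, dict2):
    """Pair up entries that share a PhageID key.

    Hypothetical helper: returns the matched pairs plus the PhageIDs
    found in only one of the two dictionaries.
    """
    shared = dict1.keys() & dict2.keys()
    matched = {pid: (dict1[pid], dict2[pid]) for pid in shared}
    only_first = set(dict1.keys() - dict2.keys())
    only_second = set(dict2.keys() - dict1.keys())
    return matched, only_first, only_second


# Strings stand in for pdm_utils genome objects.
mysql_genomes = {"PhageA": "mysql A", "PhageB": "mysql B"}
phagesdb_genomes = {"PhageB": "phagesdb B", "PhageC": "phagesdb C"}

matched, mysql_only, phagesdb_only = match_by_phage_id(
    mysql_genomes, phagesdb_genomes
)
```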
- pdm_utils.pipelines.get_data.output_genbank_summary(output_folder, results)
Save summary of GenBank retrieval results to file.
- pdm_utils.pipelines.get_data.parse_args(unparsed_args_list)
Verify the correct arguments are selected for getting updates.
- pdm_utils.pipelines.get_data.print_genbank_tallies(tallies)
Print results of GenBank retrieval.
- pdm_utils.pipelines.get_data.print_match_results(dict)
Print results of genome matching.
- pdm_utils.pipelines.get_data.process_failed_retrieval(accession_list, accession_dict)
Create list of dictionaries for records that could not be retrieved.
- pdm_utils.pipelines.get_data.retrieve_drafts(output_folder, phage_list)
Retrieve auto-annotated ‘draft’ genomes from PECAAN.
- pdm_utils.pipelines.get_data.retrieve_records(accession_dict, ncbi_folder, batch_size=200)
Retrieve GenBank records.
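The `batch_size=200` default suggests that records are requested in chunks rather than all at once, although how `retrieve_records` does this internally is not stated here. A minimal sketch of chunking a list of accessions follows; the helper name `make_batches` and the placeholder accession strings are assumptions, and no request to NCBI is made.

```python
def make_batches(accessions, batch_size=200):
    """Yield consecutive slices of at most batch_size accessions."""
    for start in range(0, len(accessions), batch_size):
        yield accessions[start:start + batch_size]


# Placeholder accessions, not real GenBank identifiers.
accessions = [f"ACC{i:04d}" for i in range(450)]
batch_sizes = [len(batch) for batch in make_batches(accessions)]
```

With 450 placeholder accessions and the default batch size, this produces two full batches and one partial batch.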
- pdm_utils.pipelines.get_data.save_and_tickets(record_list, accession_dict, output_folder)
Save flat files retrieved from GenBank and create import tickets.
- pdm_utils.pipelines.get_data.save_genbank_file(seqrecord, accession, name, output_folder)
Save retrieved record to file.
- pdm_utils.pipelines.get_data.save_pecaan_file(response, name, output_folder)
Save data retrieved from PECAAN.
- pdm_utils.pipelines.get_data.save_phagesdb_file(data, gnm, output_folder)
Save file retrieved from PhagesDB.
- pdm_utils.pipelines.get_data.set_phagesdb_gnm_date(gnm)
Set the date of a PhagesDB genome object.
- pdm_utils.pipelines.get_data.set_phagesdb_gnm_file(gnm)
Set the filename of a PhagesDB genome object.
- pdm_utils.pipelines.get_data.sort_by_accession(genome_dict, force=False)
Sort genome objects based on their accession status.
Data is retained only if the genome is set to be automatically updated, it has a valid accession, and that accession is unique.
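The three retention conditions can be sketched as a single filter. This is an illustration under stated assumptions, not the pdm_utils code: the `Genome` stand-in, its `auto_update` flag, and the helper `keep_retrievable` are hypothetical names, "valid" is taken to mean simply non-empty, and the effect of the `force` argument is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical stand-in for a pdm_utils genome object."""
    id: str
    accession: str = ""
    auto_update: bool = False  # hypothetical flag name


def keep_retrievable(genome_dict):
    """Keep genomes that are auto-updated and have a unique accession.

    Assumption: an accession is 'valid' if it is a non-empty string.
    """
    counts = Counter(
        gnm.accession for gnm in genome_dict.values() if gnm.accession
    )
    return {
        pid: gnm
        for pid, gnm in genome_dict.items()
        if gnm.auto_update and gnm.accession and counts[gnm.accession] == 1
    }


# Placeholder data covering each reason a genome might be dropped.
genomes = {
    "PhageA": Genome("PhageA", "ACC0001", auto_update=True),   # kept
    "PhageB": Genome("PhageB", "ACC0002", auto_update=False),  # not auto-updated
    "PhageC": Genome("PhageC", "", auto_update=True),          # no accession
    "PhageD": Genome("PhageD", "ACC0003", auto_update=True),   # duplicated
    "PhageE": Genome("PhageE", "ACC0003", auto_update=True),   # duplicated
}
kept = keep_retrievable(genomes)
```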