get_data

Pipeline to gather new data to be imported into a MySQL database.

pdm_utils.pipelines.get_data.check_record_date(record_list, accession_dict): Check whether the GenBank record is new.

pdm_utils.pipelines.get_data.compare_data(gnm_pair): Compare data and create update tickets.

pdm_utils.pipelines.get_data.compute_genbank_tallies(results): Tally results from GenBank retrieval.

pdm_utils.pipelines.get_data.convert_tickets_to_dict(list_of_tickets): Convert list of tickets to list of dictionaries.

pdm_utils.pipelines.get_data.create_accession_sets(genome_dict)

Generate set of unique and non-unique accessions.

Input is a dictionary of pdm_utils genome objects.
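The split into unique and non-unique accessions can be illustrated with a minimal sketch. It is not the pdm_utils implementation: `split_accessions` is a hypothetical helper, plain strings stand in for the accession attribute of genome objects, and skipping empty accessions is an assumption.

```python
from collections import Counter


def split_accessions(accessions):
    """Hypothetical helper: return (unique, duplicated) accession sets."""
    # Assumption: empty strings mean "no accession" and are skipped.
    counts = Counter(acc for acc in accessions if acc)
    unique = {acc for acc, n in counts.items() if n == 1}
    duplicated = {acc for acc, n in counts.items() if n > 1}
    return unique, duplicated


# Made-up accessions for illustration only.
unique, duplicated = split_accessions(["AB123", "CD456", "AB123", "", "EF789"])
print(sorted(unique))      # accessions seen exactly once
print(sorted(duplicated))  # accessions seen more than once
```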

pdm_utils.pipelines.get_data.create_draft_ticket(name): Create ImportTicket for draft genome.

pdm_utils.pipelines.get_data.create_genbank_ticket(gnm): Create ImportTicket for GenBank record.

pdm_utils.pipelines.get_data.create_phagesdb_ticket(phage_id): Create ImportTicket for PhagesDB genome.

pdm_utils.pipelines.get_data.create_results_dict(gnm, genbank_date, result): Create a dictionary of data summarizing NCBI retrieval status.

pdm_utils.pipelines.get_data.create_ticket_table(tickets, output_folder): Save tickets associated with files retrieved from GenBank.

pdm_utils.pipelines.get_data.create_update_ticket(field, value, key_value): Create update ticket.

pdm_utils.pipelines.get_data.get_accessions_to_retrieve(summary_records, accession_dict): Review GenBank summary to determine which records are new.

pdm_utils.pipelines.get_data.get_draft_data(output_path, phage_id_set): Run sub-pipeline to retrieve auto-annotated ‘draft’ genomes.

pdm_utils.pipelines.get_data.get_final_data(output_folder, matched_genomes): Run sub-pipeline to retrieve ‘final’ genomes from PhagesDB.

pdm_utils.pipelines.get_data.get_genbank_data(output_folder, genome_dict, ncbi_cred_dict={}, genbank_results=False, force=False): Run sub-pipeline to retrieve genomes from GenBank.

pdm_utils.pipelines.get_data.get_matched_drafts(matched_genomes): Generate a list of matched ‘draft’ genomes.

pdm_utils.pipelines.get_data.get_update_data(output_folder, matched_genomes): Run sub-pipeline to retrieve field updates from PhagesDB.

pdm_utils.pipelines.get_data.main(unparsed_args_list): Run main retrieve_updates pipeline.

pdm_utils.pipelines.get_data.match_genomes(dict1, dict2)

Match MySQL database genome data to PhagesDB genome data.

Both dictionaries use the PhageID as the key and a pdm_utils genome object as the value.
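Matching two dictionaries that share PhageID keys can be sketched as a key intersection. This is only an illustration under stated assumptions: `match_by_key` is a hypothetical helper, strings stand in for genome objects, and the return shape (a list of pairs plus a list of unmatched IDs) is not taken from pdm_utils.

```python
def match_by_key(dict1, dict2):
    """Hypothetical helper: pair values whose keys appear in both dicts."""
    shared = dict1.keys() & dict2.keys()
    # Sort so the output order is deterministic.
    matched = [(dict1[key], dict2[key]) for key in sorted(shared)]
    unmatched = sorted(dict1.keys() - dict2.keys())
    return matched, unmatched


# Stand-in data: strings take the place of genome objects.
mysql_genomes = {"Trixie": "mysql:Trixie", "L5": "mysql:L5", "D29": "mysql:D29"}
phagesdb_genomes = {"Trixie": "pdb:Trixie", "D29": "pdb:D29", "Alice": "pdb:Alice"}

matched, unmatched = match_by_key(mysql_genomes, phagesdb_genomes)
print(matched)
print(unmatched)
```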

pdm_utils.pipelines.get_data.output_genbank_summary(output_folder, results): Save summary of GenBank retrieval results to file.

pdm_utils.pipelines.get_data.parse_args(unparsed_args_list): Verify the correct arguments are selected for getting updates.

pdm_utils.pipelines.get_data.print_genbank_tallies(tallies): Print results of GenBank retrieval.

pdm_utils.pipelines.get_data.print_match_results(dict): Print results of genome matching.

pdm_utils.pipelines.get_data.process_failed_retrieval(accession_list, accession_dict): Create list of dictionaries for records that could not be retrieved.

pdm_utils.pipelines.get_data.retrieve_drafts(output_folder, phage_list): Retrieve auto-annotated ‘draft’ genomes from PECAAN.

pdm_utils.pipelines.get_data.retrieve_records(accession_dict, ncbi_folder, batch_size=200): Retrieve GenBank records.
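The signature exposes a `batch_size` parameter but the text does not say how it is used. A plausible reading, assumed here, is that accessions are requested in groups of at most `batch_size`. The sketch below shows only that chunking step; `batches` is a hypothetical helper, the accessions are made up, and no network request is performed.

```python
def batches(items, batch_size=200):
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up accessions; a real run would take them from the accession dictionary.
accessions = [f"ACC{i:05d}" for i in range(450)]
sizes = [len(batch) for batch in batches(accessions)]
print(sizes)
```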

pdm_utils.pipelines.get_data.save_and_tickets(record_list, accession_dict, output_folder): Save flat files retrieved from GenBank and create import tickets.

pdm_utils.pipelines.get_data.save_genbank_file(seqrecord, accession, name, output_folder): Save retrieved record to file.

pdm_utils.pipelines.get_data.save_pecaan_file(response, name, output_folder): Save data retrieved from PECAAN.

pdm_utils.pipelines.get_data.save_phagesdb_file(data, gnm, output_folder): Save file retrieved from PhagesDB.

pdm_utils.pipelines.get_data.set_phagesdb_gnm_date(gnm): Set the date of a PhagesDB genome object.

pdm_utils.pipelines.get_data.set_phagesdb_gnm_file(gnm): Set the filename of a PhagesDB genome object.

pdm_utils.pipelines.get_data.sort_by_accession(genome_dict, force=False)

Sort genome objects based on their accession status.

Data for a genome is retained only if the genome is set to be automatically updated, its accession is valid, and its accession is unique.
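The three retention conditions can be sketched as a single filter. This is an illustration, not the library's code: `GenomeStub`, its attribute names, and `filter_for_retrieval` are hypothetical, "valid" is assumed to mean a non-empty accession, and the `force` parameter is not modeled because the text does not describe its effect.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GenomeStub:
    """Hypothetical stand-in for a pdm_utils genome object."""
    phage_id: str
    accession: str
    auto_update: bool


def filter_for_retrieval(genomes):
    """Keep genomes that are auto-updated and have a non-empty, unique accession."""
    counts = Counter(g.accession for g in genomes if g.accession)
    keep = {}
    for g in genomes:
        # Assumption: a "valid" accession is simply a non-empty string.
        if g.auto_update and g.accession and counts[g.accession] == 1:
            keep[g.accession] = g
    return keep


genomes = [
    GenomeStub("A", "AB1", True),   # kept: auto-updated, unique accession
    GenomeStub("B", "", True),      # dropped: no accession
    GenomeStub("C", "CD2", False),  # dropped: not set to auto-update
    GenomeStub("D", "EF3", True),   # dropped: accession shared with E
    GenomeStub("E", "EF3", True),   # dropped: accession shared with D
]
keep = filter_for_retrieval(genomes)
print(list(keep))
```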
