ncbi

Miscellaneous functions for interacting with NCBI databases.

pdm_utils.functions.ncbi.get_accessions_to_retrieve(summary_records)

Extract accessions from summary records.

Parameters

summary_records (list) – List of dictionaries, where each dictionary is a record summary.

Returns

List of accessions.

Return type

list
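
As a minimal sketch of what this extraction might look like, the example below assumes each summary dictionary stores its accession under a "Caption" key, as nucleotide document summaries parsed by BioPython typically do. The key name, the helper name extract_accessions, and the sample records are assumptions made for illustration, not confirmed details of the pdm_utils implementation.

```python
def extract_accessions(summary_records, key="Caption"):
    """Collect the accession stored under `key` in each summary record."""
    # Assumption: each record is a dict and the accession lives under "Caption".
    return [record[key] for record in summary_records]


# Hypothetical summary records, trimmed to the fields used here.
sample_summaries = [
    {"Caption": "ABC12345", "Title": "Example phage one, complete genome"},
    {"Caption": "XYZ67890", "Title": "Example phage two, complete genome"},
]

accessions = extract_accessions(sample_summaries)
print(accessions)
```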

pdm_utils.functions.ncbi.get_data_handle(accession_list, db='nucleotide', rettype='gb', retmode='text')
pdm_utils.functions.ncbi.get_records(accession_list, db='nucleotide', rettype='gb', retmode='text')

Retrieve records from NCBI from a list of active accessions.

Uses NCBI efetch implemented through BioPython Entrez.

Parameters
  • accession_list (list) – List of NCBI accessions.

  • db (str) – Name of the database to retrieve records from (e.g. ‘nucleotide’).

  • rettype (str) – Type of record to retrieve (e.g. ‘gb’).

  • retmode (str) – Format of data to retrieve (e.g. ‘text’).

Returns

List of BioPython SeqRecords generated from GenBank records.

Return type

list
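
The retrieval described above can be sketched with BioPython's Entrez.efetch and SeqIO.parse. This is a rough illustration under stated assumptions, not the library's code: the helper names join_accessions and fetch_seqrecords are hypothetical, and the "genbank" parser is assumed to match the default rettype of 'gb'.

```python
def join_accessions(accession_list):
    """Build the comma-separated ID string passed to efetch."""
    return ",".join(accession_list)


def fetch_seqrecords(accession_list, db="nucleotide", rettype="gb", retmode="text"):
    """Fetch records with Entrez.efetch and parse them into SeqRecords."""
    # Imported here so join_accessions() can be used without BioPython installed.
    from Bio import Entrez, SeqIO

    handle = Entrez.efetch(
        db=db,
        id=join_accessions(accession_list),
        rettype=rettype,
        retmode=retmode,
    )
    try:
        # Assumption: "genbank" matches rettype="gb"; other types need another parser.
        return list(SeqIO.parse(handle, "genbank"))
    finally:
        handle.close()
```

Calling fetch_seqrecords requires network access and should follow a call that sets the Entrez email, since NCBI asks clients to identify themselves.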

pdm_utils.functions.ncbi.get_summaries(db='', query_key='', webenv='')

Retrieve record summaries from NCBI for a list of accessions.

Uses NCBI esummary implemented through BioPython Entrez.

Parameters
  • db (str) – Name of the database to get summaries from.

  • query_key (str) – Identifier for the search. This can be directly generated from run_esearch().

  • webenv (str) – Identifier that can be directly generated from run_esearch().

Returns

List of dictionaries, where each dictionary is a record summary.

Return type

list

pdm_utils.functions.ncbi.get_verified_data_handle(acc_id_dict, ncbi_cred_dict={}, batch_size=200, file_type='gb')

Retrieve genomes from GenBank.

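Parameters
  • acc_id_dict (dict) – Dictionary where each key is an accession and each value is a list of PhageIDs.

  • output_folder – Path to where files will be saved. This parameter is described in the docstring but is not part of the signature above.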
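
The batch_size argument suggests that accessions are requested from GenBank in groups rather than all at once. The sketch below shows one way such grouping could work on the keys of acc_id_dict. It is an assumption about the general approach, not the function's actual implementation, and the helper name batch_accessions and the sample dictionary are hypothetical.

```python
def batch_accessions(acc_id_dict, batch_size=200):
    """Yield lists of at most `batch_size` accessions taken from the dict keys."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    accessions = list(acc_id_dict)
    for start in range(0, len(accessions), batch_size):
        yield accessions[start:start + batch_size]


# Hypothetical mapping of accession to list of PhageIDs.
example = {
    "ACC0001": ["PhageA"],
    "ACC0002": ["PhageB", "PhageC"],
    "ACC0003": ["PhageD"],
}

batches = list(batch_accessions(example, batch_size=2))
print(batches)
```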

pdm_utils.functions.ncbi.run_esearch(db='', term='', usehistory='')

Search for valid records in NCBI.

Uses NCBI esearch implemented through BioPython Entrez.

Parameters
  • db (str) – Name of the database to search.

  • term (str) – Search term.

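  • usehistory (str) – Indicates whether the NCBI History server should be used (e.g. ‘y’), which lets later calls reuse the search through its query key and web environment.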

Returns

Results of the search for each valid record.

Return type

dict
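
The functions documented so far can be chained into a search, summarize, and fetch workflow. The sketch below takes the module as an argument so it can be exercised without network access. It assumes the dictionary returned by run_esearch() exposes "QueryKey" and "WebEnv" entries when usehistory is 'y', as BioPython's parsed esearch results do. The function name search_and_fetch is hypothetical.

```python
def search_and_fetch(ncbi, term, db="nucleotide"):
    """Chain esearch, esummary, and efetch through the documented functions.

    `ncbi` is expected to be the pdm_utils.functions.ncbi module, or any
    object offering the same four functions.
    """
    # usehistory="y" asks NCBI to keep the result set on its History server.
    search = ncbi.run_esearch(db=db, term=term, usehistory="y")

    # Assumption: the search results carry "QueryKey" and "WebEnv" entries.
    summaries = ncbi.get_summaries(
        db=db,
        query_key=search["QueryKey"],
        webenv=search["WebEnv"],
    )

    accessions = ncbi.get_accessions_to_retrieve(summaries)
    return ncbi.get_records(accessions, db=db)
```

With pdm_utils installed, the module itself can be passed in as the first argument; the search term is left to the caller.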

pdm_utils.functions.ncbi.set_entrez_credentials(tool=None, email=None, api_key=None)

Set BioPython Entrez credentials to improve speed and reliability.

Parameters
  • tool (str) – Name of the software/tool being used.

  • email (str) – Email contact information for NCBI.

  • api_key (str) – Unique NCBI-issued identifier to enhance retrieval speed.
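
BioPython's Entrez module exposes tool, email, and api_key as module-level attributes, so setting credentials amounts to assigning them. The sketch below is a hedged illustration of that idea: it updates only the values that were supplied and works on any Entrez-like object. The helper name apply_credentials and the stand-in object are assumptions, not the pdm_utils implementation.

```python
from types import SimpleNamespace


def apply_credentials(entrez, tool=None, email=None, api_key=None):
    """Set only the credentials that were provided on an Entrez-like object."""
    for name, value in (("tool", tool), ("email", email), ("api_key", api_key)):
        if value is not None:
            setattr(entrez, name, value)
    return entrez


# Stand-in for the Bio.Entrez module so the sketch runs without BioPython.
fake_entrez = SimpleNamespace(tool="biopython", email=None, api_key=None)
apply_credentials(fake_entrez, email="user@example.org")
print(fake_entrez.email, fake_entrez.tool)
```

In real use, the Bio.Entrez module would be passed in place of the stand-in object.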