ncbi
Miscellaneous functions for interacting with NCBI databases.
- pdm_utils.functions.ncbi.get_accessions_to_retrieve(summary_records)
Extract accessions from summary records.
- Parameters
summary_records (list) – List of dictionaries, where each dictionary is a record summary.
- Returns
List of accessions.
- Return type
list
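As a rough illustration of the input and output shapes, the sketch below mimics this extraction with a stand-in function. The name `extract_accessions`, the `"Caption"` key, and the sample accessions are assumptions made for illustration; this page does not state which key of a summary holds the accession.

```python
def extract_accessions(summary_records, key="Caption"):
    """Collect one accession per summary record (hypothetical stand-in)."""
    accessions = []
    for summary in summary_records:
        # Assumption: each summary dictionary stores its accession under `key`.
        accessions.append(summary[key])
    return accessions


# Made-up summaries shaped like a list of dictionaries.
example_summaries = [
    {"Caption": "ABC12345", "Title": "Example phage one"},
    {"Caption": "XYZ67890", "Title": "Example phage two"},
]
accessions = extract_accessions(example_summaries)
print(accessions)
```

The result is a flat list, which is the shape that get_records() expects for its accession_list argument.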
- pdm_utils.functions.ncbi.get_data_handle(accession_list, db='nucleotide', rettype='gb', retmode='text')
- pdm_utils.functions.ncbi.get_records(accession_list, db='nucleotide', rettype='gb', retmode='text')
Retrieve records from NCBI from a list of active accessions.
Uses NCBI efetch implemented through BioPython Entrez.
- Parameters
accession_list (list) – List of NCBI accessions.
db (str) – Name of the database to retrieve records from (e.g. ‘nucleotide’).
rettype (str) – Type of record to retrieve (e.g. ‘gb’).
retmode (str) – Format of data to retrieve (e.g. ‘text’).
- Returns
List of BioPython SeqRecords generated from GenBank records.
- Return type
list
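The following is a minimal sketch of how an efetch request with these parameters could be assembled and parsed with Biopython. It is not the pdm_utils implementation: `build_efetch_args` and `fetch_records` are hypothetical names, the accessions are made up, and `fetch_records` needs Biopython and network access, so it is defined but not called here.

```python
def build_efetch_args(accession_list, db="nucleotide", rettype="gb", retmode="text"):
    """Assemble keyword arguments for an efetch call (hypothetical helper)."""
    return {
        "db": db,
        # efetch accepts several identifiers as one comma-separated string.
        "id": ",".join(accession_list),
        "rettype": rettype,
        "retmode": retmode,
    }


def fetch_records(accession_list):
    """Fetch and parse GenBank records; needs Biopython and network access."""
    from Bio import Entrez, SeqIO  # lazy import so the helper above runs alone

    handle = Entrez.efetch(**build_efetch_args(accession_list))
    try:
        # "gb" is the SeqIO format name for GenBank flat files,
        # matching the default rettype='gb' and retmode='text'.
        return list(SeqIO.parse(handle, "gb"))
    finally:
        handle.close()


args = build_efetch_args(["ABC12345", "XYZ67890"])
print(args["id"])
```

Keeping the argument assembly separate from the network call makes the request easy to inspect before anything is sent to NCBI.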
- pdm_utils.functions.ncbi.get_summaries(db='', query_key='', webenv='')
Retrieve record summaries from NCBI for a list of accessions.
Uses NCBI esummary implemented through BioPython Entrez.
- Parameters
db (str) – Name of the database to get summaries from.
query_key (str) – Identifier for the search. This can be directly generated from run_esearch().
webenv (str) – Web environment identifier for the search session. This can be directly generated from run_esearch().
- Returns
List of dictionaries, where each dictionary is a record summary.
- Return type
list
- pdm_utils.functions.ncbi.get_verified_data_handle(acc_id_dict, ncbi_cred_dict={}, batch_size=200, file_type='gb')
Retrieve genomes from GenBank.
- Parameters
output_folder – Path to where files will be saved.
acc_id_dict (dict) – Dictionary where each key is an accession and each value is a list of PhageIDs.
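The batch_size argument in the signature suggests that accessions are requested in groups rather than all at once. How the function actually batches is not documented here; the sketch below assumes simple slicing of the dictionary keys, and `batch_accessions` and the sample dictionary are hypothetical.

```python
def batch_accessions(accessions, batch_size=200):
    """Yield successive slices of at most batch_size accessions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(accessions), batch_size):
        yield accessions[start:start + batch_size]


# Made-up dictionary in the documented shape: accession -> list of PhageIDs.
acc_id_dict = {f"ACC{i:05d}": [f"Phage{i}"] for i in range(450)}

batches = list(batch_accessions(list(acc_id_dict), batch_size=200))
print([len(batch) for batch in batches])
```

With 450 accessions and a batch size of 200, this gives two full batches and one partial batch of 50.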
- pdm_utils.functions.ncbi.run_esearch(db='', term='', usehistory='')
Search for valid records in NCBI.
Uses NCBI esearch implemented through BioPython Entrez.
- Parameters
db (str) – Name of the database to search.
term (str) – Search term.
usehistory (str) – Indicates whether the search results should be stored on the NCBI history server for use in later requests (e.g. ‘y’).
- Returns
Results of the search for each valid record.
- Return type
dict
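The search result can feed get_summaries(), whose query_key and webenv arguments come from a prior search. The sketch below assumes the returned dictionary carries `"QueryKey"` and `"WebEnv"` entries, as a parsed Biopython esearch result does when the history server is used. `history_params` and `search_then_summarize` are hypothetical helpers, and the latter needs pdm_utils, Biopython, and network access, so it is defined but not called.

```python
def history_params(search_result):
    """Pull the history identifiers out of an esearch result (hypothetical helper)."""
    return {
        "query_key": search_result["QueryKey"],
        "webenv": search_result["WebEnv"],
    }


def search_then_summarize(term, db="nucleotide"):
    """Chain the documented functions; needs pdm_utils and network access."""
    from pdm_utils.functions import ncbi  # lazy import so the helper above runs alone

    # usehistory="y" asks NCBI to keep the results on its history server.
    search_result = ncbi.run_esearch(db=db, term=term, usehistory="y")
    return ncbi.get_summaries(db=db, **history_params(search_result))


# Made-up result shaped like a parsed esearch response.
fake_result = {"Count": "2", "QueryKey": "1", "WebEnv": "MCID_example"}
params = history_params(fake_result)
print(params)
```

Passing the two identifiers instead of a full list of IDs lets the follow-up request refer to the stored search.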
- pdm_utils.functions.ncbi.set_entrez_credentials(tool=None, email=None, api_key=None)
Set BioPython Entrez credentials to improve speed and reliability.
- Parameters
tool (str) – Name of the software/tool being used.
email (str) – Email contact information for NCBI.
api_key (str) – Unique NCBI-issued identifier to enhance retrieval speed.
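Biopython's Bio.Entrez module exposes module-level `tool`, `email`, and `api_key` attributes, so setting credentials amounts to assigning them. The sketch below shows one way this could work, assuming that arguments left as None are skipped. `apply_credentials` is a hypothetical helper, and a `SimpleNamespace` stands in for the Entrez module so the example runs without Biopython.

```python
from types import SimpleNamespace


def apply_credentials(entrez, tool=None, email=None, api_key=None):
    """Copy each supplied credential onto an Entrez-like object (hypothetical helper)."""
    for name, value in (("tool", tool), ("email", email), ("api_key", api_key)):
        # Assumption: None means "leave the current setting untouched".
        if value is not None:
            setattr(entrez, name, value)
    return entrez


# Stand-in for the Bio.Entrez module, with made-up starting values.
fake_entrez = SimpleNamespace(tool="biopython", email=None, api_key=None)
apply_credentials(fake_entrez, tool="my_pipeline", email="user@example.org")
print(fake_entrez.tool, fake_entrez.email, fake_entrez.api_key)
```

Credentials should be set once, before the first request, so that every later call to NCBI carries them.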