phagesdb
Functions to interact with PhagesDB.
- pdm_utils.functions.phagesdb.construct_phage_url(phage_name)
Create URL to retrieve phage-specific data from PhagesDB.
- Parameters
phage_name (str) – Name of the phage of interest.
- Returns
URL pertaining to the phage.
- Return type
str
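As an illustration of how such a URL might be assembled, the sketch below joins a base address with the phage name. The `phages/` path, the `?format=json` suffix, and the helper name `build_phage_url` are assumptions made for this example; only the `https://phagesdb.org/api/` root appears elsewhere in this reference, and the real function may build the URL differently.

```python
from urllib.parse import quote

# Assumed prefix and suffix (hypothetical); only the
# "https://phagesdb.org/api/" root is shown in this reference.
ASSUMED_PREFIX = "https://phagesdb.org/api/phages/"
ASSUMED_SUFFIX = "/?format=json"


def build_phage_url(phage_name):
    """Hypothetical stand-in for construct_phage_url()."""
    # Percent-encode the name so unusual characters cannot break the path.
    return ASSUMED_PREFIX + quote(phage_name, safe="") + ASSUMED_SUFFIX


print(build_phage_url("Trixie"))
```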
- pdm_utils.functions.phagesdb.create_cluster_subcluster_sets(url='https://phagesdb.org/api/clusters/')
Create sets of clusters and subclusters currently in PhagesDB.
- Parameters
url (str) – A URL from which to retrieve cluster and subcluster data.
- Returns
tuple (cluster_set, subcluster_set) WHERE cluster_set(set) is a set of all unique clusters on PhagesDB. subcluster_set(set) is a set of all unique subclusters on PhagesDB.
- Return type
tuple
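The following sketch shows one way the two sets could be built once the cluster records have been downloaded. It works on an in-memory list rather than the live API; the helper name `build_cluster_sets` and the record keys `cluster` and `subclusters_set` are assumptions for illustration, not confirmed field names.

```python
def build_cluster_sets(records):
    """Hypothetical helper: collect unique clusters and subclusters."""
    cluster_set = set()
    subcluster_set = set()
    for record in records:
        # "cluster" and "subclusters_set" are assumed key names.
        cluster = record.get("cluster")
        if cluster is not None:
            cluster_set.add(cluster)
        # Tolerate a missing or empty subcluster list.
        subcluster_set.update(record.get("subclusters_set") or [])
    return cluster_set, subcluster_set


# Made-up sample records, for demonstration only.
sample = [
    {"cluster": "A", "subclusters_set": ["A1", "A2"]},
    {"cluster": "B", "subclusters_set": ["B1"]},
    {"cluster": "Singleton", "subclusters_set": []},
]
clusters, subclusters = build_cluster_sets(sample)
print(sorted(clusters), sorted(subclusters))
```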
- pdm_utils.functions.phagesdb.create_host_genus_set(url='https://phagesdb.org/api/host_genera/')
Create a set of host genera currently in PhagesDB.
- Parameters
url (str) – A URL from which to retrieve host genus data.
- Returns
All unique host genera listed on PhagesDB.
- Return type
set
- pdm_utils.functions.phagesdb.get_genome(phage_id, gnm_type='', seq=False)
Get genome data from PhagesDB.
- Parameters
phage_id (str) – The name of the phage to be retrieved from PhagesDB.
gnm_type (str) – Identifier for the type of genome.
seq (bool) – Indicates whether the genome sequence should be retrieved.
- Returns
A pdm_utils Genome object with the parsed data. If no genome is retrieved, None is returned.
- Return type
Genome
- pdm_utils.functions.phagesdb.get_phagesdb_data(url)
Retrieve all sequenced genome data from PhagesDB.
- Parameters
url (str) – URL to connect to PhagesDB API.
- Returns
List of dictionaries, where each dictionary contains data for each phage. If a problem is encountered during retrieval, an empty list is returned.
- Return type
list
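The "empty list on failure" behavior described above can be sketched roughly as follows, using only the standard library. The function names `fetch_phage_records` and `results_from_payload`, the `results` key, and the 30-second timeout are assumptions for this example; the real function may perform additional checks.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def results_from_payload(text):
    """Hypothetical helper: pull the list of phage records out of JSON text."""
    try:
        # "results" is an assumed key name for the list of records.
        results = json.loads(text)["results"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, a missing key, or an unexpected top-level type.
        return []
    return results if isinstance(results, list) else []


def fetch_phage_records(url):
    """Hypothetical stand-in for get_phagesdb_data()."""
    try:
        with urlopen(url, timeout=30) as response:
            text = response.read().decode("utf-8")
    except (URLError, ValueError, OSError):
        # Covers unreachable hosts, malformed URLs, and undecodable bytes.
        return []
    return results_from_payload(text)
```

Returning an empty list instead of raising lets a caller treat "nothing retrieved" and "retrieval failed" the same way, at the cost of hiding the cause of the failure.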
- pdm_utils.functions.phagesdb.get_unphamerated_phage_list(url)
Retrieve list of unphamerated phages from PhagesDB.
- Parameters
url (str) – A URL from which to retrieve a list of PhagesDB genomes that are not in the most up-to-date instance of the Actino_Draft MySQL database.
- Returns
List of PhageIDs.
- Return type
list
- pdm_utils.functions.phagesdb.parse_accession(data_dict)
Retrieve Accession from PhagesDB.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Accession of the phage.
- Return type
str
- pdm_utils.functions.phagesdb.parse_cluster(data_dict)
Retrieve Cluster from PhagesDB.
If the phage is clustered, ‘pcluster’ is a dictionary, and one key is the Cluster data (Cluster or ‘Singleton’). If for some reason no Cluster info is added at the time the genome is added to PhagesDB, ‘pcluster’ may automatically be set to NULL, which gets converted to “Unclustered” during retrieval. In the MySQL database NULL means Singleton, and the long form “Unclustered” is invalid due to its character length, so this value is converted to ‘UNK’ (‘Unknown’).
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Cluster of the phage.
- Return type
str
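The cluster rules described above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: the name `normalize_cluster` is hypothetical, the inner key `cluster` is an assumed field name, and treating an absent `pcluster` key like NULL is a simplification.

```python
def normalize_cluster(data_dict):
    """Hypothetical stand-in for parse_cluster()."""
    # Simplification: an absent 'pcluster' key is treated like NULL.
    pcluster = data_dict.get("pcluster")
    # NULL, or the over-long "Unclustered" placeholder, maps to "UNK".
    if pcluster is None or pcluster == "Unclustered":
        return "UNK"
    # "cluster" is an assumed name for the key holding the Cluster data,
    # which is either a cluster name or "Singleton".
    return pcluster["cluster"]


print(normalize_cluster({"pcluster": {"cluster": "A"}}))
print(normalize_cluster({"pcluster": None}))
```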
- pdm_utils.functions.phagesdb.parse_fasta_data(fasta_data)
Parses data returned from a fasta-formatted file.
- Parameters
fasta_data (str) – Data from a fasta file.
- Returns
tuple (header, sequence) WHERE header(str) is the first line parsed from the file. sequence(str) is the nucleotide sequence parsed from the file.
- Return type
tuple
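A minimal sketch of this kind of header/sequence split is shown below. The helper name `split_fasta` is hypothetical, and two details are assumptions rather than documented behavior: the leading `>` is removed from the header, and all remaining lines are joined into one sequence string.

```python
def split_fasta(fasta_data):
    """Hypothetical stand-in for parse_fasta_data()."""
    lines = fasta_data.strip().splitlines()
    if not lines:
        return "", ""
    # The first line is the header; drop one leading ">" if present (assumed).
    header = lines[0].strip()
    if header.startswith(">"):
        header = header[1:].strip()
    # Remaining lines are sequence fragments; join them without whitespace.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


# Made-up fasta text, for demonstration only.
header, sequence = split_fasta(">Trixie complete genome\nGGTCGG\nCCAATT\n")
print(header)
print(sequence)
```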
- pdm_utils.functions.phagesdb.parse_fasta_filename(data_dict)
Retrieve fasta filename from PhagesDB.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Name of the fasta file for the phage.
- Return type
str
- pdm_utils.functions.phagesdb.parse_genome_data(data_dict, gnm_type='', seq=False)
Parses a dictionary of PhagesDB genome data into a pdm_utils Genome object.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
gnm_type (str) – Identifier for the type of genome.
seq (bool) – Indicates whether the genome sequence should be retrieved.
- Returns
A pdm_utils Genome object with the parsed data.
- Return type
Genome
- pdm_utils.functions.phagesdb.parse_genomes_dict(data_dict, gnm_type='', seq=False)
Returns a dictionary of pdm_utils Genome objects.
- Parameters
data_dict (dict) – Dictionary of dictionaries. Key = PhageID. Value = Dictionary of genome data retrieved from PhagesDB.
gnm_type (str) – Identifier for the type of genome.
seq (bool) – Indicates whether the genome sequence should be retrieved.
- Returns
Dictionary of pdm_utils Genome objects. Key = PhageID. Value = Genome object.
- Return type
dict
- pdm_utils.functions.phagesdb.parse_host_genus(data_dict)
Retrieve host_genus from PhagesDB.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Host genus of the phage.
- Return type
str
- pdm_utils.functions.phagesdb.parse_phage_name(data_dict)
Retrieve Phage Name from PhagesDB.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Name of the phage.
- Return type
str
- pdm_utils.functions.phagesdb.parse_subcluster(data_dict)
Retrieve Subcluster from PhagesDB.
If for some reason no cluster info is added at the time the genome is added to PhagesDB, ‘psubcluster’ may automatically be set to NULL, which gets returned as None. If the phage is a Singleton, ‘psubcluster’ is None. If the phage is clustered but not subclustered, ‘psubcluster’ is None. If the phage is clustered and subclustered, ‘psubcluster’ is a dictionary, and one key is the Subcluster data.
- Parameters
data_dict (dict) – Dictionary of data retrieved from PhagesDB.
- Returns
Subcluster of the phage.
- Return type
str
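The subcluster cases described above might be handled along these lines. The helper name `extract_subcluster`, the inner key `subcluster`, and the `missing` parameter are all assumptions for illustration. Because the documented return type is str, the real function may substitute a string placeholder when no subcluster exists; the sketch leaves that choice to the caller.

```python
def extract_subcluster(data_dict, missing=None):
    """Hypothetical stand-in for parse_subcluster()."""
    psubcluster = data_dict.get("psubcluster")
    # None covers three cases: no cluster info was entered, the phage is a
    # Singleton, or the phage is clustered but not subclustered.
    if psubcluster is None:
        return missing
    # "subcluster" is an assumed name for the key holding the Subcluster data.
    return psubcluster["subcluster"]


print(extract_subcluster({"psubcluster": {"subcluster": "A1"}}))
print(extract_subcluster({"psubcluster": None}))
```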
- pdm_utils.functions.phagesdb.retrieve_data_list(url)
Retrieve list of data from PhagesDB.
- Parameters
url (str) – A URL from which to retrieve data.
- Returns
A list of data retrieved from the URL.
- Return type
list
- pdm_utils.functions.phagesdb.retrieve_genome_data(phage_url)
Retrieve all data from PhagesDB for a specific phage.
- Parameters
phage_url (str) – URL for data pertaining to a specific phage.
- Returns
Dictionary of data parsed from the URL.
- Return type
dict
- pdm_utils.functions.phagesdb.retrieve_url_data(url)
Retrieve fasta file from PhagesDB.
- Parameters
url (str) – URL for data to be retrieved.
- Returns
Data from the URL.
- Return type
str