phagesdb

Functions to interact with PhagesDB.

pdm_utils.functions.phagesdb.construct_phage_url(phage_name)

Create URL to retrieve phage-specific data from PhagesDB.

Parameters

phage_name (str) – Name of the phage of interest.

Returns

URL pertaining to the phage.

Return type

str
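As a rough illustration of how a phage-specific URL could be assembled, the sketch below joins a prefix, the phage name, and a suffix. The prefix, the suffix, and the helper name build_phage_url are assumptions made for this example; they are not taken from pdm_utils, so check the library for the values it actually uses.

```python
from urllib.parse import quote

# Assumed values for illustration only; pdm_utils may use different ones.
API_PREFIX = "https://phagesdb.org/api/phages/"
API_SUFFIX = "/?format=json"


def build_phage_url(phage_name):
    """Hypothetical stand-in for construct_phage_url."""
    # quote() percent-encodes characters that are unsafe in a URL path.
    return API_PREFIX + quote(phage_name) + API_SUFFIX


url = build_phage_url("Trixie")
```

Percent-encoding the name is a defensive choice in this sketch; whether the library does the same is not stated here.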

pdm_utils.functions.phagesdb.create_cluster_subcluster_sets(url='https://phagesdb.org/api/clusters/')

Create sets of clusters and subclusters currently in PhagesDB.

Parameters

url (str) – A URL from which to retrieve cluster and subcluster data.

Returns

A tuple (cluster_set, subcluster_set), where cluster_set (set) is the set of all unique clusters on PhagesDB and subcluster_set (set) is the set of all unique subclusters on PhagesDB.

Return type

tuple
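The set-building step can be sketched without any network access. The example below assumes each record retrieved from the URL is a dictionary with a "cluster" key and a "subclusters_set" key holding a list; both key names and the helper build_cluster_sets are hypothetical and may not match the real API response.

```python
def build_cluster_sets(records):
    """Hypothetical helper: collect unique clusters and subclusters."""
    cluster_set = set()
    subcluster_set = set()
    for record in records:
        cluster_set.add(record["cluster"])
        # A cluster without subclusters contributes nothing here.
        subcluster_set.update(record.get("subclusters_set", []))
    return cluster_set, subcluster_set


# Made-up records shaped like the assumed response.
records = [
    {"cluster": "A", "subclusters_set": ["A1", "A2"]},
    {"cluster": "B", "subclusters_set": ["B1"]},
    {"cluster": "Singleton", "subclusters_set": []},
]
clusters, subclusters = build_cluster_sets(records)
```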

pdm_utils.functions.phagesdb.create_host_genus_set(url='https://phagesdb.org/api/host_genera/')

Create a set of host genera currently in PhagesDB.

Parameters

url (str) – A URL from which to retrieve host genus data.

Returns

All unique host genera listed on PhagesDB.

Return type

set

pdm_utils.functions.phagesdb.get_genome(phage_id, gnm_type='', seq=False)

Get genome data from PhagesDB.

Parameters
  • phage_id (str) – The name of the phage to be retrieved from PhagesDB.

  • gnm_type (str) – Identifier for the type of genome.

  • seq (bool) – Indicates whether the genome sequence should be retrieved.

Returns

A pdm_utils Genome object with the parsed data. If no genome is retrieved, None is returned.

Return type

Genome

pdm_utils.functions.phagesdb.get_phagesdb_data(url)

Retrieve all sequenced genome data from PhagesDB.

Parameters

url (str) – URL to connect to PhagesDB API.

Returns

List of dictionaries, where each dictionary contains the data for one phage. If a problem is encountered during retrieval, an empty list is returned.

Return type

list

pdm_utils.functions.phagesdb.get_unphamerated_phage_list(url)

Retrieve list of unphamerated phages from PhagesDB.

Parameters

url (str) – A URL from which to retrieve a list of PhagesDB genomes that are not in the most up-to-date instance of the Actino_Draft MySQL database.

Returns

List of PhageIDs.

Return type

list

pdm_utils.functions.phagesdb.parse_accession(data_dict)

Retrieve Accession from PhagesDB.

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Accession of the phage.

Return type

str

pdm_utils.functions.phagesdb.parse_cluster(data_dict)

Retrieve Cluster from PhagesDB.

If the phage is clustered, ‘pcluster’ is a dictionary, and one key is the Cluster data (Cluster or ‘Singleton’). If for some reason no Cluster info is added at the time the genome is added to PhagesDB, ‘pcluster’ may automatically be set to NULL, which gets converted to “Unclustered” during retrieval. In the MySQL database NULL means Singleton, and the long form “Unclustered” is invalid due to its character length, so this value is converted to ‘UNK’ (‘Unknown’).

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Cluster of the phage.

Return type

str
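The conversion described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name cluster_from_record and the inner key "cluster" are assumptions, and the missing value is modelled as Python None.

```python
def cluster_from_record(data_dict):
    """Hypothetical sketch of the cluster conversion."""
    pcluster = data_dict.get("pcluster")
    if pcluster is None:
        # No cluster info on PhagesDB: store 'UNK' rather than the
        # too-long 'Unclustered' or a NULL that would mean Singleton.
        return "UNK"
    # Clustered phage or singleton: the value is read from the nested dictionary.
    return pcluster["cluster"]


clustered = cluster_from_record({"pcluster": {"cluster": "A"}})
singleton = cluster_from_record({"pcluster": {"cluster": "Singleton"}})
unknown = cluster_from_record({"pcluster": None})
```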

pdm_utils.functions.phagesdb.parse_fasta_data(fasta_data)

Parses data returned from a fasta-formatted file.

Parameters

fasta_data (str) – Data from a fasta file.

Returns

A tuple (header, sequence), where header (str) is the first line of the file and sequence (str) is the nucleotide sequence parsed from the file.

Return type

tuple
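A minimal sketch of this kind of parsing is shown below, assuming a single-record fasta string. The helper name split_fasta is hypothetical, and stripping the leading ">" from the header is an assumption of this example rather than documented behavior.

```python
def split_fasta(fasta_data):
    """Hypothetical sketch: split fasta text into (header, sequence)."""
    lines = fasta_data.splitlines()
    if not lines:
        return "", ""
    # Assumption: drop the '>' marker and surrounding whitespace.
    header = lines[0].lstrip(">").strip()
    # Join the remaining lines into one continuous sequence.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


header, sequence = split_fasta(">Trixie complete genome\nATGC\nGGTA\n")
```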

pdm_utils.functions.phagesdb.parse_fasta_filename(data_dict)

Retrieve fasta filename from PhagesDB.

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Name of the fasta file for the phage.

Return type

str

pdm_utils.functions.phagesdb.parse_genome_data(data_dict, gnm_type='', seq=False)

Parses a dictionary of PhagesDB genome data into a pdm_utils Genome object.

Parameters
  • data_dict (dict) – Dictionary of data retrieved from PhagesDB.

  • gnm_type (str) – Identifier for the type of genome.

  • seq (bool) – Indicates whether the genome sequence should be retrieved.

Returns

A pdm_utils Genome object with the parsed data.

Return type

Genome

pdm_utils.functions.phagesdb.parse_genomes_dict(data_dict, gnm_type='', seq=False)

Returns a dictionary of pdm_utils Genome objects.

Parameters
  • data_dict (dict) – Dictionary of dictionaries. Key = PhageID. Value = Dictionary of genome data retrieved from PhagesDB.

  • gnm_type (str) – Identifier for the type of genome.

  • seq (bool) – Indicates whether the genome sequence should be retrieved.

Returns

Dictionary of pdm_utils Genome objects. Key = PhageID. Value = Genome object.

Return type

dict

pdm_utils.functions.phagesdb.parse_host_genus(data_dict)

Retrieve host_genus from PhagesDB.

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Host genus of the phage.

Return type

str

pdm_utils.functions.phagesdb.parse_phage_name(data_dict)

Retrieve Phage Name from PhagesDB.

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Name of the phage.

Return type

str

pdm_utils.functions.phagesdb.parse_subcluster(data_dict)

Retrieve Subcluster from PhagesDB.

If for some reason no cluster info is added at the time the genome is added to PhagesDB, ‘psubcluster’ may automatically be set to NULL, which gets returned as None. If the phage is a Singleton, ‘psubcluster’ is None. If the phage is clustered but not subclustered, ‘psubcluster’ is None. If the phage is clustered and subclustered, ‘psubcluster’ is a dictionary, and one key is the Subcluster data.

Parameters

data_dict (dict) – Dictionary of data retrieved from PhagesDB.

Returns

Subcluster of the phage.

Return type

str
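The four cases above reduce to two code paths: either ‘psubcluster’ is missing, or it is a dictionary. The sketch below illustrates this under stated assumptions: the helper name subcluster_from_record and the inner key "subcluster" are hypothetical, and the value returned by the library for the missing case may differ.

```python
def subcluster_from_record(data_dict):
    """Hypothetical sketch of the subcluster lookup."""
    psubcluster = data_dict.get("psubcluster")
    if psubcluster is None:
        # Covers: no cluster info, singleton, and clustered but not subclustered.
        return None
    # Clustered and subclustered: read the value from the nested dictionary.
    return psubcluster["subcluster"]


subclustered = subcluster_from_record({"psubcluster": {"subcluster": "A2"}})
not_subclustered = subcluster_from_record({"psubcluster": None})
```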

pdm_utils.functions.phagesdb.retrieve_data_list(url)

Retrieve list of data from PhagesDB.

Parameters

url (str) – A URL from which to retrieve data.

Returns

A list of data retrieved from the URL.

Return type

list

pdm_utils.functions.phagesdb.retrieve_genome_data(phage_url)

Retrieve all data from PhagesDB for a specific phage.

Parameters

phage_url (str) – URL for data pertaining to a specific phage.

Returns

Dictionary of data parsed from the URL.

Return type

dict

pdm_utils.functions.phagesdb.retrieve_url_data(url)

Retrieve data, such as a fasta file, from a PhagesDB URL.

Parameters

url (str) – URL for data to be retrieved.

Returns

Data from the URL.

Return type

str