find_domains

pdm_utils.pipelines.find_domains.clear_domain_data(engine)

Delete all domain data stored in the database.

pdm_utils.pipelines.find_domains.construct_domain_stmt(data_dict)

Construct the SQL statement to insert data into the domain table.

pdm_utils.pipelines.find_domains.construct_gene_domain_stmt(data_dict, gene_id)

Construct the SQL statement to insert data into the gene_domain table.

pdm_utils.pipelines.find_domains.construct_gene_update_stmt(gene_id)

Construct the SQL statement to update data in the gene table.

pdm_utils.pipelines.find_domains.construct_sql_txn(gene_id, rps_data_list)

Map domain data back to gene_id and create SQL statements for one transaction.

rps_data_list is a list of dictionaries, where each dictionary reflects a significant rpsblast domain hit.
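The per-gene transaction described for construct_sql_txn can be sketched as follows. This is a minimal illustration, not the module's implementation. It assumes that each significant hit contributes one domain statement and one gene_domain statement, and that a single gene update closes the transaction. The three statement builders are passed in as parameters. They are hypothetical stand-ins for construct_domain_stmt, construct_gene_domain_stmt and construct_gene_update_stmt, so the sketch implies no real SQL or column names.

```python
from typing import Callable, Dict, List


def construct_sql_txn_sketch(
    gene_id: str,
    rps_data_list: List[Dict],
    domain_stmt: Callable[[Dict], str],
    gene_domain_stmt: Callable[[Dict, str], str],
    gene_update_stmt: Callable[[str], str],
) -> List[str]:
    """Build the statements for one gene's transaction (hypothetical helper)."""
    txn = []
    for data_dict in rps_data_list:
        # Assumption: each significant hit yields a domain row plus a
        # gene_domain row linking that domain to this gene.
        txn.append(domain_stmt(data_dict))
        txn.append(gene_domain_stmt(data_dict, gene_id))
    # Assumption: one UPDATE on the gene table closes the transaction.
    txn.append(gene_update_stmt(gene_id))
    return txn
```

Keeping all of one gene's statements in a single list lets them be executed atomically, so a gene never ends up with only part of its domain data.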

pdm_utils.pipelines.find_domains.construct_sql_txns(cds_trans_dict, rpsblast_results)

Construct the list of SQL transactions.
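One plausible shape for construct_sql_txns is sketched below. It assumes that cds_trans_dict maps each unique translation to a set of GeneIDs (as create_cds_translation_dict returns) and that rpsblast_results maps each translation to its list of hits (as create_results_dict returns). The build_txn parameter is a hypothetical stand-in for construct_sql_txn. The sketch also assumes that genes with no hits still receive a transaction, so their status can be updated. That is a guess, not confirmed behavior.

```python
from typing import Callable, Dict, List, Set


def construct_sql_txns_sketch(
    cds_trans_dict: Dict[str, Set[str]],
    rpsblast_results: Dict[str, List[Dict]],
    build_txn: Callable[[str, List[Dict]], List[str]],
) -> List[List[str]]:
    """Fan translation-level results back out to every gene (hypothetical)."""
    txns = []
    for translation, gene_ids in cds_trans_dict.items():
        # Genes that share a translation share the same rpsblast hits.
        hits = rpsblast_results.get(translation, [])
        # Sort GeneIDs so the transaction order is deterministic.
        for gene_id in sorted(gene_ids):
            txns.append(build_txn(gene_id, hits))
    return txns
```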

pdm_utils.pipelines.find_domains.create_cds_translation_dict(cdd_genes)

Create a dictionary of genes and translations.

Returns a dictionary, where: key = unique translation, value = set of GeneIDs with that translation.
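The translation-to-GeneIDs mapping can be sketched with a defaultdict. This is a minimal version that assumes each element of cdd_genes behaves like a mapping with "GeneID" and "Translation" keys. The real input may be database rows with a different shape, so those key names are an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def create_cds_translation_dict_sketch(
    cdd_genes: Iterable[Mapping[str, str]]
) -> Dict[str, Set[str]]:
    """Group GeneIDs by identical translation (assumed input keys)."""
    trans_dict = defaultdict(set)
    for gene in cdd_genes:
        # Identical protein sequences collapse onto one key, so each
        # unique translation only has to be searched once.
        trans_dict[gene["Translation"]].add(gene["GeneID"])
    return dict(trans_dict)
```

Grouping first means rpsblast runs once per distinct sequence rather than once per gene.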

pdm_utils.pipelines.find_domains.create_results_dict(search_results)

Create a dictionary of search results.

Input is a list of dictionaries, one dict per translation, with keys “Translation” and “Data”: “Translation” maps to the translation, and “Data” maps to a list of rpsblast results, where each result is a dictionary containing domain and gene_domain data.

Returns a dictionary, where: key = unique translation, value = list of dictionaries, each dictionary a unique rpsblast result.
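The reshaping described above can be sketched as follows. This version interprets "unique rpsblast result" as dropping exact duplicate hit dictionaries for a translation. That interpretation is an assumption on our part.

```python
from typing import Dict, List


def create_results_dict_sketch(search_results: List[Dict]) -> Dict[str, List[Dict]]:
    """Key rpsblast hits by translation, dropping exact duplicates (assumed)."""
    results_dict: Dict[str, List[Dict]] = {}
    for result in search_results:
        # "Translation" is the query sequence; "Data" holds its rpsblast hits.
        hits = results_dict.setdefault(result["Translation"], [])
        for hit in result["Data"]:
            # Dicts are unhashable, so uniqueness is checked by equality.
            if hit not in hits:
                hits.append(hit)
    return results_dict
```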

pdm_utils.pipelines.find_domains.execute_statement(connection, statement)
pdm_utils.pipelines.find_domains.execute_transaction(connection, statement_list=[])
pdm_utils.pipelines.find_domains.get_rpsblast_command()

Determine rpsblast+ command based on operating system.

pdm_utils.pipelines.find_domains.get_rpsblast_path(command)

Determine rpsblast+ binary path.
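The two lookups above, choosing a command name by operating system and then locating the binary, can be sketched with platform.system and shutil.which. The OS-to-name mapping here is an assumption: it supposes macOS installs expose `rpsblast` while other systems expose `rpsblast+`. Check it against how BLAST+ is actually installed on your machine.

```python
import platform
import shutil
from typing import Optional


def get_rpsblast_command_sketch() -> str:
    """Pick an rpsblast+ executable name for the current OS (assumed names)."""
    # Assumption: "rpsblast" on macOS, "rpsblast+" elsewhere (e.g. some
    # Linux packages rename the binary to avoid clashing with legacy BLAST).
    if platform.system() == "Darwin":
        return "rpsblast"
    return "rpsblast+"


def get_rpsblast_path_sketch(command: str) -> Optional[str]:
    """Return the full path to `command` on PATH, or None if it is absent."""
    return shutil.which(command)
```

A caller would combine them as `get_rpsblast_path_sketch(get_rpsblast_command_sketch())` and stop with an error if the result is None.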

pdm_utils.pipelines.find_domains.insert_domain_data(engine, results)

Attempt to insert domain data into the database.
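The "attempt" in insert_domain_data suggests that each transaction either commits fully or is rolled back. The sketch below shows that pattern using the standard-library sqlite3 module as a stand-in for the module's real database engine. The function name and the choice to return a rolled-back count are assumptions. The count mirrors the rolled_back argument that search_summary receives.

```python
import sqlite3
from typing import List


def insert_domain_data_sketch(connection: sqlite3.Connection,
                              txns: List[List[str]]) -> int:
    """Run each transaction atomically; return how many were rolled back."""
    rolled_back = 0
    for txn in txns:
        try:
            # The connection context manager commits if the block succeeds
            # and rolls back (then re-raises) if any statement fails.
            with connection:
                for statement in txn:
                    connection.execute(statement)
        except sqlite3.Error:
            rolled_back += 1
    return rolled_back
```

Because one transaction covers one gene, a single failing statement discards only that gene's data and the remaining genes are still inserted.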

pdm_utils.pipelines.find_domains.learn_cdd_name(cdd_dir)
pdm_utils.pipelines.find_domains.log_gene_ids(cdd_genes)

Record names of the genes processed for reference.

pdm_utils.pipelines.find_domains.main(argument_list)
Parameters

argument_list

Returns

pdm_utils.pipelines.find_domains.make_tempdir(tmp_dir)

Expand tmp_dir using pdm_utils.functions.basic.expand_path, then check whether the directory exists; if it does not, create it recursively with os.makedirs.

Parameters

tmp_dir: location where I/O should take place
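The steps above can be sketched as follows. pathlib's expanduser and resolve stand in for pdm_utils.functions.basic.expand_path, on the assumption that it expands "~" and makes the path absolute. Returning the expanded path is also an addition for convenience.

```python
import os
import pathlib
import tempfile


def make_tempdir_sketch(tmp_dir: str) -> pathlib.Path:
    """Expand tmp_dir and create it (with parents) if it does not exist."""
    # Stand-in for pdm_utils.functions.basic.expand_path (assumed to expand
    # "~" and make the path absolute).
    path = pathlib.Path(tmp_dir).expanduser().resolve()
    if not path.exists():
        # os.makedirs creates every missing intermediate directory.
        os.makedirs(path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        created = make_tempdir_sketch(os.path.join(base, "rps", "io"))
        print(created.is_dir())  # prints True
```

Calling `os.makedirs(path, exist_ok=True)` instead of checking first would also avoid a race if another process creates the directory between the check and the call.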

pdm_utils.pipelines.find_domains.process_align(align)

Process alignment data.

Returns description, domain_id, and name.

pdm_utils.pipelines.find_domains.process_rps_output(filepath, evalue)

Process rpsblast output and return list of dictionaries.
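The output format that process_rps_output parses is not stated here. The sketch below swaps in BLAST+ tabular output (`-outfmt 6`) because its default twelve columns are fixed and simple. The module may well parse a different format such as XML. The inclusive e-value cutoff (`<=`) is also an assumption.

```python
import csv
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore"]


def parse_rps_tabular_lines(lines: Iterable[str], evalue: float) -> List[Dict]:
    """Turn tabular rpsblast lines into dicts, keeping hits <= evalue."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= evalue:
            hits.append(hit)
    return hits


def process_rps_output_sketch(filepath: str, evalue: float) -> List[Dict]:
    """File-based wrapper with the same parameters as process_rps_output."""
    with open(filepath, newline="") as handle:
        return parse_rps_tabular_lines(handle, evalue)
```

Splitting parsing from file handling keeps the filter testable on in-memory lines.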

pdm_utils.pipelines.find_domains.search_and_process(rpsblast, cdd_name, tmp_dir, evalue, translation_id, translation)

Use rpsblast to search the indicated gene against the indicated CDD.

Parameters

rpsblast: path to rpsblast binary
cdd_name: CDD database path/name
tmp_dir: path to directory where I/O will take place
evalue: e-value cutoff for rpsblast
translation_id: unique identifier for the translation sequence
translation: protein sequence for gene to query

Returns

results
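A single search of this kind can be sketched as follows: write the query as FASTA into tmp_dir, then invoke rpsblast with the standard BLAST+ options `-db`, `-query`, `-evalue`, `-outfmt` and `-out`. The file names, the choice of `-outfmt 6`, and returning the output path instead of parsed results are all assumptions made for illustration.

```python
import os
import subprocess
from typing import List


def build_rpsblast_command(rpsblast: str, cdd_name: str, query_path: str,
                           out_path: str, evalue: float) -> List[str]:
    """Assemble an rpsblast argument list (output format is an assumption)."""
    return [rpsblast, "-db", cdd_name, "-query", query_path,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


def search_sketch(rpsblast: str, cdd_name: str, tmp_dir: str, evalue: float,
                  translation_id: str, translation: str) -> str:
    """Run one rpsblast query and return the path of its output file."""
    # Hypothetical file naming: one query/output pair per translation_id.
    query_path = os.path.join(tmp_dir, f"{translation_id}.fasta")
    out_path = os.path.join(tmp_dir, f"{translation_id}.tsv")
    with open(query_path, "w") as handle:
        handle.write(f">{translation_id}\n{translation}\n")
    command = build_rpsblast_command(rpsblast, cdd_name, query_path,
                                     out_path, evalue)
    # check=True raises CalledProcessError if rpsblast exits non-zero.
    subprocess.run(command, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return out_path
```

Passing the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.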

pdm_utils.pipelines.find_domains.search_summary(rolled_back)

Print search results.

pdm_utils.pipelines.find_domains.search_translations(rpsblast, cdd_name, tmp_dir, evalue, threads, engine, unique_trans, cds_trans_dict)

Search for conserved domains in a list of unique translations.
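The threads parameter suggests that translations are searched concurrently. The sketch below uses a ThreadPoolExecutor, which is reasonable when each search spends its time in an external rpsblast process. Whether the module uses threads or processes is not stated, so this is an assumption. The search_one callable is a hypothetical stand-in for search_and_process with the fixed arguments already bound. The returned list follows the “Translation”/“Data” shape that create_results_dict documents as its input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def search_translations_sketch(
    unique_trans: Iterable[str],
    search_one: Callable[[str, str], List[Dict]],
    threads: int = 1,
) -> List[Dict]:
    """Search every unique translation, up to `threads` at a time."""
    trans_list = list(unique_trans)
    # Hypothetical identifiers, used e.g. to name per-query temporary files.
    ids = [f"translation_{i}" for i in range(len(trans_list))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order, whatever order they finish.
        data = list(pool.map(search_one, ids, trans_list))
    return [{"Translation": t, "Data": d} for t, d in zip(trans_list, data)]
```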

pdm_utils.pipelines.find_domains.setup_argparser()

Build the argparse.ArgumentParser for this script.