find_domains
- pdm_utils.pipelines.find_domains.clear_domain_data(engine)
Delete all domain data stored in the database.
- pdm_utils.pipelines.find_domains.construct_domain_stmt(data_dict)
Construct the SQL statement to insert data into the domain table.
- pdm_utils.pipelines.find_domains.construct_gene_domain_stmt(data_dict, gene_id)
Construct the SQL statement to insert data into the gene_domain table.
- pdm_utils.pipelines.find_domains.construct_gene_update_stmt(gene_id)
Construct the SQL statement to update data in the gene table.
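A minimal sketch of the three statement builders above. The table columns (HitID, DomainID, Name, Description, Expect, QueryStart, QueryEnd, DomainStatus), the data_dict keys, and the `_sketch` function names are assumptions for illustration, not taken from this page. As a design choice, each function returns SQL text with `%s` placeholders plus a parameter tuple, so values are handed to the database driver instead of being formatted into the string.

```python
from typing import Any, Dict, Tuple

# (SQL text, parameters) pair; the column names below are hypothetical.
Statement = Tuple[str, Tuple[Any, ...]]


def domain_stmt_sketch(data_dict: Dict[str, Any]) -> Statement:
    # INSERT IGNORE (MySQL) turns duplicate-key errors into warnings,
    # so re-inserting a domain that already exists is harmless.
    sql = ("INSERT IGNORE INTO domain (HitID, DomainID, Name, Description) "
           "VALUES (%s, %s, %s, %s)")
    params = (data_dict["HitID"], data_dict["DomainID"],
              data_dict["Name"], data_dict["Description"])
    return sql, params


def gene_domain_stmt_sketch(data_dict: Dict[str, Any], gene_id: str) -> Statement:
    # Links one gene to one domain hit, with the hit's e-value and
    # the query coordinates of the aligned region.
    sql = ("INSERT IGNORE INTO gene_domain "
           "(GeneID, HitID, Expect, QueryStart, QueryEnd) "
           "VALUES (%s, %s, %s, %s, %s)")
    params = (gene_id, data_dict["HitID"], data_dict["Expect"],
              data_dict["QueryStart"], data_dict["QueryEnd"])
    return sql, params


def gene_update_stmt_sketch(gene_id: str) -> Statement:
    # Marks the gene as searched (hypothetical DomainStatus flag column).
    return "UPDATE gene SET DomainStatus = 1 WHERE GeneID = %s", (gene_id,)
```

With a DB-API cursor, each pair would be run as `cursor.execute(sql, params)`.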
- pdm_utils.pipelines.find_domains.construct_sql_txn(gene_id, rps_data_list)
Map domain data back to gene_id and create the SQL statements for one transaction.
rps_data_list is a list of dictionaries, where each dictionary represents one significant rpsblast domain hit.
- pdm_utils.pipelines.find_domains.construct_sql_txns(cds_trans_dict, rpsblast_results)
Construct the list of SQL transactions.
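The mapping performed by construct_sql_txn and construct_sql_txns can be sketched as below. It assumes each transaction holds a domain insert and a gene_domain insert per hit, followed by one gene update, and that every gene sharing a translation receives that translation's hits. Readable placeholder strings stand in for real SQL, and both function names are hypothetical.

```python
from typing import Dict, List, Set


def sql_txn_sketch(gene_id: str, rps_data_list: List[dict]) -> List[str]:
    # One transaction per gene: a domain row and a gene_domain row per hit,
    # then a final statement marking the gene as processed.
    txn = []
    for hit in rps_data_list:
        txn.append(f"INSERT domain {hit['HitID']}")
        txn.append(f"INSERT gene_domain {gene_id} {hit['HitID']}")
    txn.append(f"UPDATE gene {gene_id}")
    return txn


def sql_txns_sketch(cds_trans_dict: Dict[str, Set[str]],
                    rpsblast_results: Dict[str, List[dict]]) -> List[List[str]]:
    # Each unique translation was searched once; fan its hits back out
    # to every gene that shares that translation.
    txns = []
    for translation, gene_ids in cds_trans_dict.items():
        rps_data_list = rpsblast_results.get(translation, [])
        for gene_id in sorted(gene_ids):
            txns.append(sql_txn_sketch(gene_id, rps_data_list))
    return txns
```

Keeping each gene in its own transaction means a failure for one gene leaves the others unaffected.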
- pdm_utils.pipelines.find_domains.create_cds_translation_dict(cdd_genes)
Create a dictionary of genes and translations.
Returns a dictionary where key = unique translation and value = set of GeneIDs with that translation.
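A sketch of create_cds_translation_dict, assuming cdd_genes is a list of dictionaries with "GeneID" and "Translation" keys (an assumption; the input type is not stated here). Grouping genes by identical sequence means each distinct translation only needs one rpsblast search.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def cds_translation_dict_sketch(
        cdd_genes: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    # Group GeneIDs by identical protein sequence (hypothetical input keys).
    trans_dict: Dict[str, Set[str]] = defaultdict(set)
    for gene in cdd_genes:
        trans_dict[gene["Translation"]].add(gene["GeneID"])
    # Return a plain dict so missing keys raise KeyError instead of
    # silently creating empty sets.
    return dict(trans_dict)
```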
- pdm_utils.pipelines.find_domains.create_results_dict(search_results)
Create a dictionary of search results.
Input is a list of dictionaries, one per translation, each with the keys “Translation” and “Data”: “Translation” maps to the translation, and “Data” maps to a list of rpsblast results, where each result is a dictionary containing domain and gene_domain data.
Returns a dictionary where key = unique translation and value = list of dictionaries, each dictionary a unique rpsblast result.
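The reshaping done by create_results_dict amounts to re-keying the list by translation. A hypothetical sketch, using the “Translation” and “Data” keys described above:

```python
from typing import Any, Dict, List


def results_dict_sketch(search_results: List[Dict[str, Any]]) -> Dict[str, List[dict]]:
    # Each input element covers one unique translation, so the
    # translation can serve directly as the dictionary key.
    return {result["Translation"]: result["Data"] for result in search_results}
```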
- pdm_utils.pipelines.find_domains.execute_statement(connection, statement)
- pdm_utils.pipelines.find_domains.execute_transaction(connection, statement_list=[])
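execute_statement and execute_transaction are undocumented here. The sketch below shows the all-or-nothing pattern a transaction helper typically follows, using the standard-library sqlite3 module in place of the pipeline's own database connection. The (sql, params) statement format, the 0/1 return convention, and the function name are assumptions. It uses `statement_list=None` rather than a mutable `[]` default, which avoids Python's shared-default pitfall.

```python
import sqlite3
from typing import Iterable, Optional, Sequence, Tuple

Statement = Tuple[str, Sequence]


def execute_transaction_sketch(connection: sqlite3.Connection,
                               statement_list: Optional[Iterable[Statement]] = None) -> int:
    # Run every statement inside one transaction: either all succeed and
    # are committed, or the first failure rolls everything back.
    # Returns 0 on commit and 1 on rollback (an assumed convention).
    if statement_list is None:
        statement_list = []
    try:
        with connection:  # commits on success, rolls back on exception
            for sql, params in statement_list:
                connection.execute(sql, params)
    except sqlite3.Error:
        return 1
    return 0
```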
- pdm_utils.pipelines.find_domains.get_rpsblast_command()
Determine rpsblast+ command based on operating system.
- pdm_utils.pipelines.find_domains.get_rpsblast_path(command)
Determine rpsblast+ binary path.
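A sketch of how the rpsblast command and path could be resolved with the standard library. The OS-to-name mapping ("rpsblast" on macOS, "rpsblast+" elsewhere) is an assumption consistent with the entries above, not confirmed by them, and both function names are hypothetical.

```python
import platform
import shutil
from typing import Optional


def rpsblast_command_sketch() -> str:
    # Assumed naming convention: plain "rpsblast" on macOS ("Darwin"),
    # "rpsblast+" on other systems.
    if platform.system() == "Darwin":
        return "rpsblast"
    return "rpsblast+"


def rpsblast_path_sketch(command: str) -> Optional[str]:
    # Resolve the command against PATH; None means it was not found.
    return shutil.which(command)
```

Returning None rather than raising lets the caller decide how to report a missing rpsblast installation.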
- pdm_utils.pipelines.find_domains.insert_domain_data(engine, results)
Attempt to insert domain data into the database.
- pdm_utils.pipelines.find_domains.learn_cdd_name(cdd_dir)
- pdm_utils.pipelines.find_domains.log_gene_ids(cdd_genes)
Record names of the genes processed for reference.
- pdm_utils.pipelines.find_domains.main(argument_list)
- Parameters
argument_list –
- Returns
- pdm_utils.pipelines.find_domains.make_tempdir(tmp_dir)
Uses pdm_utils.functions.basic.expand_path to expand tmp_dir, then checks whether that directory exists; if it doesn't, uses os.makedirs to create it recursively.
- Parameters
tmp_dir – location where I/O should take place
- Returns
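A sketch of the make_tempdir behaviour described above, with os.path.expanduser plus Path.resolve standing in for pdm_utils.functions.basic.expand_path (whose exact behaviour is not shown here). Returning the expanded path and the function name are added assumptions, since the Returns field is empty.

```python
import os
from pathlib import Path


def make_tempdir_sketch(tmp_dir: str) -> Path:
    # Stand-in for expand_path: expand "~" and make the path absolute.
    path = Path(os.path.expanduser(tmp_dir)).resolve()
    # Create the directory and any missing parents; exist_ok=True makes
    # this a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return path
```

Passing `exist_ok=True` folds the "check, then create" steps into one call and avoids a race if another process creates the directory in between.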
- pdm_utils.pipelines.find_domains.process_align(align)
Process alignment data.
Returns description, domain_id, and name.
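process_align takes an alignment object whose structure is not documented here. This hypothetical sketch works on the hit-definition string instead, assuming a CDD-style "<domain id>, <short name>, <description>" layout, and returns values in the documented order (description, domain_id, name).

```python
from typing import Tuple


def process_align_sketch(hit_def: str) -> Tuple[str, str, str]:
    # Assumed layout: "<domain id>, <short name>, <description>".
    # The description may itself contain commas, so split at most twice.
    parts = [part.strip() for part in hit_def.split(",", 2)]
    domain_id = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    description = parts[2] if len(parts) > 2 else ""
    return description, domain_id, name
```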
- pdm_utils.pipelines.find_domains.process_rps_output(filepath, evalue)
Process rpsblast output and return a list of dictionaries.
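A sketch of process_rps_output for rpsblast XML output (`-outfmt 5`), using xml.etree.ElementTree in place of whatever parser the pipeline uses. It reads XML text rather than a file path (`ET.parse(filepath).getroot()` would handle a file), keeps HSPs whose e-value is at or below the cutoff, and emits dictionaries whose keys are assumed to match the domain and gene_domain columns. The hit-definition layout is the same assumption as in the process_align sketch, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def process_rps_output_sketch(xml_text: str, evalue: float) -> List[Dict]:
    # Parses BLAST XML (-outfmt 5) and keeps HSPs with e-value <= cutoff.
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id", "")
        hit_def = hit.findtext("Hit_def", "")
        # Assumed "<domain id>, <short name>, <description>" layout;
        # padding guarantees at least three fields.
        parts = [p.strip() for p in hit_def.split(",", 2)] + ["", ""]
        for hsp in hit.iter("Hsp"):
            expect = float(hsp.findtext("Hsp_evalue", "inf"))
            if expect > evalue:
                continue
            results.append({
                "HitID": hit_id,
                "DomainID": parts[0],
                "Name": parts[1],
                "Description": parts[2],
                "Expect": expect,
                "QueryStart": int(hsp.findtext("Hsp_query-from")),
                "QueryEnd": int(hsp.findtext("Hsp_query-to")),
            })
    return results
```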
- pdm_utils.pipelines.find_domains.search_and_process(rpsblast, cdd_name, tmp_dir, evalue, translation_id, translation)
Uses rpsblast to search the indicated gene against the indicated CDD.
- Parameters
rpsblast – path to rpsblast binary
cdd_name – CDD database path/name
tmp_dir – path to directory where I/O will take place
evalue – evalue cutoff for rpsblast
translation_id – unique identifier for the translation sequence
translation – protein sequence for gene to query
- Returns
results
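A hedged sketch of the preparation step in search_and_process: the query is written as a one-record FASTA file in tmp_dir, and an rpsblast argument list is built with the standard BLAST+ options -db, -query, -evalue, -outfmt and -out. The file naming scheme and both function names are hypothetical.

```python
import os
import subprocess
from typing import List


def build_rpsblast_job_sketch(rpsblast: str, cdd_name: str, tmp_dir: str,
                              evalue: float, translation_id: str,
                              translation: str) -> List[str]:
    # Write the query as a one-record FASTA file inside tmp_dir.
    query = os.path.join(tmp_dir, f"{translation_id}.fasta")
    output = os.path.join(tmp_dir, f"{translation_id}.xml")
    with open(query, "w") as fh:
        fh.write(f">{translation_id}\n{translation}\n")
    # -outfmt 5 requests BLAST XML, which can then be parsed for domain hits.
    return [rpsblast, "-db", cdd_name, "-query", query,
            "-evalue", str(evalue), "-outfmt", "5", "-out", output]


def run_rpsblast_sketch(argv: List[str]) -> None:
    # Raises CalledProcessError if rpsblast exits with a non-zero status.
    subprocess.run(argv, check=True)
```

Passing an argument list (no shell) avoids quoting problems with paths that contain spaces.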
- pdm_utils.pipelines.find_domains.search_summary(rolled_back)
Print search results.
- pdm_utils.pipelines.find_domains.search_translations(rpsblast, cdd_name, tmp_dir, evalue, threads, engine, unique_trans, cds_trans_dict)
Search for conserved domains in a list of unique translations.
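search_translations takes a threads argument. One way to use it is a thread pool that runs one search per unique translation, sketched below with a caller-supplied search function standing in for search_and_process. Threads are a reasonable fit because each search mostly waits on an external rpsblast process, but this is an assumption about the design, not a description of the actual implementation; the function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def search_translations_sketch(unique_trans: List[str], threads: int,
                               search: Callable[[int, str], dict]) -> List[dict]:
    # Fan the unique translations out over a pool of `threads` workers.
    # `search(translation_id, translation)` stands in for search_and_process.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(search, i, trans)
                   for i, trans in enumerate(unique_trans)]
        # Collect in submission order so results line up with unique_trans.
        return [future.result() for future in futures]
```

If each search returns a {“Translation”, “Data”} dictionary, the collected list has the shape create_results_dict expects.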
- pdm_utils.pipelines.find_domains.setup_argparser()
Builds an argparse.ArgumentParser for this script.
- Returns