export
Pipeline for exporting database information into files.
- pdm_utils.pipelines.export_db.append_database_version(genome_seqrecord, version_data)
Function that appends the database version to the SeqRecord comments.
- Parameters
genome_seqrecord – Filled SeqRecord object.
version_data (dict) – Dictionary containing database version information.
- pdm_utils.pipelines.export_db.decode_results(results, columns, verbose=False)
Function that decodes encoded results from SQLAlchemy generated data.
- Parameters
results (list[dict]) – List of data dictionaries from a SQLAlchemy results proxy.
columns (list[Column]) – SQLAlchemy Column objects.
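The decoding step can be illustrated with a minimal sketch. It assumes that "encoded results" means `bytes` values inside the row dictionaries. The helper name `decode_rows`, the UTF-8 default, and the sample row are hypothetical and are not taken from pdm_utils.

```python
def decode_rows(rows, encoding="utf-8"):
    """Decode bytes values in a list of row dictionaries, in place.

    Hypothetical stand-in for illustration only; not the pdm_utils code.
    """
    for row in rows:
        for key, value in row.items():
            # Only byte strings need decoding; other values are left untouched.
            if isinstance(value, (bytes, bytearray)):
                row[key] = value.decode(encoding)
    return rows


# Made-up sample data: one text value and one bytes value.
rows = [{"PhageID": "PhageA", "Notes": b"draft"}]
decoded = decode_rows(rows)
```

The real function also takes the SQLAlchemy Column objects, presumably to decide which fields need decoding; this sketch checks each value's type instead.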
- pdm_utils.pipelines.export_db.execute_csv_export(db_filter, export_path, folder_path, columns, csv_name, data_cache=None, sort=[], raw_bytes=False, verbose=False, dump=False)
Executes csv export of a MySQL database table with select columns.
- Parameters
db_filter (Filter) – A connected and fully built Filter object.
export_path (Path) – Path to a dir for file creation.
folder_path (Path) – Path to a top-level dir.
table (str) – MySQL table name.
conditionals (list[BinaryExpression]) – MySQL WHERE clause-related SQLAlchemy objects.
sort (list[Column]) – A list of SQLAlchemy Columns to sort by.
values (list[str]) – List of values to filter database results.
verbose (bool) – A boolean value to toggle progress print statements.
dump (bool) – A boolean value to toggle dump in current working dir.
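As a rough illustration of a CSV export with selected columns and a sort order, the sketch below uses only the standard library. The helper `write_csv`, its parameters, and the sample rows are hypothetical; the actual pipeline queries the database through a Filter object rather than taking rows directly.

```python
import csv
import io


def write_csv(rows, columns, sort_by=(), out=None):
    """Write selected columns of row dictionaries as CSV, optionally sorted.

    Hypothetical helper for illustration only; not the pdm_utils code.
    """
    out = out if out is not None else io.StringIO()
    if sort_by:
        # Sort on the requested columns, in the order given.
        rows = sorted(rows, key=lambda row: tuple(row[c] for c in sort_by))
    # extrasaction="ignore" drops keys that are not selected columns.
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out


# Made-up sample rows.
rows = [
    {"PhageID": "B", "Cluster": "Y", "Length": 2},
    {"PhageID": "A", "Cluster": "X", "Length": 1},
]
text = write_csv(rows, ["PhageID", "Cluster"], sort_by=["PhageID"]).getvalue()
```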
- pdm_utils.pipelines.export_db.execute_export(alchemist, pipeline, folder_path=None, folder_name='20220119_export', values=None, verbose=False, dump=False, force=False, table='phage', filters='', groups=[], sort=[], include_columns=[], exclude_columns=[], sequence_columns=False, raw_bytes=False, concatenate=False, db_name=None, phams_out=False, threads=1)
Executes the entirety of the file export pipeline.
- Parameters
alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.
pipeline (str) – File type that dictates data processing.
folder_path (Path) – Path to a valid dir for new dir creation.
folder_name (str) – A name for the export folder.
force (bool) – A boolean to toggle aggressive building of directories.
values (list[str]) – List of values to filter database results.
verbose (bool) – A boolean value to toggle progress print statements.
dump (bool) – A boolean value to toggle dump in current working dir.
table (str) – MySQL table name.
filters (str) – A list of lists with filter values, grouped by ORs.
groups (list[str]) – A list of supported MySQL column names to group by.
sort (list[str]) – A list of supported MySQL column names to sort by.
include_columns (list[str]) – A csv export column selection parameter.
exclude_columns (list[str]) – A csv export column selection parameter.
sequence_columns (bool) – A boolean to toggle inclusion of sequence data.
concatenate (bool) – A boolean to toggle concatenation of SeqRecords.
threads (int) – Number of processes/threads to spawn during the pipeline.
- pdm_utils.pipelines.export_db.execute_ffx_export(alchemist, export_path, folder_path, values, file_format, db_version, table, concatenate=False, data_cache=None, verbose=False, dump=False, threads=1, export_name=None)
Executes SeqRecord export of the compilation of data from a MySQL entry.
- Parameters
alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.
export_path (Path) – Path to a dir for file creation.
folder_path (Path) – Path to a top-level dir.
file_format (str) – Biopython supported file type.
db_version (dict) – Dictionary containing database version information.
table (str) – MySQL table name.
values (list[str]) – List of values to filter database results.
conditionals (list[BinaryExpression]) – MySQL WHERE clause-related SQLAlchemy objects.
sort (list[Column]) – A list of SQLAlchemy Columns to sort by.
concatenate (bool) – A boolean to toggle concatenation of SeqRecords.
verbose (bool) – A boolean value to toggle progress print statements.
- pdm_utils.pipelines.export_db.execute_sql_export(alchemist, export_path, folder_path, db_version, db_name=None, dump=False, force=False, phams_out=False, threads=1, verbose=False)
- pdm_utils.pipelines.export_db.filter_csv_columns(alchemist, table, include_columns=[], exclude_columns=[], sequence_columns=False)
Function that filters and constructs a list of Columns to select.
- Parameters
alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.
table (str) – MySQL table name.
include_columns (list[str]) – A list of supported MySQL column names.
exclude_columns (list[str]) – A list of supported MySQL column names.
sequence_columns (bool) – A boolean to toggle inclusion of sequence data.
- Returns
A list of SQLAlchemy Column objects.
- Return type
list[Column]
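One plausible reading of the include/exclude selection is sketched below with plain column names instead of SQLAlchemy Column objects. The semantics are an assumption: `include` narrows the selection when given, and `exclude` removes names afterwards. The helper `select_columns` and the sample column names are hypothetical, and the library's actual behavior may differ.

```python
def select_columns(available, include=None, exclude=None):
    """Pick column names from `available` using include/exclude lists.

    Hypothetical helper for illustration only; not the pdm_utils code.
    """
    include = list(include or [])
    exclude = set(exclude or [])

    # Reject names that do not exist in the table.
    unknown = [name for name in include if name not in available]
    if unknown:
        raise ValueError(f"Unsupported column names: {unknown}")

    # Assumption: an empty include list means "start from every column".
    chosen = include if include else list(available)
    return [name for name in chosen if name not in exclude]


# Made-up column names.
available = ["PhageID", "Name", "Sequence"]
without_sequence = select_columns(available, exclude=["Sequence"])
only_name = select_columns(available, include=["Name"])
```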
- pdm_utils.pipelines.export_db.get_cds_seqrecords(alchemist, values, data_cache=None, nucleotide=False, verbose=False, file_format=None)
- pdm_utils.pipelines.export_db.get_genome_seqrecords(alchemist, values, data_cache=None, verbose=False)
- pdm_utils.pipelines.export_db.get_single_genome(alchemist, phageid, get_features=False, data_cache=None)
- pdm_utils.pipelines.export_db.get_sort_columns(alchemist, sort_inputs)
Function that converts input for sorting to SQLAlchemy Columns.
- Parameters
alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.
sort_inputs (list[str]) – A list of supported MySQL column names.
- Returns
A list of SQLAlchemy Column objects.
- Return type
list[Column]
- pdm_utils.pipelines.export_db.main(unparsed_args_list)
Uses parsed args to run the entirety of the file export pipeline.
- Parameters
unparsed_args_list (list[str]) – A list of command line args.
- pdm_utils.pipelines.export_db.parse_export(unparsed_args_list)
Parses export_db arguments and stores them with an argparse object.
- Parameters
unparsed_args_list (list[str]) – A list of command line args.
- Returns
ArgParse module parsed args.
- pdm_utils.pipelines.export_db.parse_feature_data(alchemist, values=[], limit=8000)
Returns Cds objects containing data parsed from a MySQL database.
- Parameters
alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.
values (list[str]) – List of GeneIDs upon which the query can be conditioned.