export

Pipeline for exporting database information into files.

pdm_utils.pipelines.export_db.append_database_version(genome_seqrecord, version_data)

Function that appends the database version to the SeqRecord comments.

Parameters
  • genome_seqrecord – Filled SeqRecord object.

  • version_data (dict) – Dictionary containing database version information.

pdm_utils.pipelines.export_db.decode_results(results, columns, verbose=False)

Function that decodes encoded results from SQLAlchemy generated data.

Parameters
  • results (list[dict]) – List of data dictionaries from a SQLAlchemy results proxy.

  • columns (list[Column]) – SQLAlchemy Column objects.
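The decoding step described above can be illustrated with a minimal, standalone sketch. It assumes that "encoded" values arrive as Python bytes objects and that UTF-8 is the right encoding. Both are assumptions, as is the helper name decode_rows, which is hypothetical and not part of pdm_utils. Unlike decode_results, the sketch inspects value types rather than Column metadata.

```python
def decode_rows(rows, encoding="utf-8"):
    """Decode any bytes values in a list of row dictionaries, in place.

    Hypothetical helper for illustration only; the real decode_results
    uses SQLAlchemy Column objects to decide what to decode.
    """
    for row in rows:
        for key, value in row.items():
            # Only bytes values are touched; other types pass through unchanged.
            if isinstance(value, bytes):
                row[key] = value.decode(encoding)
    return rows


rows = [{"PhageID": "Trixie", "Notes": b"draft annotation", "Length": 53526}]
decoded = decode_rows(rows)
```

After the call, the Notes value is an ordinary string, while PhageID and Length are left as they were.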

pdm_utils.pipelines.export_db.execute_csv_export(db_filter, export_path, folder_path, columns, csv_name, data_cache=None, sort=[], raw_bytes=False, verbose=False, dump=False)

Executes csv export of a MySQL database table with select columns.

Parameters
  • db_filter (Filter) – A connected and fully built Filter object.

  • export_path (Path) – Path to a dir for file creation.

  • folder_path (Path) – Path to a top-level dir.

  • table (str) – MySQL table name.

  • conditionals (list[BinaryExpression]) – MySQL WHERE clause-related SQLAlchemy objects.

  • sort (list[Column]) – A list of SQLAlchemy Columns to sort by.

  • values (list[str]) – List of values to filter database results.

  • verbose (bool) – A boolean value to toggle progress print statements.

  • dump (bool) – A boolean value to toggle dump in current working dir.

pdm_utils.pipelines.export_db.execute_export(alchemist, pipeline, folder_path=None, folder_name='20220119_export', values=None, verbose=False, dump=False, force=False, table='phage', filters='', groups=[], sort=[], include_columns=[], exclude_columns=[], sequence_columns=False, raw_bytes=False, concatenate=False, db_name=None, phams_out=False, threads=1)

Executes the entirety of the file export pipeline.

Parameters
  • alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.

  • pipeline (str) – File type that dictates data processing.

  • folder_path (Path) – Path to a valid dir for new dir creation.

  • folder_name (str) – A name for the export folder.

  • force (bool) – A boolean to toggle aggressive building of directories.

  • values (list[str]) – List of values to filter database results.

  • verbose (bool) – A boolean value to toggle progress print statements.

  • dump (bool) – A boolean value to toggle dump in current working dir.

  • table (str) – MySQL table name.

  • filters (str) – A list of lists with filter values, grouped by ORs.

  • groups (list[str]) – A list of supported MySQL column names to group by.

  • sort (list[str]) – A list of supported MySQL column names to sort by.

  • include_columns (list[str]) – A csv export column selection parameter.

  • exclude_columns (list[str]) – A csv export column selection parameter.

  • sequence_columns (bool) – A boolean to toggle inclusion of sequence data.

  • concatenate (bool) – A boolean to toggle concatenation of SeqRecords.

  • threads (int) – Number of processes/threads to spawn during the pipeline.
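The default folder_name shown in the signature, '20220119_export', looks like a date stamp followed by "_export", presumably captured when this documentation was generated. Assuming that pattern (YYYYMMDD_export), a comparable name could be produced as sketched below. The helper default_export_folder_name is hypothetical and not part of pdm_utils.

```python
from datetime import date


def default_export_folder_name(today=None):
    """Build a date-stamped folder name such as '20220119_export'.

    Hypothetical helper; assumes the default follows a YYYYMMDD_export pattern.
    """
    if today is None:
        today = date.today()
    return f"{today.strftime('%Y%m%d')}_export"


example_name = default_export_folder_name(date(2022, 1, 19))
```

Passing an explicit folder_name to execute_export avoids depending on whatever date-based default is in effect.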

pdm_utils.pipelines.export_db.execute_ffx_export(alchemist, export_path, folder_path, values, file_format, db_version, table, concatenate=False, data_cache=None, verbose=False, dump=False, threads=1, export_name=None)

Executes SeqRecord export of the compilation of data from a MySQL entry.

Parameters
  • alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.

  • export_path (Path) – Path to a dir for file creation.

  • folder_path (Path) – Path to a top-level dir.

  • file_format (str) – Biopython supported file type.

  • db_version (dict) – Dictionary containing database version information.

  • table (str) – MySQL table name.

  • values (list[str]) – List of values to filter database results.

  • conditionals (list[BinaryExpression]) – MySQL WHERE clause-related SQLAlchemy objects.

  • sort (list[Column]) – A list of SQLAlchemy Columns to sort by.

  • concatenate (bool) – A boolean to toggle concatenation of SeqRecords.

  • verbose (bool) – A boolean value to toggle progress print statements.

pdm_utils.pipelines.export_db.execute_sql_export(alchemist, export_path, folder_path, db_version, db_name=None, dump=False, force=False, phams_out=False, threads=1, verbose=False)

pdm_utils.pipelines.export_db.filter_csv_columns(alchemist, table, include_columns=[], exclude_columns=[], sequence_columns=False)

Function that filters and constructs a list of Columns to select.

Parameters
  • alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.

  • table (str) – MySQL table name.

  • include_columns (list[str]) – A list of supported MySQL column names.

  • exclude_columns (list[str]) – A list of supported MySQL column names.

  • sequence_columns (bool) – A boolean to toggle inclusion of sequence data.

Returns

A list of SQLAlchemy Column objects.

Return type

list[Column]
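The include/exclude selection can be sketched with plain column names instead of SQLAlchemy Column objects. The sketch assumes that a non-empty include list acts as a whitelist, that excluded names are always dropped, and that sequence columns are omitted unless explicitly requested. These semantics are assumptions, not confirmed behavior of filter_csv_columns, and select_column_names is a hypothetical helper.

```python
def select_column_names(available, include=(), exclude=(),
                        sequence_names=(), keep_sequences=False):
    """Return the column names to export, preserving their original order.

    Hypothetical helper; the selection rules here are assumed for illustration.
    """
    selected = []
    for name in available:
        # Assumed: a non-empty include list restricts output to those names.
        if include and name not in include:
            continue
        # Assumed: excluded names are always dropped.
        if name in exclude:
            continue
        # Assumed: sequence columns are skipped unless explicitly requested.
        if name in sequence_names and not keep_sequences:
            continue
        selected.append(name)
    return selected


available = ["PhageID", "Cluster", "Sequence", "Length"]
chosen = select_column_names(
    available, exclude=["Length"], sequence_names=["Sequence"]
)
```

Here chosen holds PhageID and Cluster: Length is excluded by name and Sequence is withheld because keep_sequences is left at False.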

pdm_utils.pipelines.export_db.get_cds_seqrecords(alchemist, values, data_cache=None, nucleotide=False, verbose=False, file_format=None)

pdm_utils.pipelines.export_db.get_genome_seqrecords(alchemist, values, data_cache=None, verbose=False)

pdm_utils.pipelines.export_db.get_single_genome(alchemist, phageid, get_features=False, data_cache=None)

pdm_utils.pipelines.export_db.get_sort_columns(alchemist, sort_inputs)

Function that converts input for sorting to SQLAlchemy Columns.

Parameters
  • alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.

  • sort_inputs (list[str]) – A list of supported MySQL column names.

Returns

A list of SQLAlchemy Column objects.

Return type

list[Column]
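Converting sort inputs into column objects amounts to a validated lookup. The sketch below uses a plain dictionary and placeholder strings in place of real SQLAlchemy Columns; the "table.column" naming and the error raised for unsupported names are assumptions, and resolve_sort_columns is a hypothetical helper rather than the library's implementation.

```python
def resolve_sort_columns(column_lookup, sort_inputs):
    """Map column names to column objects, preserving the requested order.

    Hypothetical helper; raises ValueError for names missing from the lookup.
    """
    columns = []
    for name in sort_inputs:
        if name not in column_lookup:
            raise ValueError(f"Unsupported sort column: {name}")
        columns.append(column_lookup[name])
    return columns


# Placeholder strings stand in for SQLAlchemy Column objects.
lookup = {
    "phage.PhageID": "<Column phage.PhageID>",
    "phage.Cluster": "<Column phage.Cluster>",
}
resolved = resolve_sort_columns(lookup, ["phage.Cluster", "phage.PhageID"])
```

The order of sort_inputs is kept because it determines sort precedence in the resulting query.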

pdm_utils.pipelines.export_db.main(unparsed_args_list)

Uses parsed args to run the entirety of the file export pipeline.

Parameters

unparsed_args_list (list[str]) – A list of command line arguments.

pdm_utils.pipelines.export_db.parse_export(unparsed_args_list)

Parses export_db arguments and stores them with an argparse object.

Parameters

unparsed_args_list (list[str]) – A list of command line arguments.

Returns

ArgParse module parsed args.

pdm_utils.pipelines.export_db.parse_feature_data(alchemist, values=[], limit=8000)

Returns Cds objects containing data parsed from a MySQL database.

Parameters
  • alchemist (AlchemyHandler) – A connected and fully built AlchemyHandler object.

  • values (list[str]) – List of GeneIDs upon which the query can be conditioned.
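The limit parameter is not described above. One plausible reading, which is an assumption here, is that it caps how many GeneIDs are used to condition a single query, so that long value lists are processed in batches. A minimal sketch of that batching, using the hypothetical helper chunk_values:

```python
def chunk_values(values, limit=8000):
    """Split a list of values into consecutive batches of at most `limit` items.

    Hypothetical helper; assumes `limit` bounds the values used per query.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return [values[i:i + limit] for i in range(0, len(values), limit)]


gene_ids = [f"Gene_{n}" for n in range(5)]
batches = chunk_values(gene_ids, limit=2)
```

Each batch could then be passed as the values for one conditioned query, with the results combined afterwards.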