revise: revise data inconsistenciesΒΆ

This tool is to be used in combination with the pham_review pipeline to generate files formatted to suit a GenBank automated gene product annotation resubmission pipeline.

Specifically, the local revise pipeline reads in edits made to a review csv spreadsheet and searches through an indicated database to find instances of genes selected for review with product annotations dissimilar to the indicated product annotation. The revise pipeline marks these, and generates a formatted file of the changes required to correct them.

To export the resubmission csv file from a edited review file:

> python3 -m pdm_utils revise Actino_Draft local <path/to/review/file>

To run the revise pipeline in production mode, which takes into account the various flags used to denote submitted GenBank files, utilize the production flag at the command line:

>python3 -m pdm_utils revise Actino_Draft local <path/to/review/file> --production

Like many other pipelines that involve exporting data in the pdm_utils package, the range of the database entries inspected by the revise pipeline can be modified with command line filter module implementations. Filtering, grouping, and sorting can be done in the same manner as described in the export pipeline documentation:

> python3 pdm_utils revise Actino_Draft local FunctionReport.csv -w "gene.Cluster NOT IN ('A', 'B', 'K')" -g phage.Cluster -s phage.PhageID

The local revise pipeline can also translate review formatted data into update ticket tables that can be used to update the database:

>python3 pdm_utils revise Actino_Draft local FunctionReport.csv -ft ticket

The remote revise pipeline retrieves data from GenBank in five-column feature table format and searches through an indicated database to find discrepancies between the product annotations and starts in the local database and those stored at GenBank. The revise pipeline marks these, and edits the retrieved GenBank files to generate five-column feature tables with feature data consistent with the local data.

To generate the revised five-column feature table format files:

> python3 -m pdm_utils revise Actino_Draft remote

And again, the remote revise pipeline can be modified with command line filter module implementations in the same manner as described in the export pipeline documentation:

> python3 pdm_utils revise Actino_Draft remove -w "gene.Subcluster IN ('K1', 'K2', 'K6')