GenBank-formatted flat files

GenBank-formatted flat files are a common data format used to represent an entire phage genome. A detailed description of the format can be found on the NCBI website (GenBank-formatted flat file). A flat file is a structured text file that systematically stores diverse types of information about the genome.

A flat file can be generated for any genome at any annotation stage using:

Some flat file fields, such as LOCUS, DEFINITION, and REFERENCE-AUTHORS, provide information about the entire record. Others, such as FEATURES, provide information about particular regions of the sequence in the record, such as tRNA or CDS genes. Data from flat files are stored in the phage and gene tables.
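The distinction between record-level fields and region-level features can be illustrated with a minimal sketch. It uses only the Python standard library and relies on the flat file's column layout: top-level keywords start in the first column, sub-keywords such as AUTHORS are indented two spaces, and feature keys are indented five spaces. The function name `split_flat_file` and the sample record are hypothetical and invented for illustration. This is not the parser used to populate the phage and gene tables, and it ignores qualifiers, multi-line values, and the sequence itself.

```python
# A tiny, made-up record used only for illustration (not real data).
SAMPLE = """\
LOCUS       EXAMPLE1                  60 bp    DNA     linear   PHG 01-JAN-2020
DEFINITION  Hypothetical example phage, complete genome.
REFERENCE   1  (bases 1 to 60)
  AUTHORS   Doe,J.
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="Hypothetical example phage"
     CDS             1..30
                     /gene="1"
     tRNA            complement(40..55)
                     /gene="2"
ORIGIN
        1 atgaaacgta ttgcgctgaa agcgtaataa ccggttaacc ggttaaccgg ttaaccggtt
//
"""


def split_flat_file(text):
    """Separate record-level fields from region-level features.

    Hypothetical helper: returns a dict with 'record_fields'
    (keyword -> first-line value) and 'features' (list of
    (feature key, location) tuples).
    """
    record_fields = {}
    features = []
    section = None  # most recent top-level keyword seen

    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if not line.strip():
            continue

        if not line[0].isspace():
            # Top-level keyword (LOCUS, DEFINITION, FEATURES, ...).
            keyword, _, value = line.partition(" ")
            section = keyword
            if keyword not in ("FEATURES", "ORIGIN"):
                record_fields[keyword] = value.strip()
        elif section == "FEATURES":
            # Feature key lines are indented exactly five spaces;
            # qualifier lines (deeper indent) are skipped in this sketch.
            if len(line) > 5 and line[:5] == "     " and not line[5].isspace():
                parts = line.split(None, 1)
                location = parts[1].strip() if len(parts) > 1 else ""
                features.append((parts[0], location))
        elif section == "REFERENCE":
            # Sub-keywords such as AUTHORS are indented two spaces.
            if len(line) > 2 and line[:2] == "  " and not line[2].isspace():
                parts = line.split(None, 1)
                value = parts[1].strip() if len(parts) > 1 else ""
                record_fields["REFERENCE-" + parts[0]] = value

    return {"record_fields": record_fields, "features": features}


parsed = split_flat_file(SAMPLE)
for name, value in parsed["record_fields"].items():
    print(name, "->", value)
for key, location in parsed["features"]:
    print(key, "at", location)
```

In this sketch the record-level fields end up in one dictionary and the features in a separate list, which loosely mirrors how whole-record data and per-gene data go to different tables. For real files, an established GenBank parser is a better choice than hand-written column logic.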

../../_images/flatfile_parsing.jpg

Summary of how a GenBank-formatted flat file is parsed