GTF (Gene Transfer Format) is identical to GFF version 2. This module reads and writes GTF data and aims to be a complete implementation of the GTF specification.
- streaming parsing and streaming formatting
- creates transcript features with child_features
- only compatible with GTF
Note: For JBrowse, we generally encourage GFF3 over GTF.
For GFF3, check out the @gmod/gff-js package.
$ npm install --save @gmod/gtf
import fs from 'fs'
import gtf from '@gmod/gtf'

// parse a file from a file name
gtf
  .parseFile('path/to/my/file.gtf', { parseAll: true })
  .on('data', data => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })

// parse a stream of GTF text
fs.createReadStream('path/to/my/file.gtf')
  .pipe(gtf.parseStream())
  .on('data', data => {
    console.log('got item', data)
  })
  .on('end', () => {
    console.log('done parsing!')
  })

// parse a string of GTF synchronously
const stringOfGTF = fs.readFileSync('my_annotations.gtf').toString()
const arrayOfThings = gtf.parseStringSync(stringOfGTF)

// format an array of items to a string
const newStringOfGTF = gtf.formatSync(arrayOfThings)

// format a stream of things to a stream of text.
// inserts sync marks automatically.
// note: this could create new GTF lines for transcript features
myStreamOfGTFObjects
  .pipe(gtf.formatStream())
  .pipe(fs.createWriteStream('my_new.gtf'))

// format a stream of things and write it to
// a GTF file. inserts sync marks.
// note: this could create new GTF lines for transcript features
gtf.formatFile(myStreamOfGTFObjects, 'path/to/destination.gtf')
Because GTF cannot represent a three-level hierarchy (gene -> transcript -> exon), we parse GTF by creating transcript features with child features.
We do not create features from the gene_id. Values that are `.` in the GTF are `null` in the output.
ctgA bare_predicted CDS 10000 11500 . + 0 transcript_id "Apple1";
Note that this creates an additional transcript feature from the transcript_id when featureType is not 'transcript'. It then creates a child CDS feature from the line of GTF shown above.
[
[
{
"seq_name": "ctgA",
"source": "bare_predicted",
"featureType": "transcript",
"start": 10000,
"end": 11500,
"score": null,
"strand": "+",
"frame": "0",
"attributes": { "transcript_id": [ "\"Apple1\"" ] },
"child_features": [[
{
"seq_name": "ctgA",
"source": "bare_predicted",
"featureType": "CDS",
"start": 10000,
"end": 11500,
"score": null,
"strand": "+",
"frame": "0",
"attributes": { "transcript_id": [ "\"Apple1\"" ] },
"child_features": [],
"derived_features": []
}
]],
"derived_features": []
}
]
]
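The grouping shown above can be sketched in plain JavaScript. This is only an illustration and not the module's implementation: `groupByTranscript` is a hypothetical helper, and it assumes that a synthetic transcript copies its columns from the first line seen for a transcript_id and spans the smallest start to the largest end of its children.

```javascript
// Hypothetical helper: builds one transcript feature per transcript_id.
// Assumption: the transcript spans min(start)..max(end) of its children.
function groupByTranscript(features) {
  const transcripts = new Map()
  for (const feature of features) {
    const [id] = feature.attributes.transcript_id || []
    if (id === undefined) continue // this sketch skips lines without a transcript_id
    const isTranscript = feature.featureType === 'transcript'
    if (!transcripts.has(id)) {
      transcripts.set(id, {
        ...feature,
        featureType: 'transcript',
        attributes: { transcript_id: [id] },
        child_features: [],
        derived_features: [],
      })
    }
    const transcript = transcripts.get(id)
    transcript.start = Math.min(transcript.start, feature.start)
    transcript.end = Math.max(transcript.end, feature.end)
    if (!isTranscript) {
      // each child is wrapped in its own array, as in the output above
      transcript.child_features.push([
        { ...feature, child_features: [], derived_features: [] },
      ])
    }
  }
  return [...transcripts.values()].map(transcript => [transcript])
}

const cds = {
  seq_name: 'ctgA',
  source: 'bare_predicted',
  featureType: 'CDS',
  start: 10000,
  end: 11500,
  score: null,
  strand: '+',
  frame: '0',
  attributes: { transcript_id: ['"Apple1"'] },
}
const grouped = groupByTranscript([cds])
console.log(JSON.stringify(grouped, null, 2))
```

Passing two lines with the same transcript_id to this sketch yields a single transcript with two entries in `child_features`.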
parseDirective("##gtf\n")
// returns
{
"directive": "gtf"
}
parseComment('# hi this is a comment\n')
// returns
{
"comment": "hi this is a comment"
}
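As a rough illustration of the two shapes above, the following sketch classifies a line as a directive or a comment. `classifyLine` is a hypothetical name and does not reproduce the module's `parseDirective` or `parseComment`. It assumes directives start with `##` and comments with a single `#`, and it does not special-case `###` synchronization marks.

```javascript
// Hypothetical helper, not the module's parser.
// Assumption: '##' introduces a directive and a single '#' a comment.
function classifyLine(line) {
  const text = line.replace(/\r?\n$/, '') // drop the trailing newline
  if (text.startsWith('##')) {
    return { directive: text.slice(2).trim() }
  }
  if (text.startsWith('#')) {
    return { comment: text.slice(1).trim() }
  }
  return null // neither a directive nor a comment
}

console.log(classifyLine('##gtf\n'))
console.log(classifyLine('# hi this is a comment\n'))
```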
//These come from any embedded `##FASTA` section in the GTF file.
{
"id": "ctgA",
"description": "test contig",
"sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
}
Parse a stream of text data into a stream of feature, directive, and comment objects.
- options Object optional options object (optional, default {})
  - options.encoding string text encoding of the input GTF. default 'utf8'
  - options.parseAll boolean default false. if true, will parse all items. overrides other flags
  - options.parseFeatures boolean default true
  - options.parseDirectives boolean default false
  - options.parseComments boolean default false
  - options.parseSequences boolean default true
  - options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
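One way to read the parse flags is as a filter over parsed items. The sketch below uses the documented defaults. `shouldEmit` and `defaultOptions` are hypothetical names, and how the module applies the flags internally is an assumption.

```javascript
// Defaults as documented for the parse options.
const defaultOptions = {
  parseAll: false,
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
}

// Hypothetical helper: decides whether a parsed item would be emitted.
function shouldEmit(item, options = {}) {
  const opts = { ...defaultOptions, ...options }
  if (opts.parseAll) return true // parseAll overrides the other flags
  if ('directive' in item) return opts.parseDirectives
  if ('comment' in item) return opts.parseComments
  if ('sequence' in item) return opts.parseSequences
  return opts.parseFeatures // anything else is treated as a feature
}

console.log(shouldEmit({ comment: 'hello' }))
console.log(shouldEmit({ comment: 'hello' }, { parseAll: true }))
```

Under these assumptions, comments and directives are dropped unless their flag or `parseAll` is set, which matches the `{ parseAll: true }` call in the usage example.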
Read and parse a GTF file from the filesystem.
- filename string the filename of the file to parse
- options Object optional options object
  - options.encoding string the file's string encoding, defaults to 'utf8'
  - options.parseAll boolean default false. if true, will parse all items. overrides other flags
  - options.parseFeatures boolean default true
  - options.parseDirectives boolean default false
  - options.parseComments boolean default false
  - options.parseSequences boolean default true
  - options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
Synchronously parse a string containing GTF and return an array of the parsed items.
Returns Array array of parsed features, directives, and/or comments
Format an array of GTF items (features, directives, comments) into a string of GTF. Does not insert synchronization (###) marks. Does not insert a directive if one is not already present.
- items
Returns String the formatted GTF
Format a stream of items (of the type produced by this script) into a stream of GTF text.
Inserts synchronization (###) marks automatically.
- options Object
Format a stream of items (of the type produced by this script) into a GTF file and write it to the filesystem.
Inserts synchronization (###) marks and a ##gtf directive automatically (if one is not already present).
- stream ReadableStream the stream to write to the file
- filename String the file path to write to
- options Object (optional, default {})
Returns Promise promise for the written filename
- util
- unescape
- _escape
- escapeColumn
- parseAttributes
- parseFeature
- parseDirective
- formatAttributes
- formatFeature
- formatDirective
- formatComment
- formatSequence
- formatItem
Unescape a string/text value used in a GTF attribute. Textual attributes should be surrounded by double quotes. Source info: https://mblab.wustl.edu/GTF22.html and https://en.wikipedia.org/wiki/Gene_transfer_format
- s String
Returns String
Escape a value for use in a GTF attribute value.
- regex
- s String
Returns String
Escape a value for use in a GTF column value.
- s String
Returns String
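The text above does not spell out the escaping scheme. Purely as a sketch, and assuming GFF3-style percent-encoding (an assumption that this document does not confirm), escaping and unescaping could look like the following. `escapeWith`, `escapeColumnSketch`, and `unescapeSketch` are hypothetical names, and the set of escaped characters is illustrative.

```javascript
// Assumption: characters matched by `regex` are replaced with %XX hex codes.
function escapeWith(regex, s) {
  return String(s).replace(regex, ch => {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase()
    return `%${hex.padStart(2, '0')}`
  })
}

// Hypothetical column escaper: protects tabs, newlines, and percent signs,
// since tabs and newlines delimit columns and lines.
function escapeColumnSketch(s) {
  return escapeWith(/[\t\n\r%]/g, s)
}

// Reverses the percent-encoding above.
function unescapeSketch(s) {
  return String(s).replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16)),
  )
}

const escaped = escapeColumnSketch('my\tsource')
console.log(escaped)
console.log(unescapeSketch(escaped) === 'my\tsource')
```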
Parse the 9th column (attributes) of a GTF feature line.
- attrString String
Returns Object
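To make the attribute shape concrete, here is a minimal sketch of splitting the 9th column into an object of arrays, keeping the double quotes on values as in the sample output earlier in this document. `parseAttributesSketch` is a hypothetical name. The sketch splits naively on `;`, so a semicolon inside a quoted value would break it.

```javascript
// Hypothetical helper, not the module's parseAttributes.
// Each attribute is 'key value;' and values are collected into arrays.
function parseAttributesSketch(attrString) {
  const attrs = {}
  if (!attrString || attrString === '.') return attrs
  for (const part of attrString.split(';')) {
    const trimmed = part.trim()
    if (!trimmed) continue // skip the empty piece after the final ';'
    const match = /^(\S+)\s+(.*)$/.exec(trimmed)
    if (!match) continue // this sketch ignores keys without a value
    const [, key, value] = match
    if (!Object.prototype.hasOwnProperty.call(attrs, key)) attrs[key] = []
    attrs[key].push(value)
  }
  return attrs
}

const parsedAttrs = parseAttributesSketch('gene_id "Gene1"; transcript_id "Apple1";')
console.log(parsedAttrs)
```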
Parse a GTF feature line.
- line String
Returns the parsed line in an object
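A feature line has nine tab-separated columns. The sketch below maps them onto the field names used in the sample output earlier in this document and turns `.` into `null`. `parseFeatureSketch` is a hypothetical name. Converting score to a number is an assumption, and the 9th column is left as a raw string here to keep the example short.

```javascript
// Field names follow the sample output shown earlier in this document.
const FIELD_NAMES = [
  'seq_name', 'source', 'featureType', 'start', 'end',
  'score', 'strand', 'frame', 'attributes',
]

// Hypothetical helper, not the module's parseFeature.
function parseFeatureSketch(line) {
  const columns = line.replace(/\r?\n$/, '').split('\t')
  const feature = {}
  FIELD_NAMES.forEach((name, i) => {
    const value = columns[i]
    // '.' (or a missing column) becomes null
    feature[name] = value === undefined || value === '.' ? null : value
  })
  if (feature.start !== null) feature.start = parseInt(feature.start, 10)
  if (feature.end !== null) feature.end = parseInt(feature.end, 10)
  // Assumption: score is numeric when present
  if (feature.score !== null) feature.score = parseFloat(feature.score)
  return feature
}

const parsedFeature = parseFeatureSketch(
  'ctgA\tbare_predicted\tCDS\t10000\t11500\t.\t+\t0\ttranscript_id "Apple1";\n',
)
console.log(parsedFeature)
```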
Parse a GTF directive/comment line.
- line String
Returns Object the information in the directive
Format an attributes object into a string suitable for the 9th column of GTF.
- attrs Object
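Going the other way, an attributes object of the shape shown in the sample output can be joined back into a 9th-column string. This is a sketch under assumptions: `formatAttributesSketch` is a hypothetical name, values are assumed to already carry their double quotes, and emitting `.` for an empty object is an assumption.

```javascript
// Hypothetical helper, not the module's formatAttributes.
function formatAttributesSketch(attrs) {
  const parts = []
  for (const [key, values] of Object.entries(attrs || {})) {
    for (const value of values) {
      parts.push(`${key} ${value};`) // each attribute ends with a semicolon
    }
  }
  // Assumption: an empty attribute set is written as '.'
  return parts.length ? parts.join(' ') : '.'
}

console.log(formatAttributesSketch({ transcript_id: ['"Apple1"'] }))
```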
Format a feature object or array of feature objects into one or more lines of GTF.
- featureOrFeatures
Format a directive into a line of GTF.
- directive Object
Returns String
Format a comment into a GTF comment. Yes I know this is just adding a # and a newline.
- comment Object
Returns String
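Directive and comment formatting can be sketched as the inverse of the parse examples earlier in this document. `formatDirectiveSketch` and `formatCommentSketch` are hypothetical names, and the sketch only handles the `directive` and `comment` fields shown in those examples.

```javascript
// Hypothetical helpers, mirroring the parse examples in this document.
function formatDirectiveSketch(item) {
  return `##${item.directive}\n`
}

function formatCommentSketch(item) {
  return `# ${item.comment}\n` // just a '#', the text, and a newline
}

process.stdout.write(formatDirectiveSketch({ directive: 'gtf' }))
process.stdout.write(formatCommentSketch({ comment: 'hi this is a comment' }))
```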
Format a sequence object as FASTA
- seq Object
Returns String formatted single FASTA sequence
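For a sequence object of the shape shown earlier (`id`, `description`, `sequence`), a FASTA record is a `>` header line followed by the sequence. The sketch below is illustrative only: `formatSequenceSketch` is a hypothetical name, and it writes the sequence on a single line without wrapping.

```javascript
// Hypothetical helper, not the module's formatSequence.
function formatSequenceSketch(seq) {
  const header = seq.description
    ? `>${seq.id} ${seq.description}`
    : `>${seq.id}`
  return `${header}\n${seq.sequence}\n`
}

const fasta = formatSequenceSketch({
  id: 'ctgA',
  description: 'test contig',
  sequence: 'ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA',
})
process.stdout.write(fasta)
```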
Format a directive, comment, or feature, or array of such items, into one or more lines of GTF.
- This is an adaptation of the JBrowse GTF parser
- GTF docs
MIT © Robert Buels