Train Augustus and Train Snap need more flexibility in handling annotation data #6623

wm75 · 2024-12-12T14:34:45Z

Currently, both wrappers require maker and unconditionally run the maker2zff script with default settings over the annotation gff, with the aim of retaining only high-quality annotations for training.

This approach is suboptimal in several ways:

The default behavior of maker2zff is to filter features based on their qi and aed attribute values if the feature states maker in its source column. When the source is not maker, no filtering is performed.

All of this happens behind the scenes without telling the user, who has no way of knowing that maker and non-maker gffs are treated differently.
The built-in filtering means the user cannot decide to apply less strict criteria than the default ones (unless they know about the secret workaround of disabling default filtering by removing maker from the gff source column).
If default filtering results in all features getting eliminated, augustus and snap training fail with hard-to-diagnose errors.
Augustus example: "Error: training set file jwd05e/76659517/working/genome.gff3 has neither Genbank nor GFF nor FASTA format!" From this message it is very hard to deduce that genome.gff3 is the filtered intermediate file and that it is simply empty.
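To illustrate the empty-intermediate-file case, here is a minimal sketch of a guard that a wrapper could run on the filtered annotation before starting training. The function names and the error wording are hypothetical and not part of the existing wrappers; the only assumption about the data is that a feature line has nine tab-separated columns.

```python
def count_gff_features(gff_text: str) -> int:
    """Count feature lines, i.e. non-comment lines with 9 tab-separated columns."""
    count = 0
    for line in gff_text.splitlines():
        # Skip blank lines and directives/comments such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        if len(line.split("\t")) == 9:
            count += 1
    return count


def require_training_features(gff_text: str, label: str = "filtered annotation") -> int:
    """Return the feature count, or fail early with an explicit message (hypothetical helper)."""
    n = count_gff_features(gff_text)
    if n == 0:
        raise ValueError(
            f"The {label} contains no features. "
            "All annotations may have been removed by filtering."
        )
    return n
```

Failing at this point would tell the user that filtering removed everything, instead of leaving them with a format error from the training step.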

Suggestion:
Offer a separate wrapper for maker2zff with full control over its settings, and in the downstream tools only suggest filtering the input gff beforehand instead of doing it unconditionally.
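As a rough illustration of what user-controlled filtering could look like, the sketch below selects mRNA features with an adjustable AED cutoff. It is not maker2zff's actual logic: it ignores the qi criteria entirely, the `_AED` attribute name is an assumption about how maker writes its gff, the 0.5 default is only a placeholder, and dropping maker features that lack the attribute is a choice made here for simplicity. It does mirror the behavior described above in that features whose source is not maker pass through unfiltered.

```python
from typing import Dict, Iterable, List, Optional


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_mrna_ids(
    gff_lines: Iterable[str],
    max_aed: Optional[float] = 0.5,   # placeholder default; None disables filtering
    filtered_source: str = "maker",   # only features with this source are filtered
) -> List[str]:
    """Return the IDs of mRNA features that pass the (hypothetical) AED filter."""
    kept = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if max_aed is not None and cols[1] == filtered_source:
            aed = attrs.get("_AED")  # assumed attribute name
            # Assumption: a maker feature without an AED value is dropped.
            if aed is None or float(aed) > max_aed:
                continue
        if "ID" in attrs:
            kept.append(attrs["ID"])
    return kept
```

With the cutoff exposed as a parameter, a user could relax or disable filtering explicitly rather than editing the source column of the gff.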


wm75 · 2024-12-12T14:35:44Z

@abretaud @rlibouba what do you think?

abretaud · 2024-12-12T16:04:46Z

Yeah, I think these training tools need some love. I remember implementing them for use within a maker workflow, but something more agnostic would be much better.

I don't have much bandwidth at the moment to work on it; maybe Romane could help at some point. Of course, if anyone proposes something I'd be happy to review!

rlibouba · 2024-12-13T11:59:53Z

Hi @wm75, that would be a good idea.
As soon as I have more time, I'll take a look.

wm75 added the enhancement label Dec 12, 2024
