greek confused with italic #5
Comments
We don't have enough training data for some classes, specifically 'greek', 'hebrew' and 'manuscript'. I will explain everything in the README.
Understood. So could #7 help here? (Even with better training data, there might always be cases where the user observes systematic suboptimal detection and has a priori knowledge to throw in...)
Yes, in general we might need to use ocrd-typegroups-classifier and combine that dynamically (in the workflow) with dedicated models from other OCR processors.
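To make the idea concrete, here is a minimal sketch of per-line model selection with a user-supplied label remapping, one possible form of the a priori knowledge mentioned above. It is not this processor's API: `choose_model`, the model names and the `overrides` mapping are all hypothetical and only illustrate the routing logic.

```python
from typing import Dict, Optional


def choose_model(scores: Dict[str, float],
                 models: Dict[str, str],
                 fallback: str,
                 overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick an OCR model name for one text line (hypothetical helper).

    scores:    font-class confidences from a classifier, e.g. {"italic": 1.0}
    models:    mapping from font class to a dedicated OCR model name
    fallback:  model to use when no dedicated model is available
    overrides: user prior that remaps a detected class to another one
    """
    if not scores:
        return fallback
    # Take the class with the highest confidence.
    label = max(scores, key=scores.get)
    # Apply the user's remapping, if any (assumption: a simple label-to-label map).
    label = (overrides or {}).get(label, label)
    return models.get(label, fallback)


# Hypothetical model names, for illustration only.
MODELS = {"italic": "italic-model", "greek": "greek-model"}
picked = choose_model({"italic": 1.0, "greek": 0.0}, MODELS, "generic-model",
                      overrides={"italic": "greek"})
```

With the remapping in place, a line that the classifier confidently labels italic is routed to the Greek model instead; without it, the italic model would be chosen. A remapping this coarse would also redirect lines that really are italic, so it only suits material where the user knows that class does not occur.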
#7 would definitely help, I will look into that. It's possible that we will retrain the OCR models if we obtain more data for the lacking classes; if that is the case, I will update the processor.
I have some material with alternating lines of Latin in Antiqua and Old Greek (interlinear gloss) – the perfect test case IOW.
Unfortunately, the provided model systematically detects italic (with 100% confidence) where Greek should be.
So adaptive will always resort to the SelOCR result, which is wrong half of the time. And of course, when forcing COCR globally, because the OCR model does not have Greek trained into it, the results are not usable either.