Results from the new benchmark comparing actual min/typ/max field values:
`tabular_parse` is the reference, because we can assume that most of its values are correct (it uses no LLM and has been carefully hand-crafted).
A wrongly extracted value is much worse than a missing value, because we will not notice the mistake in the results of the power calculation (missing values produce `nan` power values).
Analysis shows that the LLM either takes values from neighbouring fields or picks them completely at random.
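A minimal sketch of how the benchmark could classify each extracted cell against the `tabular_parse` reference, so that wrong values are separated from harmless missing ones. It assumes both results are available as a hypothetical `parameter -> {min, typ, max}` dict; a wrong value that matches another cell of the reference table is counted as `wrong_neighbour`, anything else as `wrong_other`. The names `classify_cell`, `summarize` and the example values are illustrative, not taken from the project.

```python
import math
from typing import Dict, Optional

# Hypothetical layout: parameter name -> {"min": ..., "typ": ..., "max": ...}
Table = Dict[str, Dict[str, Optional[float]]]
FIELDS = ("min", "typ", "max")


def _is_missing(value: Optional[float]) -> bool:
    return value is None or math.isnan(value)


def classify_cell(reference: Table, extracted: Table, param: str, field: str) -> str:
    """Classify one extracted cell against the tabular_parse reference."""
    ref = reference.get(param, {}).get(field)
    got = extracted.get(param, {}).get(field)
    if _is_missing(got):
        return "missing"  # harmless: ends up as nan in the power calc
    if _is_missing(ref):
        return "no_reference"  # reference has no value, cannot judge
    if math.isclose(got, ref, rel_tol=1e-6):
        return "correct"
    # A wrong value equal to some other reference cell was probably
    # taken from a neighbouring field.
    others = [
        v
        for p, row in reference.items()
        for f, v in row.items()
        if (p, f) != (param, field) and not _is_missing(v)
    ]
    if any(math.isclose(got, v, rel_tol=1e-6) for v in others):
        return "wrong_neighbour"
    return "wrong_other"


def summarize(reference: Table, extracted: Table) -> Dict[str, int]:
    """Count classification labels over all reference parameters."""
    counts: Dict[str, int] = {}
    for param in reference:
        for field in FIELDS:
            label = classify_cell(reference, extracted, param, field)
            counts[label] = counts.get(label, 0) + 1
    return counts


# Illustrative data only (not real benchmark results).
EXAMPLE_REFERENCE: Table = {
    "VDD": {"min": 3.0, "typ": 3.3, "max": 3.6},
    "IDD": {"min": None, "typ": 12.0, "max": 15.0},
}
EXAMPLE_EXTRACTED: Table = {
    "VDD": {"min": 3.0, "typ": 3.6, "max": None},
    "IDD": {"min": None, "typ": 12.0, "max": 42.0},
}
```

With the example data, `VDD.typ = 3.6` is flagged as `wrong_neighbour` (it is the reference `VDD.max`), while `IDD.max = 42.0` matches nothing and is flagged as `wrong_other`.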
The converterapi pdf2txt (or pdf2ocr2txt?) seems to extract table contents column-wise (not row-wise), which might explain the neighbour confusion.
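To illustrate why column-wise extraction could cause neighbour confusion, here is a sketch that flattens the same hypothetical min/typ/max table row by row and column by column. The table, `row_wise` and `column_wise` are made up for illustration; the code does not call converterapi and only mimics the suspected behaviour.

```python
from typing import List

# Hypothetical datasheet table (header row + two parameter rows).
TABLE: List[List[str]] = [
    ["Parameter", "Min", "Typ", "Max"],
    ["VDD", "3.0", "3.3", "3.6"],
    ["IDD", "10", "12", "15"],
]


def row_wise(table: List[List[str]]) -> str:
    """Flatten the table row by row: each value stays next to its parameter."""
    return " ".join(cell for row in table for cell in row)


def column_wise(table: List[List[str]]) -> str:
    """Flatten the table column by column, as the converter seems to do."""
    n_cols = len(table[0])
    return " ".join(row[col] for col in range(n_cols) for row in table)


print(row_wise(TABLE))     # Parameter Min Typ Max VDD 3.0 3.3 3.6 IDD 10 12 15
print(column_wise(TABLE))  # Parameter VDD IDD Min 3.0 10 Typ 3.3 12 Max 3.6 15
```

In the column-wise string, VDD's typ value `3.3` is directly followed by IDD's typ value `12`, and the parameter names are far away from their values, so a model reading the flat text can easily pick the value of a neighbouring row.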
The random values might come from LLM exhaustion and non-deterministic effects?
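One way to test the non-determinism hypothesis is to run the extraction several times on the same document and flag fields whose values change between runs. A minimal sketch, assuming each run is a hypothetical `field -> value` dict with `None` or `nan` for missing values; `unstable_fields` and the example runs are illustrative only.

```python
import math
from typing import Dict, List, Optional, Set

# Hypothetical: one extraction run = field key -> extracted value.
Run = Dict[str, Optional[float]]


def _normalize(value: Optional[float]) -> Optional[float]:
    # Map nan to None: nan != nan, so separate nan values would not
    # collapse into one entry of a set.
    if value is None or math.isnan(value):
        return None
    return value


def unstable_fields(runs: List[Run]) -> Set[str]:
    """Return the fields whose extracted value changes between repeated runs."""
    keys: Set[str] = set()
    for run in runs:
        keys.update(run)
    return {k for k in keys if len({_normalize(run.get(k)) for run in runs}) > 1}


# Illustrative repeated runs on the same document (not real data).
EXAMPLE_RUNS: List[Run] = [
    {"VDD.typ": 3.3, "IDD.max": 15.0},
    {"VDD.typ": 3.3, "IDD.max": 42.0},
    {"VDD.typ": 3.3, "IDD.max": float("nan")},
]
```

Here `unstable_fields(EXAMPLE_RUNS)` flags only `IDD.max`. Fields that are stable across runs but still wrong would point more towards the extraction input (e.g. the column order) than towards non-determinism.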