Re-evaluate LLM approach on value correctness #15

fl4p · 2024-09-22T15:55:35Z

Results from the new benchmark comparing actual min/typ/max field values:

num *EQUAL* *VALUES*:
                                           total     
                           tabular_parse:  278 (100%)   37   35    9   31   29    1   39    7    6   25   30   29
    ocr_text2_claude-3-5-sonnet-20240620:  236 ( 85%)   30   31    6   29   21    0   34    7    6   20   26   26
                       ocr_text2_llama-3:  180 ( 65%)   24   21    8   21   22    0   26    5    5   15   16   17
                       text2_gpt-4o-mini:  157 ( 56%)   20   25    2   18   16    0   23    1    1    8   21   22
                   ocr_text2_gpt-4o-mini:  149 ( 54%)   23   21    3   22   14    0   15    5    3   10   16   17

tabular_parse serves as the reference: it uses no LLM and has been carefully hand-crafted, so we can assume that most of its values are correct.
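A minimal sketch of how such a comparison against the reference could be scored, with mismatches split into missing and wrong values. The function name `classify`, the `(field, column)` keys, and the sample numbers are assumptions for illustration, not the benchmark's actual code or data.

```python
import math

def classify(reference, extracted, rel_tol=1e-6):
    """Count equal, missing, and wrong values relative to the reference."""
    counts = {"equal": 0, "missing": 0, "wrong": 0}
    for key, ref_val in reference.items():
        got = extracted.get(key)
        if got is None or math.isnan(got):
            counts["missing"] += 1      # field absent or NaN
        elif math.isclose(got, ref_val, rel_tol=rel_tol):
            counts["equal"] += 1        # matches the reference value
        else:
            counts["wrong"] += 1        # present but different
    return counts

# Made-up values, keyed by (field, column)
reference = {("Qrr", "typ"): 58.0, ("Qgd", "typ"): 9.0, ("Qgd", "max"): 14.0}
extracted = {("Qrr", "typ"): 58.0, ("Qgd", "typ"): 14.0}

print(classify(reference, extracted))
```

Reporting the wrong count next to the equal count would show how much of each model's gap is silent error rather than omission.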

A wrongly extracted value is much worse than a missing one: a missing value produces NaN in the power calculation and is easy to spot, whereas a wrong value yields a plausible-looking result, so the mistake goes unnoticed.
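To illustrate the difference, here is a small sketch. The formula `P = I^2 * R` and the name `conduction_loss` are hypothetical stand-ins for the real power calculation, and the numbers are made up.

```python
import math

def conduction_loss(i_rms, rds_on):
    # Hypothetical stand-in for the power calculation: P = I^2 * R
    return i_rms ** 2 * rds_on

correct = conduction_loss(10.0, 0.0030)        # value as in the datasheet
missing = conduction_loss(10.0, float("nan"))  # field was not extracted
wrong = conduction_loss(10.0, 0.0045)          # value taken from a neighbouring field

print(math.isnan(missing))  # the missing value is visible as NaN
print(math.isnan(wrong))    # the wrong value looks like a normal result
```

The NaN can be filtered or flagged downstream; the wrong value can only be caught by comparing against a trusted reference.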

Analysis shows that the LLM either takes values from neighbouring fields or returns seemingly random ones.

The converterapi pdf2txt (or pdf2ocr2txt?) step seems to extract table contents column-wise rather than row-wise, which might explain the neighbour confusion.
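A small sketch of why column-wise extraction could cause this. The table contents are made up and the flattening is only an assumption about what the converter does, not its confirmed behaviour.

```python
# Illustrative table: parameter, min, typ, max (made-up values)
table = [
    ["Qg",  "-", "40", "60"],
    ["Qgd", "-", "9",  "14"],
    ["Qrr", "-", "58", "-"],
]

# Row-wise flattening keeps each parameter next to its own min/typ/max
row_wise = [cell for row in table for cell in row]

# Column-wise flattening places values of different parameters side by side
col_wise = [cell for col in zip(*table) for cell in col]

print(row_wise)
print(col_wise)

# In the column-wise text, the token after Qg's typ value is Qgd's typ value
print(col_wise[col_wise.index("40") + 1])
```

If the row count of a table is known, the column-wise token stream could in principle be transposed back before it is passed to the LLM.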

The random values might come from LLM exhaustion and non-deterministic effects?

piotrdelikat · 2024-09-23T11:28:20Z

Could you provide an example of a datasheet name/results where this is taking place?
