Results and logs of LLM-KG-Bench runs described in the article "Assessing the Evolution of LLM capabilities for Knowledge Graph Engineering in 2023", Frey et al., ESWC 2024
We collected data in multiple runs, each resulting in files named with the date and time of the experiment start (a minimal loading sketch follows the list below):

- result files (`.json`, `.txt`, `.yaml`, same info in different serializations) containing task, response, and evaluation data
- model log files (`.jsonl`) containing details on the LLM interaction
- full debug log files (`.log`) containing the debug log for the runs
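As a minimal sketch (not part of the published tooling), the result and model log files can be inspected with standard Python libraries; the file names used below are placeholders for illustration only:

```python
import json
from pathlib import Path

# Hypothetical file names; actual files carry the date and time of the
# experiment start in their names.
result_file = Path("2023-12-01T10-00-00-results.json")
model_log_file = Path("2023-12-01T10-00-00-model-log.jsonl")

# Result file (.json): one JSON document with task, response and evaluation data.
with result_file.open(encoding="utf-8") as f:
    results = json.load(f)
print("top-level keys:", list(results) if isinstance(results, dict) else len(results))

# Model log file (.jsonl): one JSON object per line describing an LLM interaction.
with model_log_file.open(encoding="utf-8") as f:
    interactions = [json.loads(line) for line in f if line.strip()]
print("logged interactions:", len(interactions))
```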
Re-evaluation is possible with LLM-KG-Bench using the `--reeval` parameter, which allows repeated or modified evaluation (e.g., different metrics, parser libraries, or scores).
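A minimal sketch of such a re-evaluation call, assuming LLM-KG-Bench is started as a Python module; the module entry point, the result-file name, and passing the file directly to the flag are assumptions, only the `--reeval` parameter itself is taken from the description above:

```python
import subprocess

# Hypothetical invocation: re-evaluate an existing result file without
# re-querying the LLMs. Only the --reeval parameter is documented above;
# the module path and the result file name are placeholders.
subprocess.run(
    ["python", "-m", "LlmKgBench", "--reeval", "2023-12-01T10-00-00-results.json"],
    check=True,
)
```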
This dataset is published at Zenodo:
Folder Q3vsQ4 contains the result files used for the comparison of Q3 and Q4 results. We evaluated whether the results of the DL4KG experiments, obtained at the beginning of the third quarter of 2023 (Q3), could be replicated in the fourth quarter of 2023 (early December, Q4). These experiments used version 1.1 of the LLM-KG-Bench code.
Although we see some differences, the results remain in the same interval. As such, this study verifies and reinforces the original research outcomes and the soundness of the benchmark setup and tasks used in LLM-KG-Bench.