
Results of LLM-KG-Bench runs for ESWC 2024

Results and logs of the LLM-KG-Bench runs described in the article "Assessing the Evolution of LLM capabilities for Knowledge Graph Engineering in 2023" (Frey et al., ESWC 2024).

We collected data in multiple runs. Each run produced files whose names include the date and time at which the experiment started:

  • result files (.json, .txt, .yaml; the same information in different serializations) containing task, response, and evaluation data
  • model log files (.jsonl) containing details on LLM interaction
  • full debug-log files (.log) containing debug log for runs
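The three file kinds listed above can be told apart by extension alone. The following is a minimal sketch, not part of LLM-KG-Bench: the helper names `classify_run_files` and `read_jsonl` are hypothetical, and the only assumptions are the extensions listed above and that each line of a .jsonl file holds one JSON value. The record fields inside the files are not assumed.

```python
import json
from collections import defaultdict
from pathlib import Path

# Extensions taken from the list above; everything else is ignored.
RESULT_SUFFIXES = {".json", ".txt", ".yaml"}
MODEL_LOG_SUFFIX = ".jsonl"
DEBUG_LOG_SUFFIX = ".log"


def classify_run_files(folder):
    """Group the file names in `folder` by kind (hypothetical helper)."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in RESULT_SUFFIXES:
            groups["results"].append(path.name)
        elif suffix == MODEL_LOG_SUFFIX:
            groups["model_logs"].append(path.name)
        elif suffix == DEBUG_LOG_SUFFIX:
            groups["debug_logs"].append(path.name)
    return dict(groups)


def read_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `classify_run_files` on a results folder returns a dictionary with up to three keys (`results`, `model_logs`, `debug_logs`), and `read_jsonl` can then be applied to any of the model log files.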

Re-evaluation is possible with LLM-KG-Bench using the --reeval parameter, thus allowing for repeated or modified evaluation (e.g. different metrics, parser libraries, scores, etc).

This dataset is published on Zenodo: DOI

FriendCount Graph Examples including "Tricky Case"

[Figure: FriendCountGraph-Examples (draw.io diagram)]

Replication Study

Folder Q3vsQ4 contains the result files used to compare the Q3 and Q4 results. We evaluated whether the results of the DL4KG experiments, obtained at the beginning of the third quarter of 2023 (Q3), could be replicated in the fourth quarter of 2023 (early December, Q4). These experiments used version 1.1 of the LLM-KG-Bench code.

Although we see some differences, the results remain in the same interval. As such, this study verifies and reinforces the original research outcomes and the soundness of the benchmark setup and tasks used in LLM-KG-Bench.

[Figures: friends-f1-claude-Q3VSQ4, friends-f1-gpt-Q3VSQ4, generation-MPE-claude-Q3VSQ4, generation-MPE-gpt-Q3VSQ4]