Results and logs of LLM-KG-Bench runs described in the article "Assessing the Evolution of LLM capabilities for Knowledge Graph Engineering in 2023", Frey et al., ESWC 2024
We collected data in multiple runs, each resulting in files named with the date and time of the experiment start (a minimal loading sketch follows the list below):

- result files (`.json`, `.txt`, `.yaml`, same info in different serializations) containing task, response, and evaluation data
- model log files (`.jsonl`) containing details on the LLM interaction
- full debug log files (`.log`) containing the debug log for the runs
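As a minimal sketch (not part of the published tooling), the result and model log files can be inspected with standard Python libraries; the file names used below are placeholders for illustration only:

```python
import json
from pathlib import Path

# Hypothetical file names; actual files carry the date and time of the
# experiment start in their names.
result_file = Path("2023-12-01T10-00-00-results.json")
model_log_file = Path("2023-12-01T10-00-00-model-log.jsonl")

# Result file (.json): one JSON document with task, response and evaluation data.
with result_file.open(encoding="utf-8") as f:
    results = json.load(f)
print("top-level keys:", list(results) if isinstance(results, dict) else len(results))

# Model log file (.jsonl): one JSON object per line describing an LLM interaction.
with model_log_file.open(encoding="utf-8") as f:
    interactions = [json.loads(line) for line in f if line.strip()]
print("logged interactions:", len(interactions))
```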
Re-evaluation is possible with LLM-KG-Bench using the `--reeval` parameter, which allows repeated or modified evaluation (e.g., different metrics, parser libraries, or scores).
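A minimal sketch of such a re-evaluation call, assuming LLM-KG-Bench is started as a Python module; the module entry point, the result-file name, and passing the file directly to the flag are assumptions, only the `--reeval` parameter itself is taken from the description above:

```python
import subprocess

# Hypothetical invocation: re-evaluate an existing result file without
# re-querying the LLMs. Only the --reeval parameter is documented above;
# the module path and the result file name are placeholders.
subprocess.run(
    ["python", "-m", "LlmKgBench", "--reeval", "2023-12-01T10-00-00-results.json"],
    check=True,
)
```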
This dataset is published at Zenodo:
Folder Q3vsQ4 contains the result files used for the comparison of Q3 and Q4 results. We evaluated whether the results of the DL4KG experiments, obtained at the beginning of the third quarter of 2023 (Q3), could be replicated in the fourth quarter of 2023 (early December, Q4). These experiments used version 1.1 of the LLM-KG-Bench code.
Although we see some differences, the results remain in the same interval. As such, this study verifies and reinforces the original research outcomes and the soundness of the benchmark setup and tasks used in LLM-KG-Bench.