KG_and_LLM
A research project investigating how knowledge graphs work and how they can be combined with language models.
The growing popularity of Large Language Models (LLMs) has opened up new areas of work and research. One such area is Retrieval Augmented Generation (RAG), which gives a model access to external memory so that it can work with facts not covered during training.
RAG systems are still at an early stage of development, and knowledge graphs (KGs) are one of the promising foundations for them. Our research explores how LLMs and KGs can be integrated to build a promising RAG system.
Initially, our idea was to build a KG-based RAG system for construction documentation. However, as we recognized the potential of the approach, our focus shifted towards research. We began an in-depth review of the literature and scientific resources on building knowledge graphs and RAG systems.
Link | Name | Summary |
---|---|---|
Springer Link | BEAR on GitHub | Requires payment, but interesting examples are available on GitHub. |
Arxiv Paper | LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities | A brief overview of the tasks where LLMs currently excel and where they do not. They are good at reasoning and worse at construction. Suggests an agent-based approach to graph construction. The method is somewhat vague. |
Arxiv Paper | AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering | An older, pre-GPT work on graph construction built on BERT. The method is interesting and worth considering. Briefly, it's a three-step process: 1. OpenIE for creating triplets, 2. Encoding with BERT, 3. Entity linking (though it is unclear how). |
Arxiv Paper | Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text | Proposes a benchmark for the task of knowledge graph generation. A very recent article with a new benchmark. |
Arxiv Paper | ITERATIVE ZERO-SHOT LLM PROMPTING FOR KNOWLEDGE GRAPH CONSTRUCTION | Proposes an iterative LLM prompting-based pipeline for automatically generating KGs without human effort. It introduces well-formed LLM prompts for each stage of the process and achieves impressive accuracy results. |
PLOS One Article | | Similar to the previous article. If we decide to build graphs, this article should be examined carefully. Essentially, it proposes the same agent-based approach with good accuracy parameters for their specific task. |
FCST Article | | Chinese language article. From images, it appears to involve multi-layered prompts. |
Arxiv Paper | Unifying Large Language Models and Knowledge Graphs: A Roadmap | An excellent overview of current developments. Dedicated subsections for each process of interest. |
Arxiv Paper | PiVe: Prompting with Iterative Verification Improving Graph-based Generative Capability of LLMs | Uses a model with a verifier module for graph construction. Briefly, they trained a small transformer on correct/incorrect responses and generate iteratively with it. |
Arxiv Paper | BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models | Constructs graphs using prompts and a trained BERT. Achieves up to 70% accuracy in some cases. |
Arxiv Paper | CodeKGC: Code Language Model for Generative Knowledge Graph Construction | |
Arxiv Paper | Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks | Describes an ERE model based on Packed Levitated Marker. The code didn't work well, so it's challenging to verify the method's effectiveness. |
Arxiv Paper | Packed Levitated Marker for Entity and Relation Extraction | Describes the Packed Levitated Marker method for ERE data labeling. The code is functional, and metrics are confirmed. It's difficult to say whether it should be used in our work. |
GitHub - REBEL | REBEL: Relation Extraction By End-to-end Language generation | REBEL, a seq2seq model based on BART, performs end-to-end relation extraction for over 200 different relation types. It works excellently and is currently in use. |
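
Since REBEL is the extractor currently in use, it may help to show how its generated text can be turned into graph triples. The sketch below is a minimal parser, assuming the linearized output format described in the REBEL model card as we understand it (`<triplet> head <subj> tail <obj> relation`, with several `<subj> ... <obj> ...` pairs allowed per head). The names `Triple` and `parse_linearized_triples` are ours for illustration and are not part of REBEL.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Sequence-level special tokens that carry no triple information.
_SPECIAL = re.compile(r"</?s>|<pad>")


def parse_linearized_triples(text: str) -> list[Triple]:
    """Parse REBEL-style linearized text into (head, relation, tail) triples.

    Assumed format: "<triplet> head <subj> tail <obj> relation", where one
    head may be followed by several "<subj> tail <obj> relation" pairs.
    """
    triples: list[Triple] = []
    text = _SPECIAL.sub(" ", text)
    # Everything before the first "<triplet>" marker is ignored.
    for chunk in text.split("<triplet>")[1:]:
        head, *pairs = chunk.split("<subj>")
        head = head.strip()
        for pair in pairs:
            if "<obj>" not in pair:
                continue  # skip malformed or truncated generations
            tail, relation = pair.split("<obj>", 1)
            tail, relation = tail.strip(), relation.strip()
            if head and tail and relation:
                triples.append(Triple(head, relation, tail))
    return triples


if __name__ == "__main__":
    generated = (
        "<s><triplet> Punta Cana <subj> La Altagracia Province <obj> "
        "located in the administrative territorial entity "
        "<subj> Dominican Republic <obj> country</s>"
    )
    for triple in parse_linearized_triples(generated):
        print(triple)
```

Incomplete pairs are skipped rather than raising an error, because generation can be cut off by the length limit in the middle of a triple.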
After two months of studying knowledge graphs and scientific articles, we concluded that it would be valuable to build a RAG system based on knowledge graphs. We plan to compare it with a conventional vector-based RAG system and potentially package this research as either a library or a scientific paper.
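
To make the planned comparison concrete, here is a toy sketch of the two retrieval styles side by side. It is not our implementation: the bag-of-words cosine similarity stands in for a real embedding model, the one-hop lookup stands in for a real graph query, and the function names `vector_retrieve` and `graph_retrieve` are hypothetical.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_retrieve(query: str, passages: list, k: int = 1) -> list:
    """Vector-style RAG retrieval: rank passages by similarity to the query."""
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]


def graph_retrieve(entity: str, triples: list) -> list:
    """Graph-style RAG retrieval: return the one-hop facts around an entity."""
    return [t for t in triples if entity in (t[0], t[2])]


if __name__ == "__main__":
    passages = [
        "REBEL is based on BART",
        "BertNet uses a trained BERT",
        "PiVe trains a verifier module",
    ]
    triples = [
        ("REBEL", "based_on", "BART"),
        ("BertNet", "based_on", "BERT"),
        ("REBEL", "task", "relation extraction"),
    ]
    print(vector_retrieve("which model is based on BART", passages))
    print(graph_retrieve("REBEL", triples))
```

The vector retriever returns whole passages ranked by surface similarity, while the graph retriever returns explicit facts about a named entity. The comparison is meant to measure how that difference affects the generated answers.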
Among the current tasks:
- Dataset Generation and Compilation for RAG: unfortunately, no suitable dataset for RAG is available yet, so we will need to build one ourselves.
- Metric Definition for RAG
- Testing RAG in various formats, including Text-To-Cypher and Semantic Search over Graph.
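
As an illustration of the Text-To-Cypher format mentioned above, the sketch below builds a prompt from a graph schema and a question, passes it to a language model, and rejects any generated query that would modify the graph. Everything here is an assumption made for illustration: the prompt wording, the schema string, the keyword-based safety check, and the `generate` callable, which stands in for whichever LLM client is eventually used.

```python
import re
from typing import Callable

PROMPT_TEMPLATE = """You translate questions into Cypher queries.
Graph schema:
{schema}
Return only a read-only Cypher query.
Question: {question}
Cypher:"""

# Deliberately crude: any of these words anywhere in the query, even inside
# a string literal, causes a rejection.
WRITE_KEYWORDS = ("CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP")


def build_prompt(schema: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


def is_read_only(cypher: str) -> bool:
    tokens = re.findall(r"[A-Za-z_]+", cypher.upper())
    return not any(keyword in tokens for keyword in WRITE_KEYWORDS)


def text_to_cypher(question: str, schema: str, generate: Callable[[str], str]) -> str:
    """Ask the model for a Cypher query and refuse anything that writes."""
    cypher = generate(build_prompt(schema, question)).strip()
    if not is_read_only(cypher):
        raise ValueError("refusing a query that would modify the graph")
    return cypher


if __name__ == "__main__":
    schema = "(:Model {name})-[:BASED_ON]->(:Model {name})"

    def fake_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call.
        return "MATCH (m:Model)-[:BASED_ON]->(:Model {name: 'BART'}) RETURN m.name"

    print(text_to_cypher("Which models are based on BART?", schema, fake_llm))
```

Taking `generate` as a plain callable keeps the sketch independent of any particular model provider, and the same function can be tested with a stub.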