Upgrading from Vectors to Graphs: Knowledge Graph Embeddings and Graph-RAG

You can access the full project documentation at Gitbook Link!

Readme Sections:

Access the Dataset
Creating Traditional Vector Embeddings
Embeddings Visualization in 3D
Generating Knowledge Graphs
PyKeen Knowledge Graph Embedding Training
Storing Embeddings in FAISS index
Running the KG visualiser web-app
RAG_VLM

Directory Structure

.
├── 1_Traditional_Vector_Embeddings   # Traditional text and image embeddings using Word2Vec and CLIP
├── 2_Knowledge_Graphs                # Code and resources for generating Knowledge Graphs and extracting triplets
├── 3_KG_Embeddings                   # Knowledge Graph Embeddings (KGE) training using PyKeen and dimensionality reduction
├── 4_Deployment_dev                  # Scripts for deploying and testing embedding models
├── 6_FAISS_embeddings                # FAISS-based search for efficient embedding retrieval and comparisons
└── README.md                         # Project documentation

Browse the directories above for the source code (src) and assets for the image and text datasets.

Additional Directories

📂 /src:
Core code for training Knowledge Graph Embeddings (KGE) using PyKeen, including scripts, configs, and data utilities.

📂 /assets:
Contains embedding results, visualizations, and key outputs from the models.

📑 /notebooks:
Jupyter notebooks for visualizing and comparing traditional and Knowledge Graph Embeddings (KGE).

Setup Guide and Results

1. Access the dataset

The dataset, a reduced 1k-sample subset of COYO-700M, can be found here.

2. Creating Traditional Vector Embeddings

Four methods were used to create text embeddings, and one CLIP notebook is provided for image embeddings.

CLIP Embeddings
InferSent Embeddings
Universal Sentence Encoder
BERT
CLIP for Image Embeddings

Step	Description
1	Open one of the notebooks, e.g. `CLIP_Embeddings.ipynb`.
2	Run all the cells to load the CLIP model and generate embeddings.
3	Follow the instructions in the notebook to input your data and obtain embeddings.

Requirements

Python 3.x
Required libraries (see requirements.txt)

How to Install

Clone the repository.

git clone https://github.com/dsgiitr/kge-clip.git
cd kge-clip/1_Traditional_Vector_Embeddings

Install the required libraries using pip install -r requirements.txt.
Open the Jupyter notebooks and follow the instructions.

Tip

Refer to the Readme for more details on Traditional Vector Embeddings

3. Embeddings Visualization in 3D

To visualize text and image embeddings, use the following notebooks:

Text Embeddings Visualizer
Image Embeddings Visualizer

Each embedding and cluster will be saved in metadata.tsv.

To launch TensorBoard, use:

%tensorboard --logdir /path/to/logs/embedding
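
As a rough illustration of what ends up on disk, the sketch below writes a vectors file and a metadata.tsv in the tab-separated layout that the TensorBoard projector reads. The function name `write_projector_files`, the `vectors.tsv` file name, and the `label` and `cluster` column names are assumptions made for this example; the notebooks may organise their output differently.

```python
import csv
import os
import tempfile


def write_projector_files(embeddings, labels, clusters, out_dir):
    """Write vectors.tsv and metadata.tsv (hypothetical names) as tab-separated files."""
    vectors_path = os.path.join(out_dir, "vectors.tsv")
    metadata_path = os.path.join(out_dir, "metadata.tsv")

    # One embedding per row, one tab-separated float per dimension, no header.
    with open(vectors_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for vec in embeddings:
            writer.writerow([f"{x:.6f}" for x in vec])

    # Multi-column metadata needs a header row; rows align with the vectors file.
    with open(metadata_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["label", "cluster"])
        for label, cluster in zip(labels, clusters):
            writer.writerow([label, cluster])

    return vectors_path, metadata_path


# Toy usage with made-up 3D embeddings.
out_dir = tempfile.mkdtemp()
vectors_path, metadata_path = write_projector_files(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], ["cat", "dog"], [0, 1], out_dir
)
```

Keeping the row order identical in both files matters, because the projector matches metadata to vectors by position.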

4. Generating Knowledge Graphs

Knowledge graphs for both {text:image} pairs were generated using the following steps:

Triplet Extraction
Run the Rebel_extraction.ipynb notebook to extract triplets using the Babelscape REBEL-large model. You can find the notebook here.
Knowledge Graph Generation and Visualization
Use the KG.ipynb notebook to generate knowledge graphs and visualize them using Neo4J, NetworkX, and Plotly. Access the notebook here.
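
To make the triplet extraction step more concrete, here is a minimal parsing sketch. It assumes the decoded model output follows REBEL's linearized format, `<triplet> head <subj> tail <obj> relation`, and it returns dicts keyed by `head`, `type`, and `tail`, the keys that the Neo4J loading code in this README reads. The function name and the sample string are illustrative; the notebook's own parser may differ.

```python
def extract_triplets(decoded_text):
    """Parse REBEL-style linearized output into a list of head/type/tail dicts."""
    triplets = []
    head, tail, relation = [], [], []
    current = None  # the token list currently being filled

    def as_triplet(h, r, t):
        return {"head": " ".join(h), "type": " ".join(r), "tail": " ".join(t)}

    # Strip sequence-level special tokens before splitting on whitespace.
    cleaned = decoded_text
    for special in ("<s>", "</s>", "<pad>"):
        cleaned = cleaned.replace(special, " ")

    for token in cleaned.split():
        if token == "<triplet>":
            # A new head entity starts; store any finished triplet first.
            if head and tail and relation:
                triplets.append(as_triplet(head, relation, tail))
            head, tail, relation = [], [], []
            current = head
        elif token == "<subj>":
            # A new tail for the same head; store any finished triplet first.
            if head and tail and relation:
                triplets.append(as_triplet(head, relation, tail))
            tail, relation = [], []
            current = tail
        elif token == "<obj>":
            current = relation
        elif current is not None:
            current.append(token)

    if head and tail and relation:
        triplets.append(as_triplet(head, relation, tail))
    return triplets


sample = "<s><triplet> Punta Cana <subj> Dominican Republic <obj> country</s>"
triplets = extract_triplets(sample)
```

The resulting `triplets` list can be iterated row by row when loading the graph into a database or into NetworkX.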

Running Neo4J Database Instance

To run a local Neo4J instance and visualize the knowledge graph:

Install Neo4J
Download and install Neo4J from the official site.
Start Neo4J
After setting up an account, run the following code snippet to connect to a hosted Neo4J database and load the extracted triplets.

from neo4j import GraphDatabase

# Connect to Neo4j
uri = "neo4j+s://<your-instance-id>.databases.neo4j.io"  # Replace with your Neo4j instance URI
username = "neo4j"
password = "<your-neo4j-password>"  # Replace with your Neo4j password; never commit real credentials
driver = GraphDatabase.driver(uri, auth=(username,password))

def create_nodes_and_relationships(tx, head, type_, tail):
    # MERGE creates each node and relationship only if it does not already exist
    query = (
        "MERGE (a:head {name: $head}) "
        "MERGE (b:tail {name: $tail}) "
        "MERGE (a)-[r:Relation {type: $type}]->(b)"
    )
    tx.run(query, head=head, type=type_, tail=tail)

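# `triplets` is assumed to be an iterable of dicts with 'head', 'type', and 'tail' keys,
# e.g. triplets = df_rebel['triplet'].tolist()
# Open a session and add data
with driver.session() as session:
    for row in triplets:
        # execute_write replaces the deprecated write_transaction in neo4j driver 5.x
        session.execute_write(create_nodes_and_relationships, row['head'], row['type'], row['tail'])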

print("Knowledge graph created successfully!")

driver.close()

Run the following Cypher query on the Neo4J database instance to display the graph:

MATCH (n)-[r]->(m)
RETURN n, r, m

Tip

Refer to the Readme for more detail on Knowledge Graphs

5. PyKeen Knowledge Graph Embedding Training

The PyKeen model is trained on Text and Image KG triplets extracted using Babelscape REBEL-large.

Access the text KGE notebook: pykeen_KGE_text.ipynb
Access the image KGE notebook: pykeen_KGE_Image.ipynb

PyKeen Model Configuration

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',  # Choose a graph embedding technique
    loss="softplus",
    training=training_triples_factory,
    testing=testing_triples_factory,
    model_kwargs=dict(embedding_dim=3),  # Set embedding dimensions
    optimizer_kwargs=dict(lr=0.1),  # Set learning rate
    training_kwargs=dict(num_epochs=100, use_tqdm_batch=False),  # Set number of epochs
)
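
For intuition about what `model='TransE'` learns, the NumPy sketch below shows the TransE scoring idea: a triple (head, relation, tail) is plausible when head + relation lands close to tail. This is a standalone illustration, not PyKeen's implementation; the function name and the choice of the L1 norm as the default here are assumptions.

```python
import numpy as np


def transe_score(head, relation, tail, norm=1):
    """TransE plausibility score: negative distance between (head + relation) and tail."""
    head = np.asarray(head, dtype=np.float64)
    relation = np.asarray(relation, dtype=np.float64)
    tail = np.asarray(tail, dtype=np.float64)
    return -np.linalg.norm(head + relation - tail, ord=norm)


# Toy 3D embeddings (made up for illustration).
h = [1.0, 0.0, 0.0]
r = [0.0, 1.0, 0.0]
good_tail = [1.0, 1.0, 0.0]  # exactly h + r, so the score is the maximum (zero)
bad_tail = [0.0, 0.0, 1.0]   # far from h + r, so the score is lower

good = transe_score(h, r, good_tail)
bad = transe_score(h, r, bad_tail)
```

Training pushes true triples towards scores near zero and corrupted triples towards more negative scores.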

The trained KGEs for both text and image are further reduced to 3D space using PCA, UMAP, and t-SNE. The resulting embeddings and media can be found in the assets folder here.
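
As a sketch of the PCA variant of this reduction, the snippet below projects embeddings onto their top three principal components using only NumPy. It assumes the embeddings are available as a 2D array with one row per entity; the function name and the random input are placeholders, and the notebooks may rely on a library implementation instead.

```python
import numpy as np


def pca_reduce(embeddings, n_components=3):
    """Project row-wise embeddings onto their top principal components."""
    X = np.asarray(embeddings, dtype=np.float64)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


# Placeholder data: 50 entities with 16-dimensional embeddings.
rng = np.random.default_rng(0)
entity_embeddings = rng.normal(size=(50, 16))
reduced = pca_reduce(entity_embeddings, n_components=3)
```

UMAP and t-SNE are non-linear and need dedicated libraries, so they are not shown here.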

Tip

Refer to the Readme for more detail on KG_Embeddings

6. Storing Embeddings in FAISS index

A FAISS index was used to store the {text:image} vector and Knowledge Graph embeddings for later use with RAG-LLMs.

Access the FAISS index notebook here. Set the dimension to match the embedding size that the LLM model needs.

import faiss

dimension = 512
index = faiss.IndexFlatL2(dimension)

index.add(embeddings_img_array)   # add the image embeddings to the FAISS index
index.add(embeddings_text_array)  # add the text embeddings to the FAISS index

faiss.write_index(index, 'faiss_traditional_vector_embedding.index')
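
To show what a flat L2 index computes, here is a NumPy-only sketch of exhaustive search that returns squared Euclidean distances, as `IndexFlatL2` does. Because the snippet above adds image embeddings first and text embeddings second, and a flat index numbers vectors in insertion order, a returned id can be mapped back to its modality by comparing it with the number of images. The helper names and toy vectors are assumptions for illustration.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exhaustive nearest-neighbour search returning squared L2 distances and ids."""
    database = np.asarray(database, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    ids = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, ids, axis=1), ids


def hit_modality(hit_id, n_images):
    """Ids below n_images belong to images, the rest to text (insertion order)."""
    return "image" if hit_id < n_images else "text"


# Toy data: two "image" vectors followed by one "text" vector.
image_vectors = np.array([[0.0, 0.0], [1.0, 0.0]])
text_vectors = np.array([[0.0, 2.0]])
stored = np.vstack([image_vectors, text_vectors])

distances, ids = flat_l2_search(stored, np.array([[1.0, 0.0]]), k=2)
modalities = [hit_modality(i, len(image_vectors)) for i in ids[0]]
```

With an L2 index, smaller distances mean closer matches, which matters when ranking the results.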

7. Running the KG visualiser web-app

This repository contains a Flask-based web app that supports:

Text-Based Knowledge Graph Generation
Image-Based Knowledge Graph Generation
Text & Image Vector Embedding and Knowledge Graph Embedding with TensorBoard

The app utilizes Python libraries, the REBEL model, and Graphviz for advanced graph visualization.

Follow these steps to set up and run the web app.

Prerequisites

Ensure your environment meets the following requirements:

Python 3.7 or higher
pip (Python package installer)
Graphviz for advanced graph visualization

Installation

Clone the Repository

Fork the project and clone it to your local machine:

git clone https://github.com/dsgiitr/kge-clip.git
cd kge-clip/4_Deployment_dev

Set Up and Run the Flask App

Create and activate a virtual environment to manage dependencies:

On Windows:

python -m venv venv
venv\Scripts\activate

On macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Install Dependencies

Install the required Python packages:

pip install flask transformers torch pandas networkx matplotlib plotly graphviz

Running the Flask App

Activate the virtual environment and point Flask at the app:

On Windows:

venv\Scripts\activate
set FLASK_APP=app.py

On macOS/Linux:

source venv/bin/activate
export FLASK_APP=app.py

Run the Flask app with:

flask run

Open your web browser and navigate to http://127.0.0.1:5000/ to start using the app.

8. RAG_VLM

This module demonstrates how FAISS-based Knowledge Graph Embeddings (KGE) and Traditional Vector Embeddings (TVE) are utilized in conjunction with a Vision-Language Model (VLM) for image inference. The VLM (LLaVA) leverages CLIP embeddings for processing the test image.

CLIP Embeddings: CLIP provides a shared latent space for images and text, enabling multimodal embeddings that are used for cross-modal retrieval.
FAISS Index: Both the KGE (Knowledge Graph Embeddings) and TVE (Traditional Vector Embeddings) are stored in FAISS, facilitating fast similarity searches.
VLM (LLaVA): This model was utilized to generate text descriptions from images, and the embeddings generated by the CLIP processor are used for retrieving the most similar images from FAISS indices.
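
The cross-modal retrieval idea can be sketched with plain NumPy: once image and text embeddings live in the same space, L2-normalising them and taking dot products gives cosine similarities that can be ranked. The vectors below are made-up 2D stand-ins for real CLIP embeddings, and the function names are illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def cross_modal_scores(image_embs, text_embs):
    """Cosine similarity between every image embedding and every text embedding."""
    return l2_normalize(image_embs) @ l2_normalize(text_embs).T


# Stand-in embeddings: 2 images and 3 captions in a shared 2D space.
image_embs = [[1.0, 0.0], [0.0, 2.0]]
text_embs = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = cross_modal_scores(image_embs, text_embs)
best_caption_per_image = scores.argmax(axis=1)
```

Normalising first means the vector lengths do not influence the ranking, only the directions do.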

Workflow:

Image Captioning with VLM (LLaVA):
- The VLM model generated the following caption for the test image:
  - ['A young girl is smiling and showing her teeth', 'She is wearing a colorful shirt and a brown scarf'].
CLIP Embeddings Generation:
- CLIP processor was used to create image embeddings for the test image.
FAISS Index Loading:
- Loaded the FAISS indices for KGE (Knowledge Graph Embeddings, trained with PyKeen on REBEL triplets) and TVE (Traditional Vector Embeddings, built from image embeddings).
Similarity Search:
- A similarity search was performed on the test image embedding across both FAISS indices (KGE and TVE).

Ranking of Similar Images:

The top-ranked images were retrieved based on the highest similarity scores in both FAISS indices.

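image_path = ["/content/RAG_test_image.jpeg"]
# get_features_from_image_path is assumed to be a notebook helper that returns the CLIP image embedding
image_search_embedding = get_features_from_image_path(image_path)
# Retrieve the 2 nearest neighbours of the query embedding from the TVE index
distances, indices = index_tve.search(image_search_embedding.reshape(1, -1), 2)
distances = distances[0]
indices = indices[0]
indices_distances = list(zip(indices, distances))
# Sort the (index, score) pairs by score, largest first
indices_distances.sort(key=lambda x: x[1], reverse=True)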
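
The metric behind `index_tve` is not stated here, so the sort direction is worth checking against the index type. The sketch below is one hedged way to make that explicit: for an L2 index the smallest distance is the best match, while for an inner-product index the largest score is. The `rank_hits` helper and its `metric` argument are hypothetical, not part of the project code.

```python
def rank_hits(indices, scores, metric="l2"):
    """Order (index, score) pairs from best to worst match for the given metric."""
    if metric not in ("l2", "ip"):
        raise ValueError("metric must be 'l2' or 'ip'")
    pairs = list(zip(indices, scores))
    # L2: smaller is closer, so sort ascending. Inner product: larger is closer.
    pairs.sort(key=lambda pair: pair[1], reverse=(metric == "ip"))
    return pairs


# Toy values for illustration.
as_l2 = rank_hits([7, 3], [0.9, 0.4], metric="l2")
as_ip = rank_hits([7, 3], [0.9, 0.4], metric="ip")
```

If the index was built with `IndexFlatL2`, the ascending order is the one that puts the most similar image first.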

Results:
- TVE Similarity: [(73, 81.27001), (149, 77.19481)]
- KGE Similarity: [(2406, 121.6897), (163, 121.454765)]
Image Relevance:
- The retrieved images from both FAISS indices were visually compared for relevance to the original test image.
Dependencies of KGE FAISS:
- More fine-tuned triplet extraction
- PyKeen Training methods for Embedding generation
- Combining Entity and Relation Embeddings.

Results and Comparisons

Note

Detailed results and descriptions are available in the DSG Gitbook.

The results are divided into:

Traditional Vector embeddings 3D Reduced visualisation using Tensorboard. 📂 Results Folder
Similarity scores of reduced embeddings from different text encoders. 📂 Results Folder
Comparing image and text vector embeddings disparity and contextual drawbacks. 📂 Results Folder
Scene Graph Generation of {text:image} pair using VLM & Relationformer. 📂 Results Folder
KG Visualisation with Neo4j, NetworkX, Plotly and Graphviz. 📂 Results Folder
KG and traditional vector Embeddings .csv 📂 Results Folder

Core Contributors

The core contributors to this repository are (listed alphabetically):

Aastha Khaitan
Advika Sinha
Agam Pandey
Anant Jain
Simardeep Singh

Contributions 🚀

We welcome contributions to improve this project! To contribute:

Fork the repository.
Create a new branch for your feature or bug fix.
Commit your changes with clear and descriptive messages.
Push the changes to your fork and submit a pull request.

Important

Please ensure your contributions align with the project's coding standards and include relevant documentation or tests. For major changes, consider opening an issue to discuss your approach first.
