
Commit cb55571: update relative links
AruneshSingh committed Apr 10, 2024
1 parent cbd5415 commit cb55571
Showing 7 changed files with 16 additions and 16 deletions.
docs/articles/hybrid_search_&_rerank_rag.md (2 changes: 1 addition & 1 deletion)

@@ -4,7 +4,7 @@ Retrieval-Augmented Generation (RAG) is revolutionizing traditional search engines

Hybrid search can also be paired with **semantic reranking** (to reorder outcomes) to further enhance performance. Combining hybrid search with reranking holds immense potential for various applications, including natural language processing tasks like question answering and text summarization, even for implementation at a large scale.

-In our article, we'll delve into the nuances and limitations of hybrid search and reranking. Though pure vector search is often preferable, in many cases hybrid search can enhance the retrieval component in **[RAG (Retrieval Augmented Generation)](articles/retrieval-augmented-generation)**, and thereby deliver impactful and insightful text generation across various domains.
+In our article, we'll delve into the nuances and limitations of hybrid search and reranking. Though pure vector search is often preferable, in many cases hybrid search can enhance the retrieval component in **[RAG (Retrieval Augmented Generation)](retrieval-augmented-generation)**, and thereby deliver impactful and insightful text generation across various domains.
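
The hybrid-plus-rerank pattern described above is straightforward to prototype. Below is a minimal sketch (not from the original article) that fuses a BM25 keyword ranking with a dense vector ranking via Reciprocal Rank Fusion, then reorders the fused candidates with a cross-encoder; the corpus, model choices, and constants are illustrative assumptions.

```python
# Illustrative sketch: hybrid search (BM25 + dense vectors) with cross-encoder
# reranking. Corpus, models, and constants are assumptions, not the article's.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Hybrid search combines keyword and vector retrieval.",
    "Rerankers reorder candidates with a cross-encoder.",
    "RAG feeds retrieved context to an LLM for generation.",
]
query = "How does hybrid search improve RAG retrieval?"

# Keyword leg: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
bm25_ranking = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])

# Vector leg: dense cosine-similarity ranking.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)
dense_ranking = util.cos_sim(query_emb, doc_emb)[0].argsort(descending=True).tolist()

# Fuse the two rankings with Reciprocal Rank Fusion (RRF).
K = 60  # common RRF damping constant
rrf = {doc_id: 0.0 for doc_id in range(len(corpus))}
for ranking in (bm25_ranking, dense_ranking):
    for rank, doc_id in enumerate(ranking):
        rrf[doc_id] += 1.0 / (K + rank + 1)
candidates = sorted(rrf, key=rrf.get, reverse=True)

# Semantic reranking: a cross-encoder scores (query, document) pairs directly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, corpus[i]) for i in candidates])
reranked = [doc_id for _, doc_id in
            sorted(zip(scores.tolist(), candidates), reverse=True)]
print([corpus[i] for i in reranked])
```

RRF is attractive here because it fuses rankings without needing the BM25 and cosine scores to share a scale; weighted score fusion is a common alternative when the two scores are calibrated.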


## What is Hybrid Search?
docs/articles/retrieval_augmented_generation_eval.md (4 changes: 2 additions & 2 deletions)

@@ -4,7 +4,7 @@

## Why evaluate RAG?

-Retrieval Augmented Generation (RAG) is probably the most useful application of large language models today. RAG enhances content generation by leveraging existing information effectively. It can amalgamate specific, relevant details from multiple sources to generate more accurate and relevant query results. This makes RAG potentially invaluable in various domains, including content creation, question & answer applications, and information synthesis. RAG does this by combining the strengths of retrieval, usually using dense vector search, and text generation models, like GPT. For a more in-depth introduction to RAG, read [here](articles/retrieval-augmented-generation).
+Retrieval Augmented Generation (RAG) is probably the most useful application of large language models today. RAG enhances content generation by leveraging existing information effectively. It can amalgamate specific, relevant details from multiple sources to generate more accurate and relevant query results. This makes RAG potentially invaluable in various domains, including content creation, question & answer applications, and information synthesis. RAG does this by combining the strengths of retrieval, usually using dense vector search, and text generation models, like GPT. For a more in-depth introduction to RAG, read [here](retrieval-augmented-generation).

![Implementation of RAG using Qdrant as a vector database](../assets/use_cases/retrieval_augmented_generation_eval/rag_qdrant.jpg)

@@ -16,7 +16,7 @@ In article 2, we'll look at RAGAS ([RAG Assessment](https://github.com/exploding

## Why do we need RAG?

-RAG significantly enhances [vector search](building-blocks/vector-search/introduction) with the power of Large Language Models (LLMs) by enabling dynamic content generation based on retrieved knowledge. RAG is indispensable when users seek to generate new content rather than interact with documents or search results directly. It excels in providing contextually rich, informative, and human-like responses. For tasks requiring detailed, coherent explanations, summaries, or responses that transcend the explicit data stored in vectors, RAG is invaluable. _Before setting up a RAG system, you should consider conducting feasibility studies to determine how and whether RAG aligns with your specific needs and value expectations._
+RAG significantly enhances [vector search](../building-blocks/vector-search/introduction) with the power of Large Language Models (LLMs) by enabling dynamic content generation based on retrieved knowledge. RAG is indispensable when users seek to generate new content rather than interact with documents or search results directly. It excels in providing contextually rich, informative, and human-like responses. For tasks requiring detailed, coherent explanations, summaries, or responses that transcend the explicit data stored in vectors, RAG is invaluable. _Before setting up a RAG system, you should consider conducting feasibility studies to determine how and whether RAG aligns with your specific needs and value expectations._

While vector search efficiently retrieves relevant documents/chunks from a document corpus, RAG permits content synthesis and a deeper level of understanding, providing essential context to queries and generated results. In this way, RAG can ensure that answers are unique and tailored to each query, in essence personalized to the user.

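As a preview of the RAGAS assessment covered in article 2, an evaluation call has roughly the shape sketched below. The sample record is invented for illustration, and the interface shown reflects early ragas releases (an OpenAI API key is required by default); check the RAGAS repository for the current API.

```python
# Rough shape of a RAGAS evaluation (illustrative; API may differ by version).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

sample = {
    "question": ["What does RAG add on top of vector search?"],
    "answer": ["RAG generates a grounded answer from the retrieved context."],
    "contexts": [["RAG combines dense retrieval with LLM text generation."]],
}
result = evaluate(
    Dataset.from_dict(sample),
    metrics=[faithfulness, answer_relevancy],
)
print(result)  # per-metric scores between 0 and 1
```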
docs/articles/scaling_rag_for_production.md (6 changes: 3 additions & 3 deletions)

@@ -8,18 +8,18 @@ Our tutorial provides an example of **how you can develop a RAG pipeline with pr

The goals and requirements of development and production are usually very different. This is particularly true for new technologies like Large Language Models (LLMs) and Retrieval-augmented Generation (RAG), where organizations prioritize rapid experimentation to test the waters before committing more resources. Once important stakeholders are convinced, the focus shifts from demonstrating an application's _potential for_ creating value to _actually_ creating value, via production. Until a system is productionized, its ROI is typically zero.

-**Productionizing**, in the context of [RAG systems](articles/retrieval-augmented-generation), involves transitioning from a prototype or test environment to a **stable, operational state**, in which the system is readily accessible and reliable for remote end users, such as via URL, i.e., independent of the state of the end user's machine. Productionizing also involves **scaling** the system to handle varying levels of user demand and traffic, ensuring consistent performance and availability.
+**Productionizing**, in the context of [RAG systems](retrieval-augmented-generation), involves transitioning from a prototype or test environment to a **stable, operational state**, in which the system is readily accessible and reliable for remote end users, such as via URL, i.e., independent of the state of the end user's machine. Productionizing also involves **scaling** the system to handle varying levels of user demand and traffic, ensuring consistent performance and availability.

Even though there is no ROI without productionizing, organizations often underestimate the hurdles involved in getting to an end product. Productionizing is always a trade-off between performance and costs, and this is no different for RAG systems. The goal is to achieve a stable, operational, and scalable end product while keeping costs low.

-Let's look more closely at the basic requirements of a [RAG system](articles/retrieval-augmented-generation), before going into the specifics of what you'll need to productionize it in a cost-effective but scalable way.
+Let's look more closely at the basic requirements of a [RAG system](retrieval-augmented-generation), before going into the specifics of what you'll need to productionize it in a cost-effective but scalable way.

## The basics of RAG

The most basic RAG workflow looks like this:

1. Submit a text query to an embedding model, which converts it into a semantically meaningful vector embedding.
-2. Send the resulting query vector embedding to your document embeddings storage location - typically a [vector database](building-blocks/vector-search/access-patterns#dynamic-access).
+2. Send the resulting query vector embedding to your document embeddings storage location - typically a [vector database](../building-blocks/vector-search/access-patterns#dynamic-access).
3. Retrieve the most relevant document chunks - based on proximity of document chunk embeddings to the query vector embedding.
4. Add the retrieved document chunks as context to the query and send it to the LLM.
5. The LLM generates a response utilizing the retrieved context.
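
In code, those five steps can be as compact as the sketch below: a toy in-memory version in which a NumPy dot product stands in for the vector database lookup and `generate` stands in for the LLM call. The model name and both helpers are hypothetical stand-ins, not part of the tutorial.

```python
# Toy sketch of the five-step RAG workflow (illustrative stand-ins throughout).
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice
documents = [
    "Productionizing RAG means a stable, scalable, operational system.",
    "A vector database stores document chunk embeddings.",
    "Retrieved chunks are added to the query as context for the LLM.",
]
doc_embeddings = encoder.encode(documents, normalize_embeddings=True)

def generate(prompt: str) -> str:
    # Placeholder for a real LLM call (OpenAI, vLLM, etc.).
    return f"[response conditioned on a {len(prompt)}-char prompt]"

def answer(query: str, top_k: int = 2) -> str:
    # 1. Convert the query into a vector embedding.
    query_embedding = encoder.encode(query, normalize_embeddings=True)
    # 2./3. Score it against the stored chunk embeddings and keep the nearest
    #       chunks; a vector database performs this lookup at scale.
    scores = doc_embeddings @ query_embedding
    context = [documents[i] for i in np.argsort(-scores)[:top_k]]
    # 4. Add the retrieved chunks to the query as context...
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    # 5. ...and have the LLM generate a response using that context.
    return generate(prompt)

print(answer("What does productionizing a RAG system involve?"))
```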
docs/articles/social_media_retrieval.md (2 changes: 1 addition & 1 deletion)

@@ -14,7 +14,7 @@ Our implementation focuses on just the retrieval part of a RAG system. But you c
- use a rerank pattern to improve retrieval accuracy
- visualize content retrieved for a given query in a 2D plot, using UMAP

-If, before continuing, you want to refamiliarize yourself with the **basics of RAG systems**, we encourage you to check out this excellent article on VectorHub: [Retrieval Augmented Generation](articles/retrieval-augmented-generation).
+If, before continuing, you want to refamiliarize yourself with the **basics of RAG systems**, we encourage you to check out this excellent article on VectorHub: [Retrieval Augmented Generation](retrieval-augmented-generation).
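
For the UMAP visualization mentioned in the list above, the usual recipe is to project the post embeddings and the query embedding into 2D together and scatter-plot them. The random embeddings and indices below are placeholders for real retrieval output; the umap-learn and matplotlib usage is standard.

```python
# Sketch: visualize retrieved content in 2D with UMAP (placeholder data).
import numpy as np
import matplotlib.pyplot as plt
import umap

rng = np.random.default_rng(42)
post_embeddings = rng.normal(size=(200, 384))  # stand-in for encoded posts
query_embedding = rng.normal(size=(1, 384))    # stand-in for the encoded query
retrieved = [3, 17, 42, 99]                    # stand-in retrieval result ids

reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
points = reducer.fit_transform(np.vstack([post_embeddings, query_embedding]))

plt.scatter(points[:-1, 0], points[:-1, 1], s=8, alpha=0.4, label="posts")
plt.scatter(points[retrieved, 0], points[retrieved, 1], s=40, label="retrieved")
plt.scatter(points[-1, 0], points[-1, 1], marker="*", s=200, label="query")
plt.legend()
plt.show()
```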


## 1. System design
docs/building_blocks/data_sources/introduction.md (6 changes: 3 additions & 3 deletions)

@@ -55,17 +55,17 @@ Now let's dive into the details.
### **1.1 Data Velocity**
The choice of data processing velocity is pivotal in determining the kind of data retrieval and vector compute tasks you can perform. Different velocities offer distinct advantages and make different use cases possible.

-[Read more about different data velocities, here](building-blocks/data-sources/data-velocity)
+[Read more about different data velocities, here](data-velocity)

### **1.2 Data Modality**
Whether your data is structured, unstructured, or hybrid is crucial when evaluating data sources. The nature of the data source your vector retrieval system uses shapes how that data should be managed and processed.

-[Read more about how to manage different data types/modalities, here](building-blocks/data-sources/data-modality)
+[Read more about how to manage different data types/modalities, here](data-modality)

### **1.3 Conclusions & Next Steps**
So what does this all mean?

-[Read our conclusions and recommended next steps, here](building-blocks/data-sources/conclusion)
+[Read our conclusions and recommended next steps, here](conclusion)

## Contributors

docs/building_blocks/vector_compute/introduction.md (6 changes: 3 additions & 3 deletions)

@@ -41,15 +41,15 @@ At the core of Vector Compute are embedding models – machine learning models a

Embedding models turn features extracted from high-dimensional data, with large numbers of attributes or dimensions, like text, images, or audio, into lower-dimensional but dense mathematical representations – i.e., vectors. You can apply embedding models to structured data like tabular datasets or graphs.

-[Read more about embedding models, here](building-blocks/vector-compute/embedding-models)
+[Read more about embedding models, here](embedding-models)
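
The behavior described in section 2.1 is easy to see in practice: a pre-trained encoder maps variable-length text to fixed-size dense vectors whose proximity tracks semantic similarity. The model and sentences below are illustrative choices, not prescribed by the doc.

```python
# Sketch: a pre-trained embedding model turning text into dense vectors.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
sentences = ["a photo of a cat", "a picture of a kitten", "quarterly revenue"]
vectors = model.encode(sentences)  # dense array, one 384-dim row per sentence

print(vectors.shape)                         # (3, 384)
print(util.cos_sim(vectors[0], vectors[1]))  # high: semantically close
print(util.cos_sim(vectors[0], vectors[2]))  # low: unrelated
```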

### **2.2 Pre-Trained and Custom Models**
Your task's unique requirements will dictate when you should use a custom model, and when you should use a pre-trained model.

-[Read more about how to choose the right type of model for your use case, here](building-blocks/vector-compute/pre-train-custom-models)
+[Read more about how to choose the right type of model for your use case, here](pre-train-custom-models)

### **2.3 Applications of Vector Compute**
-So what does this all mean? Such robust homegrown solutions will be increasingly important given the broad and ever-expanding application of Vector Compute to solve real-world problems in a spectrum of domains, [partially discussed here](building-blocks/vector-compute/applications).
+So what does this all mean? Such robust homegrown solutions will be increasingly important given the broad and ever-expanding application of Vector Compute to solve real-world problems in a spectrum of domains, [partially discussed here](applications).

---
## Contributors
docs/building_blocks/vector_search/introduction.md (6 changes: 3 additions & 3 deletions)

@@ -52,17 +52,17 @@ Now let's dive into the details.
### **3.1 Nearest Neighbor Search Algorithms**
Quickly calculating the similarity between vectors is at the heart of Vector Search. The vectors encoded by your embedding model/s store valuable feature or characteristic information about your data that can be used in various applications (e.g., content recommendation, clustering, data analysis). There are several ways to perform nearest neighbor search.

-[Read more about different vector search algorithms, here](building-blocks/vector-search/nearest-neighbor-algorithms)
+[Read more about different vector search algorithms, here](nearest-neighbor-algorithms)
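
As a baseline for the nearest neighbor algorithms in 3.1, exact (brute-force) k-NN over normalized vectors is a few lines of NumPy; approximate methods such as HNSW or IVF exist precisely to avoid this full scan. The data and dimensions below are placeholders.

```python
# Sketch: exact (brute-force) k-nearest-neighbor search, cosine similarity.
# ANN indexes (HNSW, IVF, LSH, ...) approximate this to avoid the full scan.
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 128)).astype(np.float32)  # placeholder vectors
db /= np.linalg.norm(db, axis=1, keepdims=True)         # L2-normalize once

def knn(query: np.ndarray, k: int = 5) -> np.ndarray:
    q = query / np.linalg.norm(query)
    scores = db @ q                          # cosine similarity via dot product
    top_k = np.argpartition(-scores, k)[:k]  # O(n) candidate selection
    return top_k[np.argsort(-scores[top_k])]  # sort only the k survivors

print(knn(rng.normal(size=128).astype(np.float32)))
```

Using `argpartition` keeps selection linear in corpus size; the full sort is deferred to just the k survivors.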

### **3.2 Key Access Patterns**
The access patterns deployed in Vector Search significantly impact storage, query efficiency, and infrastructure alignment, which are consequential in optimizing your retrieval system for your intended application.

-[Read more about the different access patterns, here](building-blocks/vector-search/access-patterns)
+[Read more about the different access patterns, here](access-patterns)

### **3.3 Conclusions & Next Steps**
So what does this all mean?

-[Read our conclusions and recommended next steps, here](building-blocks/vector-search/conclusion)
+[Read our conclusions and recommended next steps, here](conclusion)

---
## Contributors
