Vectorize Sample Datasets Repository

Overview

This repository houses a small collection of documents and datasets for trying out experiments in Vectorize, a platform designed to streamline the RAG (Retrieval-Augmented Generation) process in generative AI applications. These datasets are intended to make it easy for developers, AI engineers, and data scientists to try out Vectorize and see how it can help them find the best embedding models and chunking strategies for their specific data.
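To make "chunking strategy" concrete, here is a minimal sketch of one common approach: fixed-size character chunking with overlap. It is illustrative only. The `chunk_text` function and its `chunk_size` and `overlap` parameters are hypothetical names chosen for this example. They are not part of the Vectorize API, and Vectorize experiments may use other strategies.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows.

    Consecutive windows share `overlap` characters. This is a hypothetical
    helper for illustration, not a Vectorize API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so the tail
        # is not emitted again as a shorter, redundant chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Placeholder text, not taken from the datasets in this repository.
    sample = "Some dialogue text. " * 50
    for size, overlap in [(200, 0), (200, 50), (500, 100)]:
        print(size, overlap, len(chunk_text(sample, size, overlap)))
```

Varying `chunk_size` and `overlap` changes how many chunks a document produces and how much surrounding context each chunk carries. This is the kind of trade-off that running experiments on your own data can help you compare.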

Directory Structure

  • arxiv-rag-papers: Contains research papers on Retrieval-Augmented Generation techniques, useful for developers looking to enhance their LLMs with state-of-the-art knowledge retrieval capabilities.

  • friends-episodes: Includes scripts from the "Friends" TV series, ideal for testing dialogue understanding and generation in AI systems.

  • ww2-wiki-articles: Comprises detailed articles from Wikipedia on various aspects of World War II, suitable for historical data analysis and training AI models on complex narrative understanding.

Getting Started

To start using these datasets with Vectorize:

  1. Download the files you would like to use for your vectorization experiment.
  2. Follow the Vectorize documentation to create an experiment.