vectorize-io/sample-datasets

Vectorize Sample Datasets Repository

Overview

This repository houses a small collection of documents and datasets that you can use to try out experiments in Vectorize, a platform designed to streamline the RAG (Retrieval-Augmented Generation) process in generative AI applications. These datasets are intended to make it easy for developers, AI engineers, and data scientists to try out Vectorize and see how it can help them find the best embedding models and chunking strategies for their specific data.

Directory Structure

  • arxiv-rag-papers: Contains research papers focused on Retrieval-Augmented Generation techniques, essential for developers looking to enhance their LLMs with state-of-the-art knowledge retrieval capabilities.

  • friends-episodes: Includes scripts from the "Friends" TV series, ideal for testing dialogue understanding and generation in AI systems.

  • ww2-wiki-articles: Comprises detailed articles from Wikipedia on various aspects of World War II, suitable for historical data analysis and training AI models on complex narrative understanding.

Getting Started

To start using these datasets with Vectorize:

  1. Download the files you would like to use for your vectorization experiment.
  2. Follow the Vectorize documentation to create an experiment.
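
Before creating an experiment, it can help to check which files you actually have on disk. The following is a minimal sketch, not part of Vectorize itself: it assumes you have downloaded or cloned this repository into a local folder (the path `sample-datasets` and the helper name `list_dataset_files` are hypothetical) and simply counts the files under each of the three dataset directories listed above.

```python
from pathlib import Path

# Directory names taken from the "Directory Structure" section of this README.
DATASET_DIRS = ("arxiv-rag-papers", "friends-episodes", "ww2-wiki-articles")


def list_dataset_files(repo_root):
    """Map each dataset directory name to a sorted list of the files under it."""
    root = Path(repo_root)
    files_by_dataset = {}
    for name in DATASET_DIRS:
        dataset_dir = root / name
        if not dataset_dir.is_dir():
            # Record an empty list for directories missing from this local copy.
            files_by_dataset[name] = []
            continue
        # Walk subdirectories too, keeping regular files only.
        files_by_dataset[name] = sorted(
            p for p in dataset_dir.rglob("*") if p.is_file()
        )
    return files_by_dataset


if __name__ == "__main__":
    # "sample-datasets" is an assumed path to a local copy of this repository.
    for dataset, files in list_dataset_files("sample-datasets").items():
        print(f"{dataset}: {len(files)} file(s)")
```

Uploading the selected files and configuring the experiment happen in Vectorize; refer to its documentation for those steps.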
