An Approximate Vector-Analytics Benchmark for Relational Databases

microsoft/VBench

VBench is a benchmark for evaluating vector analytic queries through a SQL interface. It uses the Recipe1M dataset augmented with scalar attributes and provides a comprehensive set of vector analytic queries built on standard SQL operators, including Join, GroupBy, Filter, and TopK.

This repo provides instructions on

  • how to cook the VBench dataset
  • how to evaluate the vector-analytic engines on it

VBench Dataset

The VBench dataset consists of two tables: the Recipe Table and the Tag Table.

  • Recipe Table

| Column Name | Data Type | Example | Notes |
| --- | --- | --- | --- |
| recipe_id | Identifier | 1 | primary key |
| images | list of String | ['data/images/1/0.jpg', ...] | paths of images |
| description | Text | [ingredients] + [instruction] | sparse vector |
| images_embedding | Vector | [-0.0421, 0.0296, ..., 0.0273] | dense vector, 1024 dimensions |
| description_embedding | Vector | [0.0056, -0.0487, ..., 0.0034] | dense vector, 1024 dimensions |
| price | Integer | 18 | price of the dish |

  • Tag Table

| Column Name | Data Type | Example | Notes |
| --- | --- | --- | --- |
| id | Identifier | 1 | primary key |
| tag_name | Text | "salad" | name of the tag |
| tag_vector | Vector | [-0.0137, 0.0421, ..., 0.0183] | embedding or weight vector, 1024 dimensions |

Please refer to dataset_generation/README.md for detailed instructions on how to generate these two tables.
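As a reading aid, the two tables can be pictured as plain Python records. This is only a sketch of the column layout described above, not the repository's loading code: the class names `Recipe` and `Tag`, the helper `check_dims`, and the zero-filled placeholder vectors are all hypothetical, and the real values come from the generated dataset.

```python
from dataclasses import dataclass
from typing import List

EMBEDDING_DIM = 1024  # dimensionality stated for every vector column


@dataclass
class Recipe:
    """One row of the Recipe Table (hypothetical in-memory form)."""
    recipe_id: int                      # primary key
    images: List[str]                   # paths of images
    description: str                    # [ingredients] + [instruction]
    images_embedding: List[float]       # dense vector, 1024 dimensions
    description_embedding: List[float]  # dense vector, 1024 dimensions
    price: int                          # price of the dish


@dataclass
class Tag:
    """One row of the Tag Table (hypothetical in-memory form)."""
    id: int                   # primary key
    tag_name: str             # name of the tag
    tag_vector: List[float]   # embedding or weight vector, 1024 dimensions


def check_dims(vector: List[float], dim: int = EMBEDDING_DIM) -> None:
    """Raise ValueError if a vector does not have the expected length."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")


# Placeholder rows: the zero vectors stand in for real embeddings.
example_recipe = Recipe(
    recipe_id=1,
    images=["data/images/1/0.jpg"],
    description="[ingredients] + [instruction]",
    images_embedding=[0.0] * EMBEDDING_DIM,
    description_embedding=[0.0] * EMBEDDING_DIM,
    price=18,
)
example_tag = Tag(id=1, tag_name="salad", tag_vector=[0.0] * EMBEDDING_DIM)

for vec in (example_recipe.images_embedding,
            example_recipe.description_embedding,
            example_tag.tag_vector):
    check_dims(vec)
```

A length check like `check_dims` can be a cheap sanity test after generating the tables, since every vector column is stated to have the same dimensionality.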

VBench Queries

VBench has 12 queries, which can be divided into four categories:

  • Top-K
  • Vector filtering
  • Join
  • Group By

The queries use standard SQL operators over vector and scalar columns. Please refer to queries.sql for details.
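To make the categories concrete, the following sketch computes exact, brute-force answers for two illustrative query shapes on toy data: a Top-K search combined with a scalar filter, and a vector join followed by a Group By. It is not taken from the benchmark's SQL file. The L2 distance metric, the `price <= max_price` predicate, the function names, and the two-dimensional toy vectors are all assumptions chosen for illustration.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric; the benchmark may use another)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(rows: List[dict], query: Sequence[float],
                   max_price: int, k: int) -> List[Tuple[float, int]]:
    """Filter + Top-K: keep rows with price <= max_price, then return the
    k nearest as (distance, recipe_id) pairs, closest first."""
    candidates = (
        (l2_distance(row["embedding"], query), row["recipe_id"])
        for row in rows
        if row["price"] <= max_price
    )
    return heapq.nsmallest(k, candidates)


def nearest_tag_counts(rows: List[dict], tags: List[dict]) -> Dict[str, int]:
    """Join + Group By: match every recipe to its nearest tag vector,
    then count recipes per tag name."""
    counts: Counter = Counter()
    for row in rows:
        nearest = min(
            tags, key=lambda t: l2_distance(row["embedding"], t["tag_vector"])
        )
        counts[nearest["tag_name"]] += 1
    return dict(counts)


# Toy data: 2-dimensional vectors instead of the real 1024-dimensional ones.
recipes = [
    {"recipe_id": 1, "price": 18, "embedding": [0.0, 0.0]},
    {"recipe_id": 2, "price": 40, "embedding": [0.1, 0.0]},
    {"recipe_id": 3, "price": 12, "embedding": [1.0, 0.0]},
    {"recipe_id": 4, "price": 9, "embedding": [3.0, 4.0]},
]
tags = [
    {"tag_name": "salad", "tag_vector": [0.0, 0.0]},
    {"tag_name": "soup", "tag_vector": [3.0, 3.0]},
]

top = filtered_top_k(recipes, query=[0.0, 0.0], max_price=20, k=2)
per_tag = nearest_tag_counts(recipes, tags)
print(top)
print(per_tag)
```

Exact results of this kind are what an approximate engine's answers would typically be compared against. Recipe 2 is the second-closest vector to the query but is excluded by the price predicate, which shows why filtering and ranking have to be evaluated together.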

Evaluation

Please refer to evaluation/README.md for detailed instructions on how to evaluate different vector search engines.

License

The entire codebase is released under the MIT license.
