Product categorization project using an embeddings API to assign categories to products based on embeddings comparisons.
See notebook here.
The objective was to create a POC to see how this type of feature could be integrated into an app for a catalog team creating product pages.
In the context of product page creation, product categorization is one of the steps that can benefit from automation in the following ways:
- Facilitating onboarding for new team members who are not familiar with the category list
- Narrowing down the category selection by returning the most suitable categories to choose from
- Reducing the number of UI interactions required from the end user (e.g. opening a category menu, scrolling, selecting a category) so they can focus on higher-value tasks
In this project, the embeddings are product names transformed into lists of float values, known as vectors.
The closer two products' embeddings are, the closer their semantic meaning.
For this reason, embeddings can be used to group data based on their semantic similarity.
For more details, refer to Cohere's embeddings guide.
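To make "closeness" concrete, here is a minimal sketch that compares vectors with cosine similarity. The metric is an assumption (this write-up does not state which one the index uses), and the three-dimensional vectors are made-up stand-ins for real embeddings, which have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors with made-up values, standing in for product-name embeddings.
running_shoes = [0.9, 0.1, 0.0]
trail_sneakers = [0.8, 0.2, 0.1]
coffee_maker = [0.0, 0.2, 0.9]

sim_shoes = cosine_similarity(running_shoes, trail_sneakers)
sim_coffee = cosine_similarity(running_shoes, coffee_maker)

# The two footwear vectors point in nearly the same direction,
# so their similarity is higher than the footwear/appliance pair.
print(sim_shoes > sim_coffee)
```

A score close to 1 means the vectors point in almost the same direction; a score close to 0 means they have little in common.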
The feature works as follows:
- Generating embeddings for the products of the dataset, using product names
- Storing the product data and embeddings on Pinecone
- Generating embeddings for a new product, using the product name
- Comparing the new product's embeddings to the embeddings of the products from the dataset
- Returning the product from the dataset whose embeddings are the most similar to the new product's embeddings
- Retrieving the category from the returned product and assigning its category to the new product
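The steps above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: `toy_embed` replaces the embeddings API call with simple word counts, a Python list replaces the vector database, and the catalog entries are invented. It illustrates the flow only, not the project's actual implementation.

```python
import math
from collections import Counter

def toy_embed(name):
    """Hypothetical stand-in for an embeddings API call: word counts."""
    return Counter(name.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented catalog data; a list stands in for the vector database.
catalog = [
    {"name": "Organic green tea bags", "category": "Beverages"},
    {"name": "Stainless steel water bottle", "category": "Kitchen"},
    {"name": "Wireless optical mouse", "category": "Electronics"},
]

# Generate embeddings for the dataset and store them with the product data.
index = [{**p, "embedding": toy_embed(p["name"])} for p in catalog]

def assign_category(new_name):
    # Generate the embedding for the new product's name.
    query = toy_embed(new_name)
    # Compare it to every stored embedding and keep the most similar product.
    best = max(index, key=lambda p: similarity(query, p["embedding"]))
    # Reuse that product's category for the new product.
    return best["category"]

print(assign_category("Wireless gaming mouse"))
```

In the real setup, the comparison step is delegated to the vector database query rather than computed in application code.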
Retool is the platform used to build the prototype for the product page creation app.
The prototype app works as follows:
- The user enters the new product's name in the "Product name" field
- The user clicks the "Generate category" button to trigger the category retrieval
- In the "Category" field, the retrieval returns up to 3 categories to choose from, sorted by similarity score, from highest to lowest
The two APIs used to assign a category to a new product were added and configured in Retool's Query Library:
- Cohere embeddings API
- Pinecone query API
The Query Library enables users to reference APIs so that they can be linked to events and automatically triggered when users interact with specific UI elements on a Retool app.
In this prototype, the Pinecone query API topK parameter is set to 10 to return a list of 10 products, sorted by similarity score, from highest to lowest.
The API response is then processed to return the top 3 categories and populate the "Category" field.
The goal is to give the end user more agency when assigning a category to a new product: by returning a choice of 3 categories, we account for potential mistakes or inaccuracies, while enabling the end user to leverage their own product expertise to choose the correct category among the ones returned.
To retrieve the top 3 categories, the API response is processed as follows:
- First category: find the product with the highest similarity score, and return its category
- Second category: find the product with the next-highest similarity score whose category is different from the first category returned, and return its category
- Third category: find the product with the next-highest similarity score whose category is different from the first and second categories returned, and return its category
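The selection logic above amounts to walking the matches in descending score order and keeping the first 3 distinct categories. Below is a minimal sketch of that idea. The shape of each match (a `score` plus a `metadata` dictionary with a `category` key) and the sample values are assumptions for illustration, and `top_categories` is a hypothetical helper, not the code used in the Retool app.

```python
def top_categories(matches, limit=3):
    """Return up to `limit` distinct categories, highest score first."""
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    categories = []
    for match in ranked:
        category = match["metadata"]["category"]
        # Skip categories that were already picked from a higher-scoring product.
        if category not in categories:
            categories.append(category)
        if len(categories) == limit:
            break
    return categories

# Made-up matches, shaped like an assumed query response.
matches = [
    {"id": "p1", "score": 0.93, "metadata": {"category": "Sneakers"}},
    {"id": "p2", "score": 0.91, "metadata": {"category": "Sneakers"}},
    {"id": "p3", "score": 0.88, "metadata": {"category": "Running"}},
    {"id": "p4", "score": 0.85, "metadata": {"category": "Sneakers"}},
    {"id": "p5", "score": 0.80, "metadata": {"category": "Sandals"}},
    {"id": "p6", "score": 0.78, "metadata": {"category": "Boots"}},
]

print(top_categories(matches))
```

If the returned products share fewer than 3 distinct categories, the helper returns fewer than 3, which matches the "up to 3 categories" behavior of the "Category" field.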
See the simplified table below for illustration: