Add constrains to Retrievers #212

Open
0xadeeb opened this issue Nov 15, 2024 · 3 comments
@0xadeeb
0xadeeb commented Nov 15, 2024

If I'm using a retriever (e.g. HybridRetriever or VectorRetriever), is it possible to add constraints on any of the node properties?
Example:
Node: Note
Properties: user_id, content, content_embedding, last_edited

Say I need to run a similarity search on content and return the top 5 most similar notes, but only among nodes with a specific user_id. How can I do that with the above retrievers?

@stellasia
Contributor

Hi @0xadeeb,

For VectorRetriever, you can use pre-filtering. The constraint is applied before the similarity ranking, so you get 5 results satisfying it (as long as at least 5 nodes match).

Pre-filtering does not work for hybrid search though.

In both cases, you can also consider post-filtering, but then you are not guaranteed to get your 5 items. For this you need to switch to the VectorCypherRetriever (or HybridCypherRetriever) and write your own retrieval query. For instance:

from neo4j_graphrag.retrievers import HybridCypherRetriever

RETRIEVAL_QUERY = " WITH node, score WHERE node.user_id = <my_user_id> RETURN node, score LIMIT 5"

retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name=INDEX_NAME,
    fulltext_index_name=FULLTEXT_INDEX_NAME,
    retrieval_query=RETRIEVAL_QUERY,
)
retriever.search(query_text=query_text, top_k=20)

(I increased top_k in the search method and limited the results to 5 in the retrieval query, which is the last part of the overall query. This increases the chances of getting exactly 5 matches, but how well it works depends entirely on your setup.)

See examples here and here.
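
To make the "not sure you will have your 5 items" point concrete, here is a small pure-Python sketch of post-filtering. It does not use Neo4j or neo4j_graphrag: the sorted list stands in for the index lookup, and `post_filter_search`, the toy `notes` data and the user names are all made up for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def post_filter_search(notes, query_vector, user_id, top_k=20, limit=5):
    # Step 1 (stand-in for the index lookup): the top_k most similar notes,
    # regardless of who owns them.
    ranked = sorted(
        notes,
        key=lambda n: cosine(n["content_embedding"], query_vector),
        reverse=True,
    )[:top_k]
    # Step 2 (what the retrieval query does): keep one user's notes, cap at limit.
    return [n for n in ranked if n["user_id"] == user_id][:limit]


# Hypothetical data: 30 notes, "alice" owns every 10th one.
notes = [
    {"user_id": "alice" if i % 10 == 0 else "bob",
     "content_embedding": [1.0, i / 10.0]}
    for i in range(30)
]
query = [1.0, 0.0]

hits = post_filter_search(notes, query, "alice", top_k=20, limit=5)
narrow_hits = post_filter_search(notes, query, "alice", top_k=5, limit=5)
print(len(hits), len(narrow_hits))
```

With this data the filter leaves fewer than 5 notes in both calls, and a larger top_k only helps as far as the matching notes happen to rank inside it.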

@0xadeeb
Author

0xadeeb commented Nov 18, 2024

Thanks for the response @stellasia. Is there any reason why pre-filtering was not implemented for hybrid search? If not, I'm open to working on it.

@stellasia
Contributor

Let me explain how it works for the vector search first.

Neither the vector nor the fulltext indexes allow prefiltering in Neo4j. So when prefiltering is used, the index is not used at all (at the time of writing).

To implement prefiltering in the vector case, we leverage a function directly available in Cypher: vector.similarity.cosine, and we compute the similarity "manually". To clarify, here are the Cypher queries we use:

  • Without prefiltering, using the vector index:
CALL db.index.vector.queryNodes($vector_index_name, $top_k, $query_vector)
YIELD node, score
  • With prefiltering:
MATCH (node:`{node_label}`)
WHERE ${filters}
WITH node, vector.similarity.cosine(node.`{embedding_node_property}`, $query_vector) AS score
ORDER BY score DESC LIMIT $top_k

For the fulltext search, we do not have such a similarity function, so we cannot use the same approach.
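
For readers who prefer Python to Cypher, the prefiltering query above can be mirrored step by step in memory. This is only a sketch of the logic, not the library's code: `prefiltered_search`, the `predicate` argument and the sample nodes are hypothetical, and the score is a raw cosine, which may be scaled differently from what vector.similarity.cosine returns (the ranking is what matters here).

```python
from math import sqrt


def cosine_similarity(a, b):
    """Raw cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def prefiltered_search(nodes, query_vector, predicate,
                       embedding_property="content_embedding", top_k=5):
    # WHERE <filters>: keep only the nodes that satisfy the constraint.
    candidates = [n for n in nodes if predicate(n)]
    # WITH node, similarity(...) AS score: score every remaining node (no index).
    scored = [
        (n, cosine_similarity(n[embedding_property], query_vector))
        for n in candidates
    ]
    # ORDER BY score DESC LIMIT top_k
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample nodes.
nodes = [
    {"user_id": "u1", "content_embedding": [1.0, 0.0]},
    {"user_id": "u2", "content_embedding": [1.0, 0.1]},
    {"user_id": "u1", "content_embedding": [0.0, 1.0]},
    {"user_id": "u1", "content_embedding": [1.0, 1.0]},
]

results = prefiltered_search(
    nodes, [1.0, 0.0], lambda n: n["user_id"] == "u1", top_k=2
)
for node, score in results:
    print(node["user_id"], node["content_embedding"], round(score, 3))
```

Every node that passes the filter gets scored, so the cost grows with the number of matching nodes rather than with top_k. The whole approach depends on having a per-node score function to call, which is the piece missing on the fulltext side.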
