Refactor web retriever #1102

Merged
merged 17 commits into from
Jan 8, 2025

Conversation

Spycsh
Member

@Spycsh Spycsh commented Jan 2, 2025

Description

Refactor web retriever

Issues

#1010

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

N/A

Tests

UT

@Spycsh Spycsh requested a review from lvliang-intel as a code owner January 2, 2025 09:35
Contributor

@eero-t eero-t left a comment

The PR description could explain in more detail what is being refactored, as the commit descriptions are also missing.

This seems to be splitting retriever_chroma.py to:

  • opea_google_search.py backend, and
  • opea_web_retrievers_microservice.py frontend

I assume the split is done to prepare for using search engines other than Google.

The frontend is still hard-coded to use the Chroma vector DB and TEI embedding, but I assume vLLM embedding support will be added in the future.

On a quick look, the new code seems to correspond to the old one, and the text and script changes look OK (git grep does not return other matches for chroma).

PS. IMHO the opea_ prefixes for the new files are redundant, as this whole thing is the OPEA project / repository.
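
To illustrate the kind of backend/frontend split described in the review above, here is a minimal sketch. It is not the code from this PR: `SearchBackend`, `FakeGoogleSearch`, `WebRetrieverFrontend`, and the `retrieve` method are hypothetical names chosen for illustration, and the embedding and vector DB steps are left out. It only shows how a frontend that depends on a small interface could accept a different search engine later.

```python
from typing import List, Protocol


class SearchBackend(Protocol):
    """Hypothetical backend interface (an assumption, not the OPEA API)."""

    def get_urls(self, query: str, num_results: int) -> List[str]:
        ...


class FakeGoogleSearch:
    """Stand-in backend that returns canned URLs instead of calling Google."""

    def get_urls(self, query: str, num_results: int) -> List[str]:
        return [f"https://example.com/{query}/{i}" for i in range(num_results)]


class WebRetrieverFrontend:
    """Hypothetical frontend that only knows about the SearchBackend interface."""

    def __init__(self, backend: SearchBackend) -> None:
        self.backend = backend

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # A real frontend would fetch, embed, and index the pages here;
        # this sketch simply returns the URLs from the backend.
        return self.backend.get_urls(query, k)
```

With this shape, supporting another search engine would mean adding a class with a matching `get_urls` method, without changing the frontend.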

@eero-t
Contributor

eero-t commented Jan 7, 2025

CI test "web_retrievers_opea_google_search, intel_cpu" fails with:

  File "/home/user/comps/web_retrievers/src/integrations/google_search.py", line 43, in get_urls
    result = self.search.results(query, num_search_result)
             ^^^^^^^^^^^
AttributeError: 'OpeaGoogleSearch' object has no attribute 'search'

There are also 2 other trivial CI failures.
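
The traceback above does not show why `search` is missing, so the following is only a sketch of one common way this kind of AttributeError arises: the attribute is assigned on some code paths of `__init__` but not others. The class names `BrokenSearch` and `SaferSearch`, the `api_key` parameter, and the lambda standing in for a search client are all hypothetical and are not taken from the PR.

```python
class BrokenSearch:
    """Hypothetical class: 'search' is only set when an api_key is given."""

    def __init__(self, api_key=None):
        if api_key:
            # Stand-in for a real search client (assumption for illustration).
            self.search = lambda query, n: [f"result-{i}" for i in range(n)]
        # Without an api_key, self.search is never created.

    def get_urls(self, query, n):
        # Raises AttributeError if __init__ skipped the assignment.
        return self.search(query, n)


class SaferSearch:
    """Hypothetical variant: the attribute always exists after __init__."""

    def __init__(self, api_key=None):
        self.search = None
        if api_key:
            self.search = lambda query, n: [f"result-{i}" for i in range(n)]

    def get_urls(self, query, n):
        if self.search is None:
            # Fail with an explicit message instead of an AttributeError.
            raise RuntimeError("search client was not initialized")
        return self.search(query, n)
```

Always assigning the attribute in `__init__` turns a confusing AttributeError deep in a request handler into an explicit error that points at the missing configuration.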

@chensuyue chensuyue merged commit 962e097 into main Jan 8, 2025
14 of 15 checks passed
@chensuyue chensuyue deleted the source/refactor_web_retriever branch January 8, 2025 07:24