fix(agents-api): Configure spacy for postgresql #1055
👍 Looks good to me! Reviewed everything up to 2c25490 in 1 minute and 14 seconds
More details
- Looked at 639 lines of code in 5 files
- Skipped 0 files when reviewing.
- Skipped posting 2 drafted comments based on config settings.
1. agents-api/agents_api/common/nlp.py:1
- Draft comment: The removal of the `KeywordMatcher` class and related functions might impact performance if keyword matching was a critical feature. Ensure this change aligns with the intended functionality.
- Reason this comment was not posted: Confidence changes required: 50%. The removal of the `KeywordMatcher` class and related functions seems intentional, but it might affect performance if keyword matching was a critical feature. The PR description doesn't mention why it was removed, so it's worth noting.
2. agents-api/agents_api/common/nlp.py:162
- Draft comment: The `batch_text_to_tsvector_queries` function is commented out. If it's not needed, consider removing it to keep the code clean. If it might be needed later, add a comment explaining why it's commented out.
- Reason this comment was not posted: Confidence changes required: 33%. The `batch_text_to_tsvector_queries` function is commented out. If it's not needed, it should be removed to keep the code clean. If it might be needed later, consider adding a comment explaining why it's commented out.
Workflow ID: wflow_YkDiVPXGOGQxzNTW
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
👍 Looks good to me! Incremental review on 9eb018f in 44 seconds
More details
- Looked at 25 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comment based on config settings.
1. agents-api/agents_api/common/nlp.py:45
- Draft comment: The `clean` parameter was removed from the `extract_keywords` function, but the docstring was not updated to reflect this change. Consider updating the docstring to avoid confusion.
- Reason this comment was not posted: Comment looked like it was already resolved.
Workflow ID: wflow_VpSu58JemWgQgX69
👍 Looks good to me! Incremental review on 8fe87cb in 38 seconds
More details
- Looked at 84 lines of code in 4 files
- Skipped 0 files when reviewing.
- Skipped posting 2 drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/search_docs_by_text.py:65
- Draft comment: The `split_chunks=True` parameter is consistent with the updated default behavior in `text_to_tsvector_query`. Ensure this aligns with the intended query processing logic.
- Reason this comment was not posted: Confidence changes required: 50%. The use of `split_chunks=True` in the `text_to_tsvector_query` function calls in `search_docs_by_text.py` and `search_docs_hybrid.py` is consistent with the changes made in the `nlp.py` file, where the default value for `split_chunks` was changed to `True`. This ensures that the function behaves as intended with the new default behavior.
2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:86
- Draft comment: The `split_chunks=True` parameter is consistent with the updated default behavior in `text_to_tsvector_query`. Ensure this aligns with the intended query processing logic.
- Reason this comment was not posted: Confidence changes required: 50%. The use of `split_chunks=True` in the `text_to_tsvector_query` function calls in `search_docs_by_text.py` and `search_docs_hybrid.py` is consistent with the changes made in the `nlp.py` file, where the default value for `split_chunks` was changed to `True`. This ensures that the function behaves as intended with the new default behavior.
Workflow ID: wflow_onSRXNoVAqdcgn4e
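For readers unfamiliar with what a `split_chunks` flag might change, here is a minimal sketch. It is not the code from this PR: the function names `extract_keywords_sketch` and `text_to_tsquery_sketch`, the tiny stopword list, and the regex tokenizer are all assumptions made for illustration (the PR itself relies on spaCy). The sketch assumes that `split_chunks=True` breaks multi-word chunks into single words, while `split_chunks=False` keeps each chunk as a phrase joined with PostgreSQL's `<->` (followed-by) operator.

```python
import re

# Illustrative stopword list only; the PR uses spaCy rather than a fixed set.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def extract_keywords_sketch(text: str, split_chunks: bool = True) -> list[str]:
    """Hypothetical stand-in for `extract_keywords`: returns lowercase keywords."""
    chunks: list[str] = []
    current: list[str] = []
    # Stopwords and punctuation act as chunk boundaries.
    for token in re.findall(r"\w+|[^\w\s]", text.lower()):
        if token.isalnum() and token not in STOPWORDS:
            current.append(token)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))

    # With split_chunks, every word in a chunk becomes its own keyword.
    keywords = [w for c in chunks for w in c.split()] if split_chunks else chunks
    # De-duplicate while preserving first-seen order.
    return list(dict.fromkeys(keywords))


def text_to_tsquery_sketch(text: str, split_chunks: bool = True) -> str:
    """Hypothetical stand-in for `text_to_tsvector_query`: builds a tsquery string."""
    terms = []
    for keyword in extract_keywords_sketch(text, split_chunks):
        words = keyword.split()
        # Multi-word chunks become phrase matches; single words stay as-is.
        terms.append(f"({' <-> '.join(words)})" if len(words) > 1 else words[0])
    return " | ".join(terms)


if __name__ == "__main__":
    print(text_to_tsquery_sketch("Configure spaCy for PostgreSQL search"))
    print(text_to_tsquery_sketch("Configure spaCy for PostgreSQL search", split_chunks=False))
```

Under these assumptions, splitting chunks yields a broader query (any single keyword can match), whereas keeping chunks intact yields stricter phrase matches.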
👍 Looks good to me! Incremental review on fcd2ad3 in 17 seconds
More details
- Looked at 35 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 0 drafted comments based on config settings.
Workflow ID: wflow_kiCeiYaQ6CzTnLy5
PR Type
Enhancement, Tests, Bug fix
Description
- Refactored and enhanced `text_to_tsvector_query` for improved keyword extraction.
- Removed unused and redundant methods like `find_proximity_groups` and `KeywordMatcher`.
- Added new test cases for `text_to_tsvector_query` and `extract_keywords`.
- Integrated `text_to_tsvector_query` into `search_docs_by_text` and `search_docs_hybrid`.
Changes walkthrough 📝
- `nlp.py`: Refactored and optimized NLP utilities (`agents-api/agents_api/common/nlp.py`)
  - Refactored `text_to_tsvector_query` to simplify and optimize keyword extraction.
  - Removed `find_proximity_groups` and `KeywordMatcher` for better performance.
  - Updated `clean_keyword` to handle lone hyphens and multiple spaces.
  - Added a `split_chunks` option to `extract_keywords` for finer control.
- `search_docs_by_text.py`: Integrated `text_to_tsvector_query` in text search (`agents-api/agents_api/queries/docs/search_docs_by_text.py`)
  - Uses `text_to_tsvector_query` for preprocessing raw text queries.
- `search_docs_hybrid.py`: Added `text_to_tsvector_query` to hybrid search (`agents-api/agents_api/queries/docs/search_docs_hybrid.py`)
  - Uses `text_to_tsvector_query` for preprocessing raw text queries.
- `test_docs_queries.py`: Removed outdated `text_to_tsvector_query` tests (`agents-api/tests/test_docs_queries.py`)
  - Removed the tests for `text_to_tsvector_query`.
- `test_nlp_utilities.py`: Added tests for NLP utility functions (`agents-api/tests/test_nlp_utilities.py`)
  - Covers `clean_keyword`, `extract_keywords`, and `text_to_tsvector_query`.
  - Covers `split_chunks` functionality.
Important
Refactor and enhance NLP utilities for improved keyword extraction and query processing, integrating changes into search functions and adding comprehensive tests.
- Refactored `text_to_tsvector_query` in `nlp.py` for improved keyword extraction and query processing.
- Integrated `text_to_tsvector_query` into `search_docs_by_text` and `search_docs_hybrid` for preprocessing queries.
- Removed `find_proximity_groups` and `KeywordMatcher` for performance improvement.
- Updated `clean_keyword` to handle lone hyphens and multiple spaces.
- Added a `split_chunks` option to `extract_keywords` for finer control.
- Added tests for `clean_keyword`, `extract_keywords`, and `text_to_tsvector_query` in `test_nlp_utilities.py`.
- Removed outdated tests for `text_to_tsvector_query` in `test_docs_queries.py`.
This description was generated automatically for fcd2ad3. It will automatically update as commits are pushed.
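As a rough illustration of the `clean_keyword` behavior described above (lone hyphens and multiple spaces), the sketch below shows one way such cleanup could work. The function name `clean_keyword_sketch` and both regular expressions are assumptions, not the PR's implementation. A "lone hyphen" is taken here to mean a run of hyphens with whitespace (or the string boundary) on both sides, so hyphenated words such as `full-text` are left alone.

```python
import re


def clean_keyword_sketch(keyword: str) -> str:
    """Hypothetical cleanup: drop lone hyphens, collapse repeated whitespace."""
    # Replace hyphen runs that are not attached to a word on either side.
    without_lone_hyphens = re.sub(r"(?<!\S)-+(?!\S)", " ", keyword)
    # Collapse any run of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", without_lone_hyphens).strip()


if __name__ == "__main__":
    for raw in ["full - text   search", "full-text search", " -- spacy", "-"]:
        print(repr(raw), "->", repr(clean_keyword_sketch(raw)))
```

Tests in the spirit of `test_nlp_utilities.py` would then assert on inputs like these, for example that `"full - text   search"` cleans to `"full text search"` while `"full-text search"` is unchanged.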