Use search_pool to control iterator->Next() #1008

Open
wants to merge 4 commits into
base: main

alwayslove2013
Collaborator

@alwayslove2013 alwayslove2013 commented Dec 26, 2024

issue: #997

Currently, the AnnIterator function uses the search pool for concurrency control only while initializing the iterators. Once a returned iterator is handed over to the upper layer, its ->next() calls run without any thread restrictions, which may lead to issues such as OMP conflicts.

To address this, we propose the following changes:

  • All iterators will accept a use_knowhere_search_pool parameter during construction.
    • When set to true (the default), iterator->next() will be scheduled by the knowhere_search_thread_pool.
    • When set to false, iterator->next() will not involve any internal thread scheduling, so callers should take care of concurrency themselves.
  • Iterator initialization in the AnnIterator function will no longer be concurrent, which helps avoid potential deadlocks.
    • Furthermore, to improve performance for large nq, we will try to streamline initialization for all iterators by deferring heavier pre-computation to the first call of ->next().
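
To illustrate the deferred-initialization idea in the last bullet, here is a minimal sketch. It is not knowhere's implementation: the class name LazyIterator, its members, and the sorting step that stands in for "heavier pre-computation" are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical iterator (illustrative only, not knowhere's real class).
class LazyIterator {
 public:
    // Construction only stores its input, so creating one iterator per query stays cheap.
    explicit LazyIterator(std::vector<float> distances) : raw_(std::move(distances)) {
    }

    // Returns the next (id, distance) pair in ascending distance order,
    // or std::nullopt once every element has been returned.
    std::optional<std::pair<int64_t, float>>
    Next() {
        if (!initialized_) {
            Initialize();  // the heavy step runs on the first call only
        }
        if (cursor_ >= order_.size()) {
            return std::nullopt;
        }
        const int64_t id = order_[cursor_++];
        return std::make_pair(id, raw_[static_cast<std::size_t>(id)]);
    }

    bool
    initialized() const {
        return initialized_;
    }

 private:
    // Stand-in for the expensive pre-computation: sort ids by distance.
    void
    Initialize() {
        order_.resize(raw_.size());
        std::iota(order_.begin(), order_.end(), int64_t{0});
        std::sort(order_.begin(), order_.end(), [this](int64_t a, int64_t b) {
            return raw_[static_cast<std::size_t>(a)] < raw_[static_cast<std::size_t>(b)];
        });
        initialized_ = true;
    }

    std::vector<float> raw_;
    std::vector<int64_t> order_;
    std::size_t cursor_ = 0;
    bool initialized_ = false;
};
```

With this shape, building many iterators in a plain loop costs little, and the expensive step is paid only by iterators whose Next() is actually called.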

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alwayslove2013
To complete the pull request process, please assign presburger after the PR has been reviewed.
You can assign the PR to them by writing /assign @presburger in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


mergify bot commented Dec 26, 2024

@alwayslove2013 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

@alwayslove2013 force-pushed the iterator_next_with_search_pool branch 3 times, most recently from ee16112 to 3c70c4b, December 26, 2024 07:05
@@ -202,7 +203,7 @@ class IndexNode : public Object {
 return GenResultDataSet(nq, std::move(range_search_result));
 }

-auto its_or = AnnIterator(dataset, std::move(cfg), bitset);
+auto its_or = AnnIterator(dataset, std::move(cfg), bitset, false);
Collaborator

Please add a comment about why this parameter is false

Collaborator Author

Added.

The range_search function already uses the search_pool to process queries concurrently.
To prevent potential deadlocks, the iterator for a single query does not apply additional thread control to its next() calls.
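
The pattern described in this reply can be sketched as follows. This is an assumption-laden illustration, not knowhere code: TinyPool, CountingIterator, and RangeSearchLike are hypothetical names, and the pool is a deliberately tiny stand-in for the search pool. Each query runs as one pool task and drains its iterator inline, so no task ever waits on a second task queued behind it in the same pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool standing in for the search pool.
class TinyPool {
 public:
    explicit TinyPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { Run(); });
        }
    }
    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) {
            w.join();
        }
    }
    template <typename F>
    auto Submit(F f) -> std::future<decltype(f())> {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

 private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // stop requested and queue drained
                }
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

// Hypothetical per-query iterator: yields 0, 1, ..., n-1 and always runs inline.
struct CountingIterator {
    int n;
    int cur = 0;
    bool HasNext() const { return cur < n; }
    int Next() { return cur++; }
};

// One pool task per query; inside the task, Next() is a plain call (no nested Submit()).
std::vector<int> RangeSearchLike(TinyPool& pool, const std::vector<int>& per_query_counts) {
    std::vector<std::future<int>> futs;
    for (int n : per_query_counts) {
        futs.push_back(pool.Submit([n] {
            CountingIterator it{n};
            int sum = 0;
            while (it.HasNext()) {
                sum += it.Next();
            }
            return sum;
        }));
    }
    std::vector<int> out;
    for (auto& f : futs) {
        out.push_back(f.get());
    }
    return out;
}
```

The sketch finishes even with a single worker thread. If the lambda instead submitted each Next() back to the same one-worker pool and waited on the result, the only worker would be blocked waiting for a task that can never start.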

// If use_knowhere_search_pool is True (the default), the iterator->Next() will be scheduled by the
// knowhere_search_thread_pool.
// If False, will Not involve thread scheduling internally, so please take caution.
template <bool use_knowhere_search_pool = true>
Collaborator

What are the justifications for adding use_knowhere_search_pool as a template parameter instead of making a regular IndexIterator::use_knowhere_search_pool field? I'm not sure I follow

Collaborator Author

It just avoids checking use_knowhere_search_pool at runtime on every iterator->next() call.
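
For readers following this thread, here is a minimal sketch of the compile-time variant, assuming C++17. The names CounterIterator and RunOnPool are hypothetical, and RunOnPool uses std::async only as a stand-in for submitting to knowhere_search_thread_pool. With `if constexpr`, only the selected branch is instantiated, so Next() contains no runtime test of the flag.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical stand-in for pool scheduling: run the task on another thread and wait for it.
template <typename F>
auto RunOnPool(F f) -> decltype(f()) {
    return std::async(std::launch::async, std::move(f)).get();
}

// Hypothetical iterator; the bool template parameter selects the dispatch path at compile time.
template <bool use_pool = true>
class CounterIterator {
 public:
    int Next() {
        if constexpr (use_pool) {
            // Only this branch is instantiated for CounterIterator<true>.
            return RunOnPool([this] { return NextImpl(); });
        } else {
            // Only this branch is instantiated for CounterIterator<false>.
            return NextImpl();
        }
    }

    // Thread that executed the most recent NextImpl(); used here to observe the dispatch path.
    std::thread::id last_thread() const {
        return last_thread_;
    }

 private:
    int NextImpl() {
        last_thread_ = std::this_thread::get_id();
        return value_++;
    }

    int value_ = 0;
    std::thread::id last_thread_;
};
```

A regular bool member would work as well and would keep a single iterator type; the trade-off is one predictable branch per call against a flag that is fixed in the type.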

-distances_id.id = distances_id.id == -1 ? -1 : distances_id.id + xb_id_offset;
+if (xb_id_offset != 0) {
+for (auto& distances_id : distances_ids) {
+distances_id.id = distances_id.id == -1 ? -1 : distances_id.id + xb_id_offset;
Collaborator

Just to confirm: this is evaluated as `distances_id.id = a ? b : c`, not `distances_id.id = (a ? b : c) + xb_id_offset`?

Collaborator Author

@cqy123456 help check~
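
On the precedence question above: in C++ the additive operator binds more tightly than the conditional operator, so the expression parses as `a ? b : (c + xb_id_offset)`. The small check below demonstrates this; the helper names ShiftId and AddAfterTernary are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the expression under review: `+` is applied inside the else-branch.
constexpr int64_t ShiftId(int64_t id, int64_t xb_id_offset) {
    return id == -1 ? -1 : id + xb_id_offset;
}

// The alternative reading, written with explicit parentheses for comparison.
constexpr int64_t AddAfterTernary(int64_t id, int64_t xb_id_offset) {
    return (id == -1 ? -1 : id) + xb_id_offset;
}

// The sentinel -1 is preserved by the reviewed expression...
static_assert(ShiftId(-1, 100) == -1, "sentinel id must stay -1");
// ...and valid ids are shifted by the offset.
static_assert(ShiftId(7, 100) == 107, "valid id is shifted");
// The parenthesized alternative would shift the sentinel as well.
static_assert(AddAfterTernary(-1, 100) == 99, "alternative reading shifts the sentinel");
```

So the line in the diff keeps -1 as an "invalid id" marker and only offsets real ids.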

@alwayslove2013 force-pushed the iterator_next_with_search_pool branch from 3c70c4b to 3bcefff, December 27, 2024 03:41
Signed-off-by: min.tian <[email protected]>

3 participants