perf: optimize git log explore by paginating to minimize traversal #714
What does this change?
Thank you for this excellent product!
While evaluating this project for potential use, I noticed that determining the expected key was taking over 30 seconds. I believe this can be improved, so I'm submitting this PR with my proposed solution.
The current implementation retrieves 300 git commits at once and runs
git branch --contains <hash>
for each commit hash (i.e., up to 300 times) before checking them sequentially to find the base commit. However, in most real-world scenarios the base commit is found within the first 10-20 commits, so most of the remaining containment checks are unnecessary.

I propose paginating the git log retrieval, processing 10 commits at a time. This PR doesn't modify the base commit search logic itself; it simply divides the git log retrieval into batches of 10 commits and recursively fetches the next batch if the base commit isn't found in the current one.
This change should significantly improve performance in typical cases where the base commit is found within the first 10 commits, as it eliminates the need to check branch containment for the remaining 290 commits. However, in worst-case scenarios where all 300 commits need to be checked, there might be a slight performance degradation due to the pagination overhead.
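The batching idea can be sketched as follows. This is a simplified illustration, not the actual plugin code: `fetchCommits` and `isBaseCommit` are hypothetical stand-ins for the real git calls (`git log --skip=<n> -n <batchSize>` and the `git branch --contains` check), so that only the early-exit pagination logic is shown.

```typescript
// Fetches one page of commit hashes, e.g. backed by
// `git log --skip=${skip} -n ${limit} --format=%H` in the real plugin.
type Fetcher = (skip: number, limit: number) => string[];

// Walks the history in small batches and stops at the first commit that
// satisfies the base-commit predicate, so containment checks are only
// run for commits actually visited.
function findBaseCommit(
  fetchCommits: Fetcher,
  isBaseCommit: (hash: string) => boolean,
  batchSize = 10,
  maxCommits = 300,
): string | undefined {
  for (let skip = 0; skip < maxCommits; skip += batchSize) {
    const batch = fetchCommits(skip, batchSize);
    for (const hash of batch) {
      if (isBaseCommit(hash)) return hash; // early exit: later batches are never fetched
    }
    if (batch.length < batchSize) break; // history exhausted before maxCommits
  }
  return undefined;
}
```

If the base commit sits in the first batch, only one page of history is fetched and at most 10 containment checks run, instead of 300 up front; in the worst case every batch is fetched, which adds a small per-page overhead.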
References
related: #136
Screenshots
What can I check for bug fixes?
While this change isn't a bug fix, I've checked that
reg-keygen-git-hash-plugin
is slow by executing the following script on a large repository:

I've implemented this solution as a proof of concept, and all tests are passing. However, I'm not entirely confident that the code style matches the project's conventions.
I would greatly appreciate any feedback, even minor suggestions. If my implementation significantly diverges from the project's standards, I'm open to you creating a separate PR that I can use as a reference.