Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize git log explore by paginating to minimize traversal #714

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

d-kimuson
Copy link

What does this change?

Thank you for this excellent product!

While evaluating this project for potential use, I noticed that determining the expected key was taking over 30sec. I believe this can be improved, so I'm submitting this PR with my proposed solution.

The current implementation retrieves 300 git commits at once and runs git branch --contains <hash> for each commit hash (i.e., 300 times) before checking them sequentially to find the base commit. However, in most real-world scenarios, the base commit can typically be found within the first 10-20 commits, meaning the remaining commit hash checks are largely unnecessary.

I propose implementing pagination for git log retrieval, processing 10 commits at a time. This PR doesn't modify the base commit search logic itself; instead, it simply divides the git log retrieval into batches of 10 commits and recursively fetches the next batch if the base commit isn't found.

This change should significantly improve performance in typical cases where the base commit is found within the first 10 commits, as it eliminates the need to check branch containment for the remaining 290 commits. However, in worst-case scenarios where all 300 commits need to be checked, there might be a slight performance degradation due to the pagination overhead.

References

related: #136

Screenshots

What can I check for bug fixes?

While this change isn't a bug fix, I've checked that reg-keygen-git-hash-plugin is slow by executing following script on large repository:

const GitHashKeyGenPlugin = require('reg-keygen-git-hash-plugin')
const path = require('node:path')

const main = async () => {
  const baseDir = process.cwd()
  const factory = GitHashKeyGenPlugin;
  const plugin = factory().keyGenerator;
  
  plugin.init({
    workingDirs: {
      base: baseDir,
      actualDir: path.resolve(baseDir, 'actual'),
      expectedDir: path.resolve(baseDir, 'expected'),
      diffDir: path.resolve(baseDir, 'diff'),
    },
  })
  const result = await plugin.getExpectedKey()
  
  console.log('expectedKey', result)
  console.log('actualKey', await plugin.getActualKey())
}

main()
  .then(() => {
    console.log('done')
  })
  .catch((err) => {
    console.error(err)
  })
$ time node ./try_git_hash_plugin.cjs
expectedKey d525fe0328b38e0f7150ef0f2561d71f86abcc10
actualKey 795e10298ab9923f24a67efd289ae76f2ef87a35
done
node ./try_git_hash_plugin.cjs  8.86s user 8.44s system 45% cpu 38.232 total

I've implemented this solution as a proof of concept and all tests are passing. However, I'm not entirely confident whether the code style matches the project's conventions.

I would greatly appreciate any feedback, even minor suggestions. If my implementation significantly diverges from the project's standards, I'm open to you creating a separate PR that I can use as a reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant