
Actions become substantially slower over time #936

Open
ianb-pomelo opened this issue Aug 22, 2024 · 25 comments
@ianb-pomelo

I have a checkout of a git repo in sapling that I have been using for a while and as I've used it more and more, the commands (such as sl status and sl) have been substantially slower. I checked out a new version of the repo with Sapling and sl is almost 2x slower in my older checkout and sl status is ~8x slower. I suspect that it has something to do with all the history that I have locally that is hidden. Is there a way to clean up the checkout a bit to remove these stale references that I don't want to bring back? Or do I just have to routinely checkout new copies?

@bolinfest
Contributor

I'm not sure about sl status, but in terms of interacting with Git, I just ran:

git --git-dir .sl/store/git fsck

which appeared to clean some things up?

@nicktogo

I had the same issue after upgrading to the latest version a while ago. I re-cloned with sl and it worked fine until recently; now it's slower again.

@quark-zju
Contributor

quark-zju commented Sep 18, 2024

Is your repo a .git repo or a .sl repo? Which of those directories does the repo root contain?

If it's a .git/ repo, does running git status make subsequent git/sl status calls faster?
If it's a .sl/ repo, what's the output of sl debugstatus and sl debugtree? Do you have watchman installed? Does --config workingcopy.use-rust=true --config status.use-rust=true (or false) make a difference?

status should not be impacted by commit history.

@ianb-pomelo
Author

Mine is a .git repo. I tried running git status with git --git-dir .sl/store/git status but got an error: fatal: this operation must be run in a work tree.
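That error is expected for any bare store: git status requires a work tree, and --git-dir alone doesn't supply one. A runnable sketch using a throwaway bare repo as a stand-in for .sl/store/git (which is also bare):

```shell
#!/bin/sh
# Reproduce the "work tree" error: point --git-dir at a bare repo
# (stand-in for .sl/store/git) and ask for status.
tmp=$(mktemp -d)
git init --bare -q "$tmp/store.git"
git --git-dir "$tmp/store.git" status 2>&1 | head -n 1
# prints: fatal: this operation must be run in a work tree
rm -rf "$tmp"
```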

I've attached the debug logs for an sl status run that has the couple-second delay:

2024-09-18T14:30:50.932032Z DEBUG run: status_info: status_mode="rust"
2024-09-18T14:30:50.932036Z  INFO run:status: workingcopy::workingcopy: enter
2024-09-18T14:30:50.933227Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: enter
2024-09-18T14:30:50.933743Z DEBUG progress_model::registry: registering ProgressBar connecting watchman
2024-09-18T14:30:50.946475Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: exit
2024-09-18T14:30:50.946499Z DEBUG run:status: watchman_info: watchmanfallback=1
2024-09-18T14:30:50.946506Z  WARN run:status: workingcopy::filesystem::watchmanfs::watchmanfs: watchman error - falling back to slow crawl err=While invoking the watchman CLI to discover the server connection details: No such file or directory (os error 2), stderr=``
2024-09-18T14:30:50.946571Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: enter
2024-09-18T14:30:50.946586Z DEBUG run:status:pending_changes: workingcopy::filesystem::physicalfs: working copy parents: [Ok(HgId("a1b73645a2d513b0b07fbbb8105a0537a04e3f8a"))]
2024-09-18T14:30:50.946737Z DEBUG run:status:pending_changes: progress_model::registry: registering ProgressBar comparing
2024-09-18T14:30:50.946744Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: exit
2024-09-18T14:30:50.946749Z  INFO run:status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.946758Z  INFO run:status:compute_status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.955040Z DEBUG progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.747772Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747795Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747802Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: enter
2024-09-18T14:30:52.748272Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748327Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748335Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748374Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: exit
2024-09-18T14:30:52.748540Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748652Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.748848Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.748949Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.749031Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749039Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749045Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749054Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: enter
2024-09-18T14:30:52.749060Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: pending_change_count=0 timeout_secs=Some(0)
2024-09-18T14:30:52.749065Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: treestate::dirstate: skipping treestate flush - it is not dirty
2024-09-18T14:30:52.749068Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: exit
2024-09-18T14:30:52.750157Z  INFO run:status:compute_status:compute_status{pending_changes_count=0 manifest_files_count=0}: workingcopy::status: exit
2024-09-18T14:30:52.750163Z  INFO run:status:compute_status: workingcopy::status: exit
2024-09-18T14:30:52.750167Z  INFO run:status{status_len=0}: workingcopy::workingcopy: exit
2024-09-18T14:30:52.751429Z DEBUG run: clidispatch::context: maybe starting pager enable_pager=true reason="auto"
2024-09-18T14:30:52.755092Z DEBUG run: progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.755551Z  INFO run: commands::run: exit

@quark-zju
Contributor

I tried running git status with git --git-dir .sl/store/git status

That would be a .sl repo. A .git repo is one where you run git init or git clone and then run sl commands inside it; that's a separate issue (duplicate of #929).

Could you try the .sl instructions above?

@ianb-pomelo
Author

Ah sorry, the output is:

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None
sl debugtree
dirstate v2 (using treestate/60f847df-7593-4f1f-b826-66c1f92973c4, offset 2811706, 7915 files tracked)

I do not have watchman installed

@quark-zju
Contributor

quark-zju commented Sep 18, 2024

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None

Thanks. It looks normal. Something might have regressed without our noticing (non-watchman, non-edenfs use cases are rare in production). I can have a look when I get time.

@quark-zju
Contributor

@ianb-pomelo What is the version of Sapling you're using? Are you building sl from source? Have you tried

sl status --config workingcopy.use-rust=true --config status.use-rust=true

?

@ianb-pomelo
Author

I installed Sapling via Homebrew and am using version 0.2.20240718-145624-f4e9df48.

I tried that command and it performed about the same as without those configs.

I also tried installing watchman, and that substantially sped things up, so for me that setup works.

@nicktogo

nicktogo commented Sep 18, 2024

I'm using .sl now and I have watchman running. Here is my debug output:

sl debugstatus
len(dirstate) = 519805
len(nonnormal) = 1
len(filtered nonnormal) = 1
clock = c:1726596135:5724:1:46787
sl debugtree
dirstate v2 (using treestate/37b9b29e-9233-4a38-b688-96be70bd6b3c, offset 43057603, 519805 files tracked)

Setting the flag to true is faster than setting it to false. Plain sl status seems to perform the same as setting it to true:

sl status --config workingcopy.use-rust=true --config status.use-rust=true

@nicktogo

@quark-zju do you have any suggestions for how to work around this, other than re-cloning?

@destroycomputers

I have encountered this, too.

A quick run under Instruments showed that in my case the issue was due to multi-pack-index validation. The issue only appears after some time because initially there is no multi-pack index while the pack-file count is low. This also means I could fix it temporarily by running git's garbage collection on the .sl/store/git directory.

The slowness seems to come from excessive validation during multi-pack-index loading, as described in libgit2/libgit2#6166, which is fixed upstream. Indeed, simply recompiling Sapling with the git2 Rust dependency bumped from version 0.14 to the latest 0.19 fixed the problem for me.
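A runnable sketch of that temporary fix, using a throwaway bare repo as a stand-in for .sl/store/git (the store path is the only Sapling-specific assumption):

```shell
#!/bin/sh
set -e
# Run git's garbage collection against a bare store, the same shape as:
#   git --git-dir .sl/store/git gc
# gc consolidates loose objects and pack files, so the multi-pack-index
# slow path doesn't kick in again until packs re-accumulate.
tmp=$(mktemp -d)
git init --bare -q "$tmp/store.git"
git --git-dir "$tmp/store.git" gc --quiet
echo "gc done"
rm -rf "$tmp"
```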

@nicktogo

nicktogo commented Oct 3, 2024

Thanks for sharing, @destroycomputers! This works for me too!

@quark-zju
Contributor

@quark-zju do you have some suggestions how to work around this, other than re-cloning?

As a temporary solution, you can try disabling the multi-pack index by appending the following to .sl/store/git/config, after the #### End maintained by Sapling #### line:

[maintenance "gc"]
enabled = true
[maintenance "incremental-repack"]
enabled = false

Then delete the multi-pack index file:

rm .sl/store/git/objects/pack/multi-pack-index

The underlying complexity: with --config git.shallow=1 (a Sapling config that matches git's "filter" concept, not git's "shallow"), the default gc can error out saying some object is not found (reported internally), but the multi-pack repack works. If you're not using the git.shallow feature, you don't need the multi-pack index and can use the regular gc.

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off the git native store. Those are non-trivial changes and would take some time...
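The same settings can also be applied with git config, which appends to the end of the config file rather than hand-editing it (a sketch on a throwaway bare repo; whether Sapling's maintained config section tolerates this is an assumption, and in a real checkout the store path would be .sl/store/git):

```shell
#!/bin/sh
set -e
# Sketch: set the maintenance keys via `git config`, then delete the
# multi-pack-index file. Demonstrated on a throwaway bare repo.
store=$(mktemp -d)/store.git
git init --bare -q "$store"
git --git-dir "$store" config maintenance.gc.enabled true
git --git-dir "$store" config maintenance.incremental-repack.enabled false
rm -f "$store/objects/pack/multi-pack-index"
git --git-dir "$store" config maintenance.incremental-repack.enabled
# prints: false
```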

@destroycomputers

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off the git native store. Those are non-trivial changes and would take some time...

@quark-zju meanwhile, would it be possible to bump the git2 dependency to the latest version? That would at least fix this specific issue, and given that the currently used version is over two years old, it might have some other positive effects too.

@quark-zju
Contributor

quark-zju commented Oct 7, 2024

would it be possible to bump the version of the git2 dependency to latest?

Good catch. I didn't realize we were using an old version. I tried upgrading it, but it does not seem trivial since other third-party dependencies in the monorepo (like cargo, sqlite, etc.) want the older version. Upgrading them could be an option, but if that's too hard we might try different approaches.

UPDATE: It is more complex than I thought. Let's say it's not feasible to upgrade for now. Our plan might be to migrate off libgit2.

@yiding
Contributor

yiding commented Dec 5, 2024

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git).

Just as a data point, I've run into this pretty consistently with a large git repo I've been working in: I get object-not-found errors after an sl pull, and disabling incremental repack and deleting the file fixes it.

@johnpanos

Hi @quark-zju, I'm running into the same issue: I get "object not found" after a while. Disabling incremental repack fixes it, but now I'm running into performance issues.

I see that sapling is on libgit2-0.28.1. Would it be alright if I picked your brain about the update to libgit2? I would be willing to put in the work to upgrade this dependency but I need help understanding the repo structure and the internal Meta tooling.

@quark-zju
Contributor

I see that sapling is on libgit2-0.28.1. Would it be alright if I picked your brain about the update to libgit2?

Hi @johnpanos. Thanks for the update. The complexity of upgrading is mainly that libgit2 is also used by other projects internally in a monorepo (and shared across languages, especially C++). Upgrading it only for Sapling without breaking builds for other projects is tricky. It seems more feasible to just migrate to using git commands (e.g. git cat-file --batch to read objects, though we'd need to be careful not to trigger remote fetches) or gitoxide.

By the way, are you using .sl mode or .git mode? For .sl mode we hope to migrate from git's storage system to our own, which will fix known issues like "object not found" (reported a few times on Discord) and "too many file descriptors" (e.g. #997), and will no longer require repack to maintain performance.
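For reference, the git cat-file --batch flow mentioned above looks like this (a runnable sketch on a throwaway repo; nothing here is Sapling-specific):

```shell
#!/bin/sh
set -e
# Read an object via `git cat-file --batch`: feed object ids on stdin,
# get back "<oid> <type> <size>" followed by the raw content.
tmp=$(mktemp -d)
git init -q "$tmp"
cd "$tmp"
echo hello > f
git add f
git -c user.name=x -c user.email=x@x commit -qm init
oid=$(git rev-parse HEAD:f)
printf '%s\n' "$oid" | git cat-file --batch
# prints: <oid> blob 6, then the content "hello"
cd / && rm -rf "$tmp"
```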

@destroycomputers

@quark-zju is it possible in your build system to compile the dependency statically? Given the git2 crate's setup, cargo will compile libgit2 statically unless it finds a dynamic library on the system where compilation takes place. It can also be overridden using a feature flag (vendored-libgit2) to always compile statically. That would allow upgrading the dependency only in Sapling without affecting other projects.

FWIW, the public distribution of Sapling found in the Releases of this repository does vendor libgit2 and isn't dynamically dependent on it.
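A sketch of what that could look like in a Cargo.toml (the 0.19 version is just the latest mentioned in this thread, not what Sapling actually ships):

```toml
[dependencies]
# vendored-libgit2 makes cargo build and statically link its own copy
# of libgit2 instead of picking up a system dynamic library
git2 = { version = "0.19", features = ["vendored-libgit2"] }
```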

@johnpanos

Thanks for the quick reply @quark-zju! I also appreciate you letting me know about the monorepo setup, I see how that can be almost insurmountable.

I'm using .sl mode, which means I'm interested in this long-term vision you've mentioned. Is there anywhere I can learn more about the new planned internal storage system?

@quark-zju
Contributor

quark-zju commented Dec 30, 2024

is it possible in your build system to compile the dependency statically?

Internally we use buck2 to build; the Cargo.toml files are generated from buck2 definitions. The problem is that the buck2 definition of libgit2 is shared across languages and projects in the monorepo, so updating it (regardless of static or dynamic linking) could break other Rust/C++/Python projects that directly or indirectly share the same libgit2 buck2 definition. While it might be possible to "fork" the related projects for Sapling's use case, that's usually a last resort and not considered good practice; in this case you'd have to fork C++ libgit2 and Rust git2, then configure them for all platforms (Windows, Linux, macOS arm, macOS Intel) the buck2 way, which is non-trivial as well.

Is there anywhere I can learn more about the new planned internal storage system?

You can read some docs at https://sapling-scm.com/docs/dev/internals/indexedlog and https://sapling-scm.com/docs/dev/internals/zstdelta. The indexedlog crate was published as sapling-indexedlog. We can publish the zstdelta and zstore crates too if you're interested.

The zstore is already used for tracking metadata (i.e. references); if you want to (roughly) compare it with git, look at .sl/store/metalog versus a git repo created by sl debugexportmetalog /tmp/git-repo.

@destroycomputers

@quark-zju fair enough, thanks for the reply.

@johnpanos

@quark-zju IndexedLog is fascinating. I'm definitely going to do a deep dive on this after my $DAYJOB. I definitely have a lot of questions I'd like to ask, but I don't believe this is the appropriate forum for them.

Is there a more appropriate community discord/slack that I can ask those in instead?

@quark-zju
Contributor

Is there a more appropriate community discord/slack that I can ask those in instead?

There is a Discord link at the bottom of https://sapling-scm.com/
