bug: invalid memory reference #870
Comments
I tried to build this on linux x86_64 but i get a build error
Must be a platform-specific thing: can you comment line 151 and the definition of |
The
This seems to cause the precompiles to not be available when I try to test it:
Something broke with our new release yesterday. Will check and update |
Hi @azteca1998 @edg-l, you should be able to repro again. I updated native to ab47832. I also hid the RAM usage prints behind a macOS target, so you should be able to run it on other targets without problems. |
Hello, sorry for the delay. We've been working on various stuff in native which made it so that |
Hi @azteca1998, I updated the ef-test repo to commit 28caad05e4a13f935acff83ea144df0b1bc24e2e with native 5e60089. The issue still occurs. |
Ok, I think I found the problem. It's the same as #816. If I run the test with lldb and compare stack pointers I get the following:
The first number is the value of If I run the test with a larger stack (1GiB in my test, probably works with way less stack) the test seems to run without problems. |
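The larger-stack experiment described above can be reproduced from Rust without touching system limits. The following is a minimal sketch, not the setup used in this thread: `run_with_stack` and `recurse` are hypothetical names, and `recurse` is only a stand-in for deeply nested native execution.

```rust
use std::thread;

/// Runs `work` on a freshly spawned thread that has `stack_bytes` of stack
/// and returns its result. A panic inside `work` is re-raised in the caller.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn a thread with the requested stack size");
    match handle.join() {
        Ok(value) => value,
        Err(payload) => std::panic::resume_unwind(payload),
    }
}

/// Hypothetical stand-in for deep call chains: each frame keeps a 1 KiB
/// buffer alive, so stack usage grows roughly linearly with `depth`.
fn recurse(depth: u64) -> u64 {
    let frame = std::hint::black_box([0u8; 1024]);
    if depth == 0 {
        frame[0] as u64
    } else {
        1 + recurse(depth - 1)
    }
}
```

Calling `run_with_stack(1 << 30, || recurse(10_000))` mirrors the 1 GiB experiment. For tests started through `cargo test`, setting the `RUST_MIN_STACK` environment variable may be a lighter alternative, since it changes the default stack size of threads spawned by the standard library (it does not affect the main thread).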
Ok, great. At the time I opened the issue, increasing the stack size did not have any effect, which no longer seems to be the case. However, the current run is taking more than 2 hours, up from 50 mins before. https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11947735661/job/33304216235?pr=1011 I will need to check why |
Could it be that the test that crashed was run at about the 50 minutes mark, therefore stopping all other tests, while now all tests have to be run, taking more than 2 hours? |
No, previously all 20k tests were passing in ~50 mins (20 mins compilation, 30 mins runtime). I added some more debug info to pinpoint where exactly the problem is; I will update you when I have results. |
Some runs take 45 mins to 1.5 hrs to complete a single, easy test. Do you have any idea how to debug why this is happening? I'm not sure what to look for.
That specific instance has a lot of felt252 dict operations because it's very unoptimized. I'll add some prints directly in the cairo code to see what happens in realtime, but that's a first thing we could be looking at. |
Not sure if that's the case, especially since it didn't happen earlier, but dictionaries are implemented in Rust within the runtime. When running the JIT, the runtime used is the dependency of cairo-native, which for debug builds is built using the debug profile. Could you try running the test in release mode? When running in AOT, the runtime used is the static library. If the tests are running in AOT, could you check if the linked library (the path in the env var) points to a release build? |
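The release-versus-debug check suggested above can be sketched as a small helper. This is only a heuristic under a stated assumption: the runtime library was built by cargo with the default layout, so its path contains a `debug` or `release` directory. `RuntimeProfile`, `guess_profile` and `runtime_profile_from_env` are hypothetical names, not part of Cairo Native.

```rust
use std::path::Path;

/// Build profile guessed from a library path.
#[derive(Debug, PartialEq)]
enum RuntimeProfile {
    Release,
    Debug,
    Unknown,
}

/// Guesses the profile from the first path component named `release` or
/// `debug`. Assumes cargo's default `target/<profile>/` layout; custom
/// profiles or relocated libraries yield `Unknown`.
fn guess_profile(path: &Path) -> RuntimeProfile {
    for component in path.components() {
        match component.as_os_str().to_str() {
            Some("release") => return RuntimeProfile::Release,
            Some("debug") => return RuntimeProfile::Debug,
            _ => {}
        }
    }
    RuntimeProfile::Unknown
}

/// Reads CAIRO_NATIVE_RUNTIME_LIBRARY and guesses its profile.
/// Returns `None` when the variable is not set.
fn runtime_profile_from_env() -> Option<RuntimeProfile> {
    std::env::var_os("CAIRO_NATIVE_RUNTIME_LIBRARY")
        .map(|value| guess_profile(Path::new(&value)))
}
```

Printing `runtime_profile_from_env()` at the start of a test run would show whether the variable is set at all and, when the default layout applies, whether it likely points at a debug build.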
JIT is new, right? I think I'm running in AOT because the integration in Native uses AotExecutor. I built the runtime running |
JIT is the original executor, AOT came later, but I think you're probably using AOT.
Both are fine. Internally, their interface is the same, so whatever works on one should work on the other. The sequencer is using AOT. |
There seems to be an issue with the operations performed in Here's a link to the function https://github.com/kkrt-labs/kakarot-ssj/blob/c2e4ccbfcd8d6b8d79668f34daab97080d6769f9/crates/utils/src/crypto/modexp/mpnat.cairo#L38-L106 It involves a lot of dict operations. Do you want me to add more granularity? |
Can you provide the |
Hm so actually in that specific case I checked and we're sending edit: I opened old logs of using native to find out that this test actually used to pass when using native ae17dd3 🤔 so perhaps the big number of bytes is not that much of an issue and it should pass regardless? see https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11810716802/job/32903209209 from PR kkrt-labs/kakarot-ssj#1021 (where the version of native is ae17dd3): download the logs, open |
Just re-ran on my computer the stack using cairo-native ae17dd3
So that's like ~ 10 seconds for the actual test run. Compared to infinite time now 🤔 To summarize:
The test takes around 10 mins to run, and other tests also have extremely degraded performance. Let me know how I could help you more. Regarding the bytes passed to the function:
I think it's an array of size |
It seems that between the previous Cairo native version and the current one we've added proper memory management (now we deallocate stuff properly). That may be part of the performance issue. Here are the commits I suspect will have more impact in the program's performance:
If you're going to test them, please keep in mind it may start crashing again since there have been some bug fixes after those commits. I recommend testing using a test that was known to pass but still has had a performance regression. |
Would there be another way to check what's taking so long? Testing each one of these commits manually sounds tedious because I have to re-adapt the sequencer implementation every time. |
Yes, we're working (since about an hour ago) on a tool to profile programs a bit better. I'm not sure if it'll be useful at first because we're more interested in the time taken by Cairo vs the syscall handler, but it should be adaptable to this use case later on. |
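Until such a tool exists, a coarse way to see where wall-clock time goes is to wrap suspect calls on the Rust side and accumulate elapsed time per label. The sketch below is not the profiling tool mentioned above; `Timings` and its methods are hypothetical names that use only the standard library.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Accumulates total wall-clock time and call count per label.
#[derive(Default)]
struct Timings {
    totals: BTreeMap<&'static str, (Duration, u32)>,
}

impl Timings {
    /// Runs `f`, adds its elapsed time to `label`, and returns its result.
    fn measure<T>(&mut self, label: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        let entry = self.totals.entry(label).or_insert((Duration::ZERO, 0));
        entry.0 += elapsed;
        entry.1 += 1;
        out
    }

    /// Returns (label, total time, call count) rows, slowest label first.
    fn report(&self) -> Vec<(&'static str, Duration, u32)> {
        let mut rows: Vec<_> = self
            .totals
            .iter()
            .map(|(label, (total, calls))| (*label, *total, *calls))
            .collect();
        rows.sort_by(|a, b| b.1.cmp(&a.1));
        rows
    }
}
```

Wrapping, for example, the executor invocation and the syscall handler entry points in `measure` and printing `report()` after a slow test would give a first split of where the time is spent, without having to rebuild against each suspect commit.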
Hello, I updated our branches and runner to use the starkware-libs/sequencer integration and the release v0.2.4 of Cairo Native; unfortunately, this has not solved the issue. I reverted our CI back to using the CairoVM for execution. Due to a priority shift I will no longer be trying to debug this. However, the pending work has been cleared so that we can restart these investigations at a future date by simply running our test with the 'native' feature. |
Where are you using Cairo Native
Kakarot EF-Test suite. Running the 20k ethereum tests using blockifier + cairo native + kakarot zkevm
Version
last main
Describe the bug
To Reproduce
Clone https://github.com/kkrt-labs/ef-tests
Checkout dev/bump-native
make && make setup-kakarot
Make sure CAIRO_NATIVE_RUNTIME_LIBRARY is a defined environment variable
Run cargo test test_static_ABAcalls0_d1g0v0_Cancun --features v1,native -- --nocapture
This should error with (signal: 11, SIGSEGV: invalid memory reference)
The backtrace is from lldb:
lldb path/to/test
platform settings -w /path/to/ef-tests/crates/ef-testing
run
Desktop (please complete the following information):