bug: invalid memory reference #870

enitrat · 2024-10-20T17:05:20Z

Where are you using Cairo Native
Kakarot EF-Test suite. Running the 20k ethereum tests using blockifier + cairo native + kakarot zkevm

Version
last main

Describe the bug

* thread #2, name = 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f5c6bf0)
  * frame #0: 0x00000001473975a0 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13513 + 36
    frame #1: 0x000000014736798c 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13487 + 13692
    frame #2: 0x0000000147331a8c 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13450 + 23400
    frame #3: 0x0000000147a1f600 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13635 + 40296b
...
...
    frame #103: 0x00000001472fc398 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13425 + 4520
    frame #104: 0x00000001472c4248 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13402 + 71760
    frame #105: 0x00000001472a284c 2170791277312818364151319127822404238116644699931524383412296455457210549381`___lldb_unnamed_symbol13382 + 14772
    frame #106: 0x0000000147291e88 2170791277312818364151319127822404238116644699931524383412296455457210549381`_mlir_ciface_f22 + 5372
    frame #107: 0x000000010076d66c stStaticCall-a00b9a1f5b0b2270`invoke_trampoline + 108
    frame #108: 0x00000001004a3dd8 stStaticCall-a00b9a1f5b0b2270`blockifier::execution::native::utils::run_native_executor::hd53e73f315696524(native_executor=0x00006000021c10f0, function_id=0x000000012c823040, call=<unavailable>, syscall_handler=<unavailable>) at utils.rs:45:28
    frame #109: 0x000000010046598c stStaticCall-a00b9a1f5b0b2270`blockifier::execution::native::entry_point_execution::execute_entry_point_call::h0820f83788dc89ed(call=CallEntryPoint @ 0x000000016ffdfc38, contract_class=NativeContractClassV1 @ 0x000000016ffdef38, state=&mut dyn blockifier::state::state_api::State @ 0x000000016ffdf6d8, resources=0x000000016fff0538, context=0x000000016ffef588) at entry_point_execution.rs:33:18
    frame #110: 0x000000010059f768 stStaticCall-a00b9a1f5b0b2270`blockifier::execution::execution_utils::execute_entry_point_call::hba155b0ecfc46347(call=CallEntryPoint @ 0x000000016ffe0938, contract_class=ContractClass @ 0x000000016ffdf7d0, state=&mut dyn blockifier::state::state_api::State @ 0x000000016ffe0158, resources=0x000000016fff0538, context=0x000000016ffef588) at execution_utils.rs:78:19
    frame #111: 0x000000010049e5d0 stStaticCall-a00b9a1f5b0b2270`blockifier::execution::entry_point::CallEntryPoint::execute::h8763f8070c7f5197(self=CallEntryPoint @ 0x000000016ffe19a0, state=&mut dyn blockifier::state::state_api::State @ 0x000000016ffe0a08, resources=0x000000016fff0538, context=0x000000016ffef588) at entry_point.rs:103:9

To Reproduce

Clone https://github.com/kkrt-labs/ef-tests
Checkout dev/bump-native
make && make setup-kakarot
Make sure CAIRO_NATIVE_RUNTIME_LIBRARY is a defined environment variable
run
cargo test test_static_ABAcalls0_d1g0v0_Cancun --features v1,native -- --nocapture
this should error with
(signal: 11, SIGSEGV: invalid memory reference)

The backtrace is from lldb:
lldb path/to/test
platform settings -w /path/to/ef-tests/crates/ef-testing
run

Desktop (please complete the following information):

OS: MacOS,
Architecture arm64

The text was updated successfully, but these errors were encountered:

edg-l · 2024-10-21T07:43:44Z

I tried to build this on linux x86_64 but i get a build error

error[E0412]: cannot find type `integer_t` in crate `libc`
   --> crates/ef-testing/src/models/result.rs:120:36
    |
120 |             / mem::size_of::<libc::integer_t>() as libc::c_uint;
    |                                    ^^^^^^^^^ help: a type alias with a similar name exists: `intptr_t`
    |
   ::: /home/edgar/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libc-0.2.159/src/unix/mod.rs:21:1
    |
21  | pub type intptr_t = isize;
    | ----------------- similarly named type alias `intptr_t` defined here

warning: unused imports: `ACCOUNT_CONTRACT_CLASS`, `KAKAROT_CLASS`, and `UNINITIALIZED_ACCOUNT_CLASS`
  --> crates/ef-testing/src/evm_sequencer/sequencer/mod.rs:12:9
   |
12 |         ACCOUNT_CONTRACT_CLASS, ACCOUNT_CONTRACT_CLASS_HASH, BLOCK_GAS_LIMIT,
   |         ^^^^^^^^^^^^^^^^^^^^^^
13 |         ETH_FEE_TOKEN_ADDRESS, FEE_TOKEN_CLASS, FEE_TOKEN_CLASS_HASH, KAKAROT_ADDRESS,
14 |         KAKAROT_CLASS, KAKAROT_CLASS_HASH, KAKAROT_OWNER_ADDRESS, OPENZEPPELIN_ACCOUNT_CLASS,
   |         ^^^^^^^^^^^^^
15 |         OPENZEPPELIN_ACCOUNT_CLASS_HASH, RELAYER_ADDRESS, RELAYER_BALANCE, RELAYER_VERIFYING_KEY,
16 |         STRK_FEE_TOKEN_ADDRESS, UNINITIALIZED_ACCOUNT_CLASS, UNINITIALIZED_ACCOUNT_CLASS_HASH,
   |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

warning: unused import: `std::ptr`
  --> crates/ef-testing/src/models/result.rs:82:5
   |
82 | use std::ptr;
   |     ^^^^^^^^

For more information about this error, try `rustc --explain E0412`.
warning: `ef-testing` (lib) generated 2 warnings
error: could not compile `ef-testing` (lib) due to 1 previous error; 2 warnings emitted
warning: build failed, waiting for other jobs to finish...

enitrat · 2024-10-21T08:19:06Z

I tried to build this on linux x86_64 but i get a build error

Must be a platform-specific thing:

can you comment line 151 and the definition of foo line 108? I used this to get memory usage logs after each test to identify families that caused memory leaks.

azteca1998 · 2024-10-24T17:54:40Z

The make steup-kakarot step is failing for me. Is this expected?

ef-tests$ make setup-kakarot
rm -rf build/common
mkdir -p build/common
rm -rf build/v0
mkdir -p build/v0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8651  100  8651    0     0  25596      0 --:--:-- --:--:-- --:--:-- 25670
unzip -o kakarot-build.zip -d build/v0
Archive:  kakarot-build.zip
  inflating: build/v0/contracts_MockContractUpgradeableV0.compiled_contract_class.json
  inflating: build/v0/contracts_UninitializedAccount.contract_class.json
  inflating: build/v0/contracts_AccountContract.contract_class.json
  inflating: build/v0/contracts_UninitializedAccount.compiled_contract_class.json
  inflating: build/v0/contracts.starknet_artifacts.json
  inflating: build/v0/contracts_KakarotCore.compiled_contract_class.json
  inflating: build/v0/contracts_MockContractUpgradeableV1.contract_class.json
  inflating: build/v0/contracts_ERC20.compiled_contract_class.json
  inflating: build/v0/contracts_ERC20.contract_class.json
  inflating: build/v0/contracts_Cairo1HelpersFixture.compiled_contract_class.json
  inflating: build/v0/contracts_AccountContract.compiled_contract_class.json
  inflating: build/v0/contracts_KakarotCore.contract_class.json
  inflating: build/v0/contracts_MockContractUpgradeableV0.contract_class.json
  inflating: build/v0/contracts_Cairo1Helpers.contract_class.json
  inflating: build/v0/contracts.sierra.json
  inflating: build/v0/contracts_Cairo1Helpers.compiled_contract_class.json
  inflating: build/v0/contracts_Cairo1HelpersFixture.contract_class.json
  inflating: build/v0/contracts_MockContractUpgradeableV1.compiled_contract_class.json
mv build/v0/build/* build/v0
mv: rename build/v0/build/* to build/v0/*: No such file or directory
make: *** [setup-kakarot-v0] Error 1

This seems to cause the precompiles to not be available when I try to test it:

    Finished `test` profile [unoptimized + debuginfo] target(s) in 18.76s
     Running tests/stStaticCall.rs (target/debug/deps/stStaticCall-022d7fbb57a5b4fd)

running 1 test
thread 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun' panicked at crates/ef-testing/src/evm_sequencer/constants.rs:66:120:
Failed to load precompiles contract class: No such file or directory (os error 2)

Location:
    crates/ef-testing/src/evm_sequencer/constants.rs:22:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun ... FAILED

failures:

failures:
    stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 477 filtered out; finished in 0.10s

error: test failed, to rerun pass `-p ef-testing --test stStaticCall`

enitrat · 2024-10-25T05:36:53Z

Something broke with our new release yesterday. Will check and update

enitrat · 2024-10-31T04:32:19Z

Hi @azteca1998 @edg-l you should be able to repro again. I updated native to ab47832

I also hid the ram usage prints under a macos target, so that you should be able to run it on other targets fine.

azteca1998 · 2024-11-19T12:57:08Z

Hello, sorry for the delay.

We've been working on various stuff in native which made it so that ef-tests no longer compiles using the main branch of the sequencer and native's main branches. Could you please update it? Some of the recent changes may be involved in the bug.

enitrat · 2024-11-20T17:01:18Z

Hi @azteca1998 i updated the ef-test repo to commit 28caad05e4a13f935acff83ea144df0b1bc24e2e with native 5e60089. The issue still occurs

azteca1998 · 2024-11-20T19:20:12Z

Ok, I think I found the problem. It's the same as #816.

If I run the test with lldb and compare stack pointers I get the following:

Stack pointer (first frame): 0x00007ffff7a1da80
Stack pointer (crash frame): 0x00007ffff780bce0
Crash pointer              : 0x00007ffff780d138

The first number is the value of rsp (stack pointer) when the program started. The second is its value just before the crash. The last number is already outside the allocated stack size, therefore crashing the process.

If I run the test with a larger stack (1GiB in my test, probably works with way less stack) the test seems to run without problems.
You can make the tests run with a larger stack by using this environment variable: RUST_MIN_STACK=1073741824.

enitrat · 2024-11-21T08:24:06Z

Ok, great, at the time I opeoned the issue increasing the stack size did not have any effect, which no longer seems the be the case. However, the current run is taking more than 2hours, up from 50mins before.

https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11947735661/job/33304216235?pr=1011

I will need to check why

azteca1998 · 2024-11-21T11:41:08Z

Could it be that the test that crashed was run at about the 50 minutes mark, therefore stopping all other tests, while now all tests have to be run, taking more than 2 hours?

enitrat · 2024-11-21T11:59:28Z

No, previously all 20k tests were passing in ~50mins (20mins compilation, 30mins runtime). I added some more debug info to pinpoint problem is exactly, will update you when I have results

enitrat · 2024-11-21T14:29:23Z

2024-11-21T13:08:33.3640356Z Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' completed in 2942.0 seconds
2024-11-21T13:08:33.3641902Z test stReturnDataTest::stReturnDataTest::test_modexp_modsize0_returndatasize_d4g0v0_Cancun ... ok

Some tests take 45 mins to 1.5hrs to complete a single, easy test. Do you have any idea how to debug why this is happening? I'm not sure what to look for.

test_modexp_modsize0_returndatasize_d4g0v0_Cancun

That specific instance has a lot of felt252 dict operations because it's very unoptimized. I'll add some prints directly in the cairo code to see what happens in realtime, but that's a first thing we could be looking at.

azteca1998 · 2024-11-21T14:40:04Z

Not sure if that's the case, especially since it didn't happen earlier, but dictionaries are implemented in Rust within the runtime. When running the JIT, the runtime used is the dependency of cairo-native, which for debug builds is built using the debug profile. Could you try running the test in release mode?

When running in AOT, the runtime used is the static library. If the tests are running in AOT, could you check if the linked library (the path in the env var) points to a release build?

enitrat · 2024-11-21T14:51:56Z

JIT is new right? I think i'm running in AOT because the integration in Native uses AotExecutor.
What do you think I should be using for this specific usecase ?

I built the runtime running make runtime which builds in release mode, so it's a release build I believe.

azteca1998 · 2024-11-21T14:56:18Z

JIT is the original executor, AOT came later, but I think you're probably using AOT.
If the runtime uses the release build then I'm not sure what could be causing the performance issues.

What do you think I should be using for this specific usecase ?

Both are fine. Internally, their interface is the same, so whatever works on one should work on the other. The sequencer is using AOT.

enitrat · 2024-11-21T16:26:16Z

...
Start: Executing Modexp Precompile
Current test duration: 803.0s
...
WARNING: Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' has been running for over 120 seconds
Current duration: 1516.0s
End: MPNAT::from_big_endian operation
Start: modpow operation
End: modpow operation
End: Executing Modexp Precompile

There seems to be an issue with the operations performed in MPNAT::from_big_endian, executed twice, taking ~ 10 mins so 5mins each.

Here's a link to the function https://github.com/kkrt-labs/kakarot-ssj/blob/c2e4ccbfcd8d6b8d79668f34daab97080d6769f9/crates/utils/src/crypto/modexp/mpnat.cairo#L38-L106

It involves a lot of dict operations. Do you want me to add more granularity?

azteca1998 · 2024-11-21T16:29:17Z

Can you provide the bytes argument? This way we'll be able to replicate it exactly.

enitrat · 2024-11-22T05:02:19Z

Can you provide the bytes argument? This way we'll be able to replicate it exactly.

Hm so actually in that specific case I checked and we're sending 999188 bytes to the function. It's quite a lot, I can't really send it over to you.

edit: I opened old logs of using native to find out that this test actually used to pass when using native ae17dd3 🤔 so perhaps the big number of bytes is not that much of an issue and it should pass regardless?

see https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11810716802/job/32903209209 from PR kkrt-labs/kakarot-ssj#1021 (where the version of native is ae17dd3): download the logs, open 2_ef-tests _ ef-tests.txt, search for modexp_modsize0_returndatasize_d4g0v0_Cancun and you will see it passed

enitrat · 2024-11-22T08:18:57Z

Just re-ran on my computer the stack using cairo-native ae17dd3

WARNING: Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' has been running for over 120 seconds
Current duration: 834.0s
... some additional operations ..

 INFO ef_testing::models::result: stReturnDataTest::modexp_modsize0_returndatasize_d4g0v0_Cancun passed: TransactionResources { starknet_resources: StarknetResources { calldata_length: 155, state_changes_for_fee: StateChangesCount { n_storage_updates: 3, n_class_hash_updates: 0, n_compiled_class_hash_updates: 0, n_modified_contracts: 3 }, message_cost_info: MessageL1CostInfo { l2_to_l1_payload_lengths: [], message_segment_length: 0 }, l1_handler_payload_size: None, n_events: 1, signature_length: 2, code_size: 0, total_event_keys: 1, total_event_data_size: 3 }, vm_resources: ExecutionResources { n_steps: 7527, n_memory_holes: 53, builtin_instance_counter: {ecdsa: 1, range_check: 117, pedersen: 173} }, n_reverted_steps: 0 }
Current memory usage: Ok(4963549184) bytes

Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' completed in 865.0 seconds
test stReturnDataTest::test_modexp_modsize0_returndatasize_d4g0v0_Cancun ... ok

So that's like ~ 10 seconds for the actual test run. Compared to infinite time now 🤔

To summarize:

https://github.com/kkrt-labs/ef-tests on branch feat/cairo-native, commit hash f752975db32dead8175ac5392d132d3524e9fc0b, using cairo-native ae17dd3
less than 10sec to run this big test
https://github.com/kkrt-labs/ef-tests on branch dev/bump-native, commit hash a513ac774a8e6ad97132f406938eec7ee71423ce , using cairo-native 5e60089:

The test takes around 10 mins to run, and other tests also have extremely degraded performance.

Let me know how I could help you more.

Regarding the bytes passed to the function:

bytes.len(): bytes.len(): 999188
bytes slice 128: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
bytes slice last 128: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

I think it's an array of size 999188 whose first element is 1 and the rest is 0.

azteca1998 · 2024-11-22T10:43:19Z

It seems that between the previous Cairo native version and the current one we've added proper memory management (now we deallocate stuff properly). That may be part of the performance issue. Here are the commits I suspect will have more impact in the program's performance:

79a8dcf - Fix clone and drop implementations for enums, structs and snapshots.
0b73f1c - Fix `array_get`'s drop implementation.
cff7a11 - Update to 2.8.4, add release docs, version alpha.3
82c25b3 - Add malloc tracing and fix more memory leaks.

If you're going to test them, please keep in mind it may start crashing again since there have been some bug fixes after those commits. I recommend testing using a test that was known to pass but still has had a performance regression.

enitrat · 2024-11-22T14:57:31Z

Would there be another way to check what's taking so long? testing each one of these commits manually sounds tedious because I have to re-adapt the sequencer impl every time

azteca1998 · 2024-11-22T16:00:17Z

Yes, we're working (since about an hour ago) in a tool to profile programs a bit better. I'm not sure if it'll be useful at first because we're more interested in the time taken by cairo vs the syscall handler, but it should be adaptable to this use case later on.

enitrat · 2024-12-06T14:20:15Z

Hello,

I updated our branches and runner to use the starkware-libs/sequencer integration, and the release v0.2.4 of Cairo Native; this has not solved the issue unfortunately. I reverted our CI back to using the CairoVM for execution.

Due to a priority shift I will no longer be trying to debug this, however, the pending work has been cleared so that we can re-start these investigations in a future date by simply running our test with the 'native' feature.

enitrat mentioned this issue Oct 20, 2024

bug: regressions introduced by latest main #831

Closed

edg-l added the bug Something isn't working label Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: invalid memory reference #870

bug: invalid memory reference #870

enitrat commented Oct 20, 2024

edg-l commented Oct 21, 2024

enitrat commented Oct 21, 2024 •

edited

Loading

azteca1998 commented Oct 24, 2024

enitrat commented Oct 25, 2024

enitrat commented Oct 31, 2024 •

edited

Loading

azteca1998 commented Nov 19, 2024

enitrat commented Nov 20, 2024

azteca1998 commented Nov 20, 2024

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024

enitrat commented Nov 21, 2024

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024 •

edited

Loading

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024 •

edited

Loading

enitrat commented Nov 21, 2024 •

edited

Loading

azteca1998 commented Nov 21, 2024

enitrat commented Nov 22, 2024 •

edited

Loading

enitrat commented Nov 22, 2024 •

edited

Loading

azteca1998 commented Nov 22, 2024 •

edited

Loading

enitrat commented Nov 22, 2024

azteca1998 commented Nov 22, 2024

enitrat commented Dec 6, 2024

bug: invalid memory reference #870

bug: invalid memory reference #870

Comments

enitrat commented Oct 20, 2024

edg-l commented Oct 21, 2024

enitrat commented Oct 21, 2024 • edited Loading

azteca1998 commented Oct 24, 2024

enitrat commented Oct 25, 2024

enitrat commented Oct 31, 2024 • edited Loading

azteca1998 commented Nov 19, 2024

enitrat commented Nov 20, 2024

azteca1998 commented Nov 20, 2024

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024

enitrat commented Nov 21, 2024

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024 • edited Loading

enitrat commented Nov 21, 2024

azteca1998 commented Nov 21, 2024 • edited Loading

enitrat commented Nov 21, 2024 • edited Loading

azteca1998 commented Nov 21, 2024

enitrat commented Nov 22, 2024 • edited Loading

enitrat commented Nov 22, 2024 • edited Loading

azteca1998 commented Nov 22, 2024 • edited Loading

enitrat commented Nov 22, 2024

azteca1998 commented Nov 22, 2024

enitrat commented Dec 6, 2024

enitrat commented Oct 21, 2024 •

edited

Loading

enitrat commented Oct 31, 2024 •

edited

Loading

azteca1998 commented Nov 21, 2024 •

edited

Loading

azteca1998 commented Nov 21, 2024 •

edited

Loading

enitrat commented Nov 21, 2024 •

edited

Loading

enitrat commented Nov 22, 2024 •

edited

Loading

enitrat commented Nov 22, 2024 •

edited

Loading

azteca1998 commented Nov 22, 2024 •

edited

Loading