Segmentation fault on subsequent conversions #206

Open
jonatanklosko opened this issue Dec 23, 2024 · 15 comments

Comments

@jonatanklosko

Hello, thank you for this fantastic package!

The following modified example from vl-convert-rs results in a segfault:

use vl_convert_rs::converter::VlOpts;
use vl_convert_rs::{VlConverter, VlVersion};

#[tokio::main]
async fn main() {
    let vl_spec: serde_json::Value = serde_json::from_str(
        r#"
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/movies.json"},
  "mark": "circle",
  "encoding": {
    "x": {
      "bin": {"maxbins": 10},
      "field": "IMDB Rating"
    },
    "y": {
      "bin": {"maxbins": 10},
      "field": "Rotten Tomatoes Rating"
    },
    "size": {"aggregate": "count"}
  }
}   "#,
    )
    .unwrap();


    println!("{}", convert(vl_spec.clone()).await);
    println!("{}", convert(vl_spec.clone()).await)
}

async fn convert(vl_spec: serde_json::Value) -> String {
  let mut converter = VlConverter::new();

  converter
      .vegalite_to_svg(
          vl_spec,
          VlOpts {
              vl_version: VlVersion::v5_8,
              ..Default::default()
          },
      )
      .await
      .expect("Failed to perform Vega-Lite to SVG conversion")
}

Note that a separate converter is created for each conversion. If we change the example to reuse the same converter, it no longer segfaults.

I can reproduce the segfault on x86_64 Ubuntu Linux, running the example directly against vl-convert-rs main.

@jonmmease
Collaborator

Hi @jonatanklosko, thanks for the report.

Could you give this a try with a "current thread" tokio runtime?

#[tokio::main(flavor = "current_thread")]

Deno isn't fully compatible with the multi-threaded tokio runtime. While not a segfault, here's a prior issue I ran into that was resolved by switching to the single threaded runtime: denoland/deno#19670 (comment)

@jonatanklosko
Author

@jonmmease unfortunately the segfault still happens :(

@jonatanklosko
Author

If at all useful, here's the stacktrace obtained via lldb:

Stacktrace
* thread #7, name = 'conversion2', stop reason = unknown crash reason
  * frame #0: 0x00005570f5a01440
    frame #1: 0x000055717583935e conversion2`Builtins_InterpreterEntryTrampoline + 222
    frame #2: 0x00005571758f8d9e conversion2`Builtins_ArrayForEach + 926
    frame #3: 0x000055717583935e conversion2`Builtins_InterpreterEntryTrampoline + 222
    frame #4: 0x000055717583935e conversion2`Builtins_InterpreterEntryTrampoline + 222
    frame #5: 0x0000557175836edc conversion2`Builtins_JSEntryTrampoline + 92
    frame #6: 0x0000557175836c1b conversion2`Builtins_JSEntry + 155
    frame #7: 0x00005571751418d4 conversion2`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) at execution.cc:420:22
    frame #8: 0x0000557175141f5e conversion2`v8::internal::Execution::CallScript(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) at execution.cc:517:10
    frame #9: 0x00005571750a1f44 conversion2`v8::Script::Run(v8::Local<v8::Context>, v8::Local<v8::Data>) at api.cc:2143:7
    frame #10: 0x0000557174a5b9e7 conversion2`v8::script::_$LT$impl$u20$v8..data..Script$GT$::run::_$u7b$$u7b$closure$u7d$$u7d$::h2a8df6e375005345((null)={closure_env#0} @ 0x00007fa60868db88, sd=0x00007fa5ec300430) at script.rs:99:29
    frame #11: 0x0000557174c05580 conversion2`v8::script::_$LT$impl$u20$v8..data..Script$GT$::run::ha294ac49edf3f9e9 [inlined] v8::scope::HandleScope$LT$$LP$$RP$$GT$::cast_local::ha62ff887adee9464(self=0x00007fa60868ff40, f={closure_env#0} @ 0x00007fa60868dc90) at scope.rs:253:21
    frame #12: 0x0000557174c05234 conversion2`v8::script::_$LT$impl$u20$v8..data..Script$GT$::run::ha294ac49edf3f9e9(self=0x00007fa5ec7dc420, scope=0x00007fa60868ff40) at script.rs:99
    frame #13: 0x0000557174a9cc1e conversion2`deno_core::runtime::bindings::initialize_primordials_and_infra::hffe0cdfcfa9b18da(scope=0x00007fa60868ff40) at bindings.rs:319:5
    frame #14: 0x00005571749a3f32 conversion2`deno_core::runtime::jsruntime::JsRuntime::new_inner::hc1b5c5a1e2e61d5a(options=RuntimeOptions @ 0x00007fa6086910d0, will_snapshot=false) at jsruntime.rs:1002:7
    frame #15: 0x00005571749a02bf conversion2`deno_core::runtime::jsruntime::JsRuntime::try_new::hecb9dbbc76ac4a2c(options=RuntimeOptions @ 0x00007fa608694cb0) at jsruntime.rs:738:5
    frame #16: 0x00005571749a004c conversion2`deno_core::runtime::jsruntime::JsRuntime::new::h7caf829e38686389(options=<unavailable>) at jsruntime.rs:721:11
    frame #17: 0x000055716f84af00 conversion2`deno_runtime::worker::MainWorker::from_options::ha37d25136b4b9f12(main_module=Url @ 0x00007fa608698090, permissions=PermissionsContainer @ 0x00007fa608692538, options=WorkerOptions @ 0x00007fa6086980e8) at worker.rs:487:26
    frame #18: 0x000055716f84852e conversion2`deno_runtime::worker::MainWorker::bootstrap_from_options::hc11b349e74c8a31a(main_module=Url @ 0x00007fa608699660, permissions=PermissionsContainer @ 0x00007fa608697e08, options=WorkerOptions @ 0x00007fa6086996b8) at worker.rs:313:22
    frame #19: 0x000055716ec7f62f conversion2`vl_convert_rs::converter::InnerVlConverter::try_new::_$u7b$$u7b$closure$u7d$$u7d$::he49ddb4d43884203((null)=0x00007fa60869db70) at converter.rs:589:13
    frame #20: 0x000055716ec8ba81 conversion2`vl_convert_rs::converter::VlConverter::new::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h0e25bbd3090b7868((null)=0x00007fa60869db70) at converter.rs:1027:61
    frame #21: 0x000055716ecb4320 conversion2`tokio::runtime::scheduler::current_thread::CurrentThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hd5f5af96aa029794(cx=0x00007fa60869db70) at mod.rs:186:49
    frame #22: 0x000055716ecb37b1 conversion2`_$LT$tokio..future..poll_fn..PollFn$LT$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hb47777a9101694ab(self=Pin<&mut tokio::future::poll_fn::PollFn<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure#0}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure#0}::{async_block_env#0}>>> @ 0x00007fa60869da68, cx=0x00007fa60869db70) at poll_fn.rs:58:9
    frame #23: 0x000055716ecbebd7 conversion2`tokio::runtime::park::CachedParkThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hfe7daf12fe9a5c02 at park.rs:281:63
    frame #24: 0x000055716ecbe4ae conversion2`tokio::runtime::park::CachedParkThread::block_on::h79c0d261a58330cc at coop.rs:107:5
    frame #25: 0x000055716ecbe423 conversion2`tokio::runtime::park::CachedParkThread::block_on::h79c0d261a58330cc [inlined] tokio::runtime::coop::budget::ha88d84af0f46c8fd(f={closure_env#0}<tokio::future::poll_fn::PollFn<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure#0}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure#0}::{async_block_env#0}>>> @ 0x00007fa60869dc20) at coop.rs:73
    frame #26: 0x000055716ecbe384 conversion2`tokio::runtime::park::CachedParkThread::block_on::h79c0d261a58330cc(self=0x00007fa60869dc96, f=PollFn<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure#0}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure#0}::{async_block_env#0}>> @ 0x00007fa60869dbc8) at park.rs:281
    frame #27: 0x000055716ecbc958 conversion2`tokio::runtime::context::blocking::BlockingRegionGuard::block_on::h77af11f419231fe4(self=0x00007fa60869def0, f=PollFn<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure#0}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure#0}::{async_block_env#0}>> @ 0x00007fa60869dca0) at blocking.rs:66:9
    frame #28: 0x000055716ecb4100 conversion2`tokio::runtime::scheduler::current_thread::CurrentThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::h500fa7a704f42916(blocking=0x00007fa60869def0) at mod.rs:180:40
    frame #29: 0x000055716ecb7f5e conversion2`tokio::runtime::context::runtime::enter_runtime::hf1b95be7488a0148(handle=0x00005571794f3098, allow_block_in_place=false, f={closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure#0}::{async_block_env#0}> @ 0x00007fa60869e770) at runtime.rs:65:16
    frame #30: 0x000055716ecb3a91 conversion2`tokio::runtime::scheduler::current_thread::CurrentThread::block_on::hf07e103c654ad87b(self=0x00005571794f3070, handle=0x00005571794f3098, future=<unavailable>) at mod.rs:167:9
    frame #31: 0x000055716ec95dc8 conversion2`tokio::runtime::runtime::Runtime::block_on::h2ed773bfebfabae4(self=0x00005571794f3068, future={async_block_env#0} @ 0x00007fa60869f880) at runtime.rs:348:47
    frame #32: 0x000055716ec8b6eb conversion2`vl_convert_rs::converter::VlConverter::new::_$u7b$$u7b$closure$u7d$$u7d$::h9ea1b1e7d28fd52f at converter.rs:1026:13
    frame #33: 0x000055716ecae8bd conversion2`std::sys::backtrace::__rust_begin_short_backtrace::hcd778ea3610c22e8(f={closure_env#0} @ 0x00007fa6086a0090) at backtrace.rs:154:18
    frame #34: 0x000055716eca721b conversion2`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h634919b9f26751be at mod.rs:538:17
    frame #35: 0x000055716ecbd65f conversion2`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h6665d0ca04b7d1d5(self=AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure_env#0}, core::result::Result<(), anyhow::Error>>> @ 0x00007fa6086a00c8, (null)=<unavailable>) at unwind_safe.rs:272:9
    frame #36: 0x000055716ecbbc3a conversion2`std::panicking::try::do_call::h6370f39536327d9d(data="�,{qU") at panicking.rs:557:40
    frame #37: 0x000055716eca723b conversion2`__rust_try + 27
    frame #38: 0x000055716eca6e44 conversion2`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::ha91f5cdbe853a4d4 at panicking.rs:520:19
    frame #39: 0x000055716eca6e11 conversion2`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::ha91f5cdbe853a4d4 [inlined] std::panic::catch_unwind::h48d45f508bfc1507(f=AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vl_convert_rs::converter::{impl#7}::new::{closure_env#0}, core::result::Result<(), anyhow::Error>>> @ 0x00007fa6086a0330) at panic.rs:358
    frame #40: 0x000055716eca6e11 conversion2`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::ha91f5cdbe853a4d4 at mod.rs:537
    frame #41: 0x000055716ec4b3ff conversion2`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hd8e5a4833b9a7aa9((null)=0x000055717b052cb0, (null)=<unavailable>) at function.rs:250:5
    frame #42: 0x000055717670437b conversion2`std::sys::pal::unix::thread::Thread::new::thread_start::hcc78f3943333fa94 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::hf75717d9f28faebf at boxed.rs:2454:9
    frame #43: 0x0000557176704373 conversion2`std::sys::pal::unix::thread::Thread::new::thread_start::hcc78f3943333fa94 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h7bd883a5f3c5f3c1 at boxed.rs:2454
    frame #44: 0x000055717670436c conversion2`std::sys::pal::unix::thread::Thread::new::thread_start::hcc78f3943333fa94 at thread.rs:105
    frame #45: 0x00007fa60a6b8609 libpthread.so.0`start_thread + 217
    frame #46: 0x00007fa60a48c353 libc.so.6`__clone + 67

@jonmmease
Collaborator

Thanks for the stack trace.

I don't see the crash when running this example locally on macOS, so I tried running it in a GitHub Actions job under Ubuntu 22.04, and I still don't see the crash. See #207 and https://github.com/vega/vl-convert/actions/runs/12483749003/job/34840040408?pr=207.

A helpful next step would be to get a repro of this in GitHub Actions, so let me know if you have any thoughts on what we could try here.

@jonatanklosko
Author

It looks like I cannot reproduce it on GitHub Actions. I used a minimal setup:

jobs:
  vl-convert-rs-example2:
    runs-on: ubuntu-20.04
    container:
      image: ubuntu:22.04
    steps:
      - uses: actions/checkout@v2
      - run: |
          apt-get update
          apt-get install -y git build-essential wget curl
          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
          . "$HOME/.cargo/env"
          cd vl-convert-rs
          cargo run --example conversion_sequence

I can reproduce the segfault running exactly that code on an AWS x86_64 machine with Ubuntu 20.04. In both cases I run inside an ubuntu:22.04 Docker container, just to make the setups as close as possible. I don't know what else could be at play.

@charlesfsl

charlesfsl commented Jan 12, 2025

@jonatanklosko and @jonmmease, thank you for attempting to identify what's going on here and for the great packages you maintain. Since I've experienced this on CircleCI I tried to reproduce it in GitHub Actions, but I could not. I also tried with the image I first encountered this on: hexpm/elixir:1.17.3-erlang-27.0.1-debian-bookworm-20241202.

However, this example from @jonatanklosko also produces a segmentation fault on CircleCI with either the hexpm image or vanilla ubuntu:22.04.

Here's the GitHub Actions config above, adapted for CircleCI, that I used:

version: 2.1

parameters:
  elixir-image:
    type: string
    default: hexpm/elixir:1.17.3-erlang-27.0.1-debian-bookworm-20241202
  ubuntu-image:
    type: string
    default: ubuntu:22.04

executors:
  test-container:
    docker:
      - image: << pipeline.parameters.ubuntu-image >>

commands:
  code-setup:
    description: "Ensures code is checked out and basic tooling is ready"
    steps:
      - checkout
      - run: apt-get update
      - run: apt-get install -y git build-essential wget curl

jobs:
  vl-convert-rs-example2:
    executor: test-container
    steps:
      - code-setup
      - run: |
          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
          . "$HOME/.cargo/env"
          cd vl-convert-rs
          cargo run --example conversion_sequence

workflows:
  test-suite:
    jobs:
      - vl-convert-rs-example2

I was also able to reproduce this locally on a laptop running NixOS on an x86_64 chip with the nix branch of this fork. The segfault also still happened for me with a "current thread" tokio runtime. Attached is the stacktrace from lldb.

That fork also has the CircleCI config referenced above and the conversion_sequence.rs file I tested with.

Let me know if anything else would be helpful!

lldb-stacktrace.txt

@jonmmease
Collaborator

Thanks for investigating @charlesfsl, having a repro on CircleCI is very helpful. Though it is a bit troubling that the same Docker image yields different behavior in GitHub Actions and CircleCI.

The first thing I was planning to try, once we had a repro, is to update to Deno 2. There is a PR that I haven't reviewed yet that makes this update over in #205. If you're interested in trying that branch with your repro, that would be helpful.

@charlesfsl

No problem, @jonmmease . I'll give that a go, probably tonight, and let you know.

@charlesfsl

@jonmmease I'm seeing the same result after merging in the upgrade-deno branch.

Image

Looking at #207, I see you weren't able to reproduce this on MacOS (M3). I've not tried this vl-convert-only example on MacOS, but through exploring the originating issue I did encounter this issue on MacOS. Let me know if it would be helpful to try these vl-convert branches/forks on MacOS.

@jonmmease
Collaborator

Let me know if it would be helpful to try these vl-convert branches/forks on MacOS.

Yeah, if this crash was repro-able on MacOS arm that would be much easier for me to debug. Thanks for your help!

@charlesfsl

In attempting that, I remembered that converting to SVG did work fine on Mac, but converting to png or jpg did not. I tried to adapt conversion_sequence.rs to call vegalite_to_png but my complete lack of any Rust knowledge interfered with writing working code. I'll need some guidance on that, or someone with Rust experience will need to do that.

Thanks

@jonatanklosko
Author

@charlesfsl you can try this:

Details
use vl_convert_rs::converter::VlOpts;
use vl_convert_rs::{VlConverter, VlVersion};

#[tokio::main]
async fn main() {
    let vl_spec: serde_json::Value = serde_json::from_str(
        r#"
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/movies.json"},
  "mark": "circle",
  "encoding": {
    "x": {
      "bin": {"maxbins": 10},
      "field": "IMDB Rating"
    },
    "y": {
      "bin": {"maxbins": 10},
      "field": "Rotten Tomatoes Rating"
    },
    "size": {"aggregate": "count"}
  }
}   "#,
    )
    .unwrap();

    println!("{}", convert(vl_spec.clone()).await.len());
    println!("{}", convert(vl_spec.clone()).await.len())
}

async fn convert(vl_spec: serde_json::Value) -> Vec<u8> {
    let mut converter = VlConverter::new();

    converter
        .vegalite_to_png(
            vl_spec,
            VlOpts {
                vl_version: VlVersion::v5_8,
                ..Default::default()
            },
            None,
            None
        )
        .await
        .expect("Failed to perform Vega-Lite to PNG conversion")
}

I couldn't reproduce on mac, but maybe you will have more luck :)

@charlesfsl

I couldn't reproduce on mac, but maybe you will have more luck :)

Thanks @jonatanklosko! Unfortunately I was unable to reproduce the segfault on mac with this test, called with cargo run --example conversion_sequence_png.

@jonmmease
Collaborator

PNG export works by converting to SVG first, and then converting from SVG to PNG with the pure-Rust resvg library. So it sounds like the original error inside Deno/V8 isn't happening on macOS. Thanks for giving it a try! I'll try to repro on CircleCI soon.

@jonmmease
Collaborator

I've reproduced the segfault on CircleCI. I've tried a bunch of things, but haven't made any progress on mitigating the issue.

I have noticed that this only happens when the Deno worker is created within a thread, which we do because the worker is not Send.
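
For readers unfamiliar with the pattern, here is a rough standard-library-only sketch of keeping a non-Send value on a dedicated thread and talking to it over channels. It is not the vl-convert-rs implementation: `Worker`, `Request`, and `Handle` are hypothetical stand-ins, and the `Rc` field exists only to make the stand-in worker non-Send.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a non-`Send` resource such as a JS worker.
/// The `Rc` field is what makes this type `!Send`.
struct Worker {
    prefix: Rc<String>,
}

impl Worker {
    fn new() -> Self {
        Worker { prefix: Rc::new(String::from("converted:")) }
    }

    fn convert(&self, spec: &str) -> String {
        format!("{}{}", self.prefix, spec)
    }
}

/// A request carries the input plus a channel for sending the reply back.
struct Request {
    spec: String,
    reply: mpsc::Sender<String>,
}

/// Cloneable, `Send` handle that forwards requests to the worker thread.
#[derive(Clone)]
struct Handle {
    tx: mpsc::Sender<Request>,
}

impl Handle {
    /// Spawns the thread that creates and owns the worker.
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Request>();
        thread::spawn(move || {
            // The worker is created on this thread and never leaves it.
            let worker = Worker::new();
            // The loop ends once every `Handle` clone has been dropped;
            // only then is the worker itself dropped.
            for req in rx {
                let _ = req.reply.send(worker.convert(&req.spec));
            }
        });
        Handle { tx }
    }

    /// Sends one request and blocks until the worker thread replies.
    fn convert(&self, spec: &str) -> String {
        let (reply, result) = mpsc::channel();
        self.tx
            .send(Request { spec: spec.to_string(), reply })
            .expect("worker thread has stopped");
        result.recv().expect("worker thread dropped the reply channel")
    }
}
```

In this sketch, calling `Handle::spawn()` once and cloning the handle keeps a single worker alive, while calling `Handle::spawn()` per request would create and tear down a worker thread each time, which is the shape of the failing example in this issue.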

As an immediate workaround you can store a single VlConverter instance in a lazy_static block, which is what the Python API does here. VlConverter is Send and it's cheap to clone.
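
A minimal sketch of that workaround's shape, under stated assumptions: it uses `std::sync::OnceLock` from the standard library in place of the `lazy_static` crate, and `Converter` is a hypothetical stand-in for a cheaply cloneable handle, not the real `VlConverter`. The `CONSTRUCTED` counter is there only to show that the constructor runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the constructor actually ran (illustration only).
static CONSTRUCTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a cheaply cloneable converter handle.
#[derive(Clone)]
struct Converter {
    id: usize,
}

impl Converter {
    fn new() -> Self {
        // Each real construction gets the next id, starting at 1.
        let id = CONSTRUCTED.fetch_add(1, Ordering::SeqCst) + 1;
        Converter { id }
    }

    fn convert(&self, spec: &str) -> String {
        format!("converter {} handled {}", self.id, spec)
    }
}

/// Returns a clone of the single process-wide converter.
/// The instance is built on first use and is never dropped.
fn shared_converter() -> Converter {
    static INSTANCE: OnceLock<Converter> = OnceLock::new();
    INSTANCE.get_or_init(Converter::new).clone()
}
```

With the real crate, the idea would be for `convert` in the example above to clone a shared converter instead of calling `VlConverter::new()` on every call.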

My only current idea for a fix is to do this internally, which would mean that the Deno worker would never be dropped after it is instantiated.
