RFC: ccall sanitizer #12173

Open
yuyichao opened this issue Jul 16, 2015 · 0 comments
Labels
ffi foreign function interfaces, ccall, etc.

Comments

@yuyichao
Contributor

TL;DR

This is a runtime debug option, proposed by @carnaval, to catch common misuses of ccall that could otherwise cause memory corruption that is hard to debug.

The problem

...... when someone trashed a tag. Often happens in pools with OOB stores, ...... Unfortunately it's
the only "downside" of having an almost too easy C FFI.
        -- Oscar Blumberg (@carnaval) at #11945 (comment)

The issue here is that it is very (perhaps too) easy to pass a pointer to an object to a C function with ccall. This is one of the great things about Julia. However, if the user makes a mistake about the C API, most commonly by not allocating enough memory, the C function might write to memory that it was not meant to write to. That can corrupt GC-managed memory and crash with a spectacular backtrace (pages of push_root).

This happens surprisingly often. It can be caused by a mistake in the C interface or the ccall (#11945), an upstream ABI breakage (JuliaInterop/ZMQ.jl#83), or simply not being careful enough when refactoring (JuliaGPU/OpenCL.jl#65). These are the only cases I'm aware of, but there are probably more.

This kind of issue is also relatively hard to debug, as it happens randomly and usually crashes in totally unrelated places. With @carnaval's guide on GC debugging, it is not impossible to figure out the issue, but it is nonetheless much harder than using ccall (unless, of course, someone can somehow integrate lldb (or something else) into Julia).

Solution

It's very hard (if not impossible) to detect this issue ahead of time, but it would really help with debugging and fixing if the error were raised as early as possible (at the ccall site). One way to do that, proposed by @carnaval, is to allocate some extra memory with known random content for each object and check after each ccall that the C function didn't modify it.
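
To make the idea concrete, here is a minimal sketch in Python, with ctypes standing in for ccall. It is not Julia's implementation: `GuardedBuffer`, `checked_call` and the 16-byte `PAD` are hypothetical names and values chosen for illustration. The buffer is allocated with extra guard bytes after the payload, and the guard is verified right after the foreign call returns.

```python
import ctypes
import os

PAD = 16  # number of guard bytes after the payload (hypothetical size)


class GuardedBuffer:
    """A payload followed by a guard region filled with known random bytes."""

    def __init__(self, size):
        self.size = size
        self.canary = os.urandom(PAD)  # the known random content
        self.raw = ctypes.create_string_buffer(size + PAD)
        # Copy the canary into the guard region right after the payload.
        ctypes.memmove(ctypes.addressof(self.raw) + size, self.canary, PAD)

    def pointer(self):
        """Address of the payload, as it would be handed to the C function."""
        return ctypes.addressof(self.raw)

    def intact(self):
        """True if the guard bytes still hold the original canary."""
        return self.raw.raw[self.size:] == self.canary


def checked_call(func, buf, *args):
    """Run a foreign call on buf, then verify the guard bytes right away."""
    result = func(buf.pointer(), *args)
    if not buf.intact():
        raise MemoryError("foreign call wrote past the end of the buffer")
    return result
```

With an 8-byte `GuardedBuffer`, `checked_call(ctypes.memset, buf, 0, 8)` passes, while asking memset for 12 bytes touches the guard region and raises at the call site instead of crashing somewhere unrelated later.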

Implementation

AFAICT, one (only?) missing part for implementing this is to figure out on which objects/memory regions this check should be performed. There are several possibilities that I can think of:

  1. Do it statically in codegen.

    This is hard because ccall only sees the pointer after unsafe_convert (or the user may even do the conversion with pointer directly).

  2. Scan every object.

    This should be doable and should be relatively easy to implement (just add an extra call into the GC after each ccall site). The obvious problem would be performance. This can probably be implemented as a more aggressive mode to catch corner cases but should probably not be the default one.

  3. Let the GC figure out if the pointer should be sanitized.

    I think this is the best compromise between performance and implementation difficulty. The idea is to use the machinery for conservative stack scan to figure out if a pointer is pointing to a GC object. The codegen will just pass all pointer arguments to the GC and do all the work there. An additional benefit of this is that it can be used to verify that the object is properly rooted.
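
Option 3 can be sketched roughly as follows. This is only an illustration in Python: a real GC would answer the "is this pointer inside a GC object?" question from its own page and pool metadata, whereas the sketch assumes a plain sorted table of allocations. `AllocationTable` and `objects_to_check` are hypothetical names.

```python
import bisect


class AllocationTable:
    """Sorted table of the objects the collector owns (stand-in for GC metadata)."""

    def __init__(self):
        self._starts = []  # sorted start addresses
        self._sizes = {}   # start address -> object size in bytes

    def register(self, start, size):
        bisect.insort(self._starts, start)
        self._sizes[start] = size

    def find(self, ptr):
        """Return the start of the object containing ptr, or None."""
        i = bisect.bisect_right(self._starts, ptr) - 1
        if i < 0:
            return None
        start = self._starts[i]
        # Interior pointers count, as in a conservative scan.
        return start if ptr < start + self._sizes[start] else None


def objects_to_check(table, pointer_args):
    """Map pointer arguments to tracked objects, skipping foreign memory."""
    found = []
    for ptr in pointer_args:
        start = table.find(ptr)
        if start is not None and start not in found:
            found.append(start)
    return found
```

The generated code would then only need to hand every pointer argument to something like `objects_to_check` around the call. Pointers into memory the GC does not own (malloc'd buffers, for example) fall through `find` and are simply not sanitized.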

Other concerns

  • Stack allocated memory

    Currently this is limited to the & ccall operator and can be special-cased. If we get more general stack allocation of mutable objects, we can probably keep track of them in this mode the same way we keep track of the GC frame.

  • Concurrency

    In order to support multiple threads using the same object, the random data in the padding should probably be assigned at allocation time and never change afterward (i.e. it cannot be filled in just before the ccall).

  • Mixing generated code with and without the option (e.g. running in this mode with a sysimg generated without it)

    The GC should keep track of whether each object has padding allocated (heap-allocated ones should be consistent; stack allocation might need more detailed tracking) and do the check accordingly.
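
The concurrency and mixed-code points can be sketched together. The following Python is an illustration under assumptions, not a proposal for the actual GC data layout: `TrackedObject`, `allocate` and `check_after_call` are hypothetical names. The canary is chosen once at allocation time, and each object records whether it has padding at all, so the post-call check is read-only and skips objects created by code built without the option.

```python
import os
from dataclasses import dataclass
from typing import Optional

PAD = 8  # guard bytes per padded object (hypothetical size)


@dataclass
class TrackedObject:
    payload_size: int
    memory: bytearray
    canary: Optional[bytes]  # None: allocated by code built without the option


def allocate(size, padded):
    """Allocate an object, with guard bytes only when the option is on."""
    if not padded:
        return TrackedObject(size, bytearray(size), None)
    canary = os.urandom(PAD)  # chosen once here, never rewritten later
    return TrackedObject(size, bytearray(size) + canary, canary)


def check_after_call(obj):
    """True if the object has no padding to check or its guard is intact."""
    if obj.canary is None:
        return True
    return bytes(obj.memory[obj.payload_size:]) == obj.canary
```

Because `check_after_call` only reads the guard bytes, several threads can run it on the same object after their own calls without stepping on each other.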

@brenhinkeller brenhinkeller added the ffi foreign function interfaces, ccall, etc. label Nov 21, 2022