curPeEvent reference causes SEGFAULT #3832

Open
trquinn opened this issue Jul 24, 2024 · 7 comments

@trquinn
Collaborator

trquinn commented Jul 24, 2024

If Charm is built with --with-production and an application with a [local] or [inline] entry method is compiled with -O0, the _TRACE_BEGIN_EXECUTE_DETAILED(CpvAccess(curPeEvent), ... call in the generated CProxyElement method is not optimized out. Because curPeEvent is not initialized in that configuration, the call results in a NULL pointer access.

@stwhite91
Collaborator

Is this Charm built with --enable-tracing and the application linked with -tracemode projections?

@trquinn
Collaborator Author

trquinn commented Jul 24, 2024

No. I suspect that adding --enable-tracing to the --with-production build would work around the problem.

@stwhite91
Collaborator

I don't think it should be possible to get that line without having built Charm with --enable-tracing. From src/ck-perf/trace.h:

#if CMK_TRACE_ENABLED
#  define _TRACE_ONLY(code) do{if(CpvAccess(traceOn) && CkpvAccess(_traces)->length()>0) { code; }} while(0)
#else
#  define _TRACE_ONLY(code) /*empty*/
#endif

inline void _TRACE_BEGIN_EXECUTE_DETAILED(int event, int msgType, int ep, int srcPe,
                                          int mLen, CmiObjId* idx, void* obj)
{
  _TRACE_ONLY(
      CkpvAccess(_traces)->beginExecute(event, msgType, ep, srcPe, mLen, idx, obj));
}

I am not able to replicate the issue on a netlrts-linux-x86_64 --with-production build either with --enable-tracing or without it, so your full Charm build command would be helpful. I suspect this will be compiler specific as well.

@trquinn
Collaborator Author

trquinn commented Jul 24, 2024

The SEGV occurs in the function in the *.def.h file that calls _TRACE_BEGIN_EXECUTE_DETAILED(). If the call is not optimized out and the inline is ignored, the compiler still has to evaluate the event parameter in order to pass it to this function, and that evaluation causes the SEGV. I expect this to be very compiler dependent (I am using gcc v11.2.0; see N-BodyShop/changa#178), since it depends on how a compiler deals with an empty inline function.
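
To illustrate the mechanism described here, the following is a minimal sketch, not Charm++ code. The names accessEvent and traceBeginExecute are hypothetical stand-ins for CpvAccess(curPeEvent) and _TRACE_BEGIN_EXECUTE_DETAILED, and a read counter replaces the uninitialized per-PE storage so that the evaluation is observable instead of crashing. It shows that the argument expression belongs to the call site, so it is evaluated even though the callee body is empty.

```cpp
// Hypothetical stand-ins; none of these names come from Charm++.
static int g_reads = 0;  // counts evaluations of the argument expression
static int g_event = 7;  // placeholder for the per-PE event value

// Stand-in for CpvAccess(curPeEvent). In the real code this access touches
// per-PE storage that is only set up when tracing is initialized.
inline int& accessEvent() {
  ++g_reads;
  return g_event;
}

// Stand-in for the tracing hook when tracing is compiled out: empty body.
inline void traceBeginExecute(int /*event*/) {}

// Stand-in for a generated wrapper in a *.def.h file.
int countReadsForOneCall() {
  g_reads = 0;
  // The argument is evaluated here, before the empty body is entered.
  traceBeginExecute(accessEvent());
  return g_reads;
}
```

In this sketch the counter makes the evaluation a visible side effect, so it happens at every optimization level. In the case reported here the access is a plain load, which an optimizer is free to drop but an unoptimized -O0 build performs.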

@stwhite91
Collaborator

stwhite91 commented Jul 25, 2024

I see now. Our code generation doesn't check whether tracing is enabled, and curPeEvent and other tracing state are only initialized if tracing is enabled.

Two solutions I can see would be either 1) we change the generated code to check if tracing is enabled, or 2) we move the initialization of curPeEvent (and maybe other variables like it) to happen no matter if tracing is enabled or not. The less invasive approach would be 1. @rbuch may have better ideas on the tracing infrastructure?

Also a workaround for this issue, if it is blocking you, would be to declare the entry method as [local, notrace] in the .ci file.

@lvkale
Contributor

lvkale commented Sep 18, 2024

@trquinn did the workaround work for you?
Secondly, is there anything we need to fix? In other words, should this work without --enable-tracing?

@trquinn
Collaborator Author

trquinn commented Sep 18, 2024

If I recall, the workaround did work for us.
I suggest that the code generator be fixed to check whether tracing is enabled, and to emit the _TRACE_BEGIN_EXECUTE_DETAILED(CpvAccess(curPeEvent), ... statement only if it is. Alternatively, that statement could perhaps be protected with a _TRACE_ONLY macro.
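
A rough sketch of the second idea, under stated assumptions: SKETCH_TRACE_ENABLED and SKETCH_TRACE_ONLY are hypothetical names that mimic CMK_TRACE_ENABLED and _TRACE_ONLY, and accessEvent and traceBeginExecute are hypothetical stand-ins for CpvAccess(curPeEvent) and the tracing hook. It is not the actual generated code. The point is that wrapping the whole call in the macro removes the argument expression from the program text when tracing is off, so nothing is evaluated regardless of optimization level.

```cpp
// Hypothetical compile-time switch standing in for CMK_TRACE_ENABLED.
#ifndef SKETCH_TRACE_ENABLED
#  define SKETCH_TRACE_ENABLED 0
#endif

// Hypothetical guard modeled on _TRACE_ONLY: the wrapped code disappears
// entirely when tracing is compiled out.
#if SKETCH_TRACE_ENABLED
#  define SKETCH_TRACE_ONLY(code) do { code; } while (0)
#else
#  define SKETCH_TRACE_ONLY(code) do { } while (0)
#endif

static int g_reads = 0;  // counts evaluations of the argument expression
static int g_event = 7;  // placeholder for the per-PE event value

// Stand-in for CpvAccess(curPeEvent).
inline int& accessEvent() {
  ++g_reads;
  return g_event;
}

// Stand-in for the tracing hook.
inline void traceBeginExecute(int /*event*/) {}

// Stand-in for a generated [local] entry-method wrapper with the guard.
int guardedReads() {
  g_reads = 0;
  // With tracing off, the preprocessor drops the call and its argument.
  SKETCH_TRACE_ONLY(traceBeginExecute(accessEvent()));
  return g_reads;
}
```

With the switch left at 0, guardedReads() reports zero evaluations; building with -DSKETCH_TRACE_ENABLED=1 restores the call and its argument.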
