Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP301 IOC crashes when exit IOC shell #13

Open
LeeYangLBLBCS opened this issue Jul 1, 2022 · 14 comments
Open

ESP301 IOC crashes when exit IOC shell #13

LeeYangLBLBCS opened this issue Jul 1, 2022 · 14 comments

Comments

@LeeYangLBLBCS
Copy link

I get this error every time I exit IOC shell when running the newport driver with ESP301 controller.
I'm using serial port (NOT the USB port), at 19200 baud.
======================IOC shell log================
2022/07/01 12:51:09.468 /dev/ttyUSB1 read 3
0\r\n
2022/07/01 12:51:09.468 /dev/ttyUSB1 read 3, return 0
2022/07/01 12:51:09.468 serial5 queueUnlockPort
2022/07/01 12:51:09.468 serial5 asynManager::queueUnlockPort waiting for event
2022/07/01 12:51:09.468 serial5 queueUnlockPort unlock mutex 0x7fdfc8000c30 complete.

epics>
epics> exit
epicsEventDestroy: pthread_mutex_destroy failed: Device or resource busy

@kmpeters
Copy link
Contributor

A similar error was reported on tech-talk back in 2011:

https://epics.anl.gov/tech-talk/2011/msg00113.php

At the time it was suspected to be due to this bug in base, which was fixed in the 3.14 branch, also in 2011:

https://bugs.launchpad.net/epics-base/+bug/693534

Which version of base was being used for this IOC?

@LeeYangLBLBCS
Copy link
Author

LeeYangLBLBCS commented May 10, 2023 via email

@kmpeters
Copy link
Contributor

You should be able to set the atExitDebug variable before exiting the IOC:

iockmp6> var atExitDebug 10

Which should give you some info about what is happening when exit is called:

iockmp6> exit
atExit save_restoreShutdown((nil))
atExit &pva_server_cleanup((nil))
atExit exitDatabase((nil))
atExit exit_handler((nil))
atExit exit_handler((nil))
atExit twdShutdown((nil))
atExit syncShutdown((nil))
atExit dbndShutdown((nil))
atExit arrShutdown((nil))
atExit errlogExitHandler((nil))
atExit rlExit((nil))
atExit ClockTime_Shutdown((nil))

Hopefully that will shed some light on the problem.

@LeeYangLBLBCS
Copy link
Author

LeeYangLBLBCS commented May 10, 2023 via email

@kmpeters
Copy link
Contributor

kmpeters commented May 10, 2023

I'm able to recreate the problem using the example IOC, loading only the ESP300 support even though I have nothing connected:

epicsEventDestroy: pthread_mutex_destroy failed: Device or resource busy
^C
Thread 1 "newport" received signal SIGINT, Interrupt.
0x00007ffff77cf335 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-40.el9_1.1.x86_64 keyutils-libs-1.6.1-4.el9.x86_64 krb5-libs-1.19.1-24.el9_1.x86_64 libcom_err-1.46.5-3.el9.x86_64 libgcc-11.3.1-2.1.el9.x86_64 libselinux-3.4-3.el9.x86_64 libstdc++-11.3.1-2.1.el9.x86_64 libtirpc-1.3.3-0.el9.x86_64 pcre2-10.40-2.el9.x86_64 zlib-1.2.11-35.el9_1.x86_64
(gdb) bt
#0  0x00007ffff77cf335 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libc.so.6
#1  0x00007ffff7cbac07 in epicsEventDestroy (pevent=0x423730) at ../osi/os/posix/osdEvent.c:76
#2  0x00007ffff7cb3b9c in epicsEvent::~epicsEvent (this=<optimized out>, __in_chrg=<optimized out>) at ../osi/epicsEvent.cpp:52
#3  0x00007ffff7788947 in __cxa_finalize () from /lib64/libc.so.6
#4  0x00007ffff7f39ae7 in __do_global_dtors_aux () from /net/aquila/export/APSmaster/gateway/epics/synApps_6_2_1/support/motor-R7-2-2/lib/rhel9-x86_64-debug/libNewport.so
#5  0x00007fffffffd3d0 in ?? ()
#6  0x00007ffff7fd0f5e in _dl_fini () at dl-fini.c:142
Backtrace stopped: frame did not save the PC
(gdb) 

It's still not obvious to me where the problem is.

@kmpeters
Copy link
Contributor

The error still occurs if the ESP300.substitutions file isn't loaded and the ESP300Config call is commented out.

@kmpeters
Copy link
Contributor

The error message also appears in this tech-talk message:

https://epics.anl.gov/tech-talk/2018/msg01101.php

@kmpeters
Copy link
Contributor

The error occurs as soon as the errlog thread exits.

epics> ^C
Thread 1 "newport" received signal SIGINT, Interrupt.
0x00007ffff786f8fc in read () from /lib64/libc.so.6
(gdb) info thread
  Id   Target Id                                          Frame 
* 1    Thread 0x7ffff7069500 (LWP 1026833) "newport"      0x00007ffff786f8fc in read () from /lib64/libc.so.6
  3    Thread 0x7ffff6864640 (LWP 1026838) "errlog"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  4    Thread 0x7ffff67e3640 (LWP 1026921) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  5    Thread 0x7ffff66e2640 (LWP 1026922) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  6    Thread 0x7ffff65e1640 (LWP 1026923) "serial4"      0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  7    Thread 0x7ffff64e0640 (LWP 1026924) "taskwd"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  8    Thread 0x7ffff645f640 (LWP 1026926) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  9    Thread 0x7ffff7065640 (LWP 1026927) "cbLow"        0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  10   Thread 0x7ffff635e640 (LWP 1026928) "cbMedium"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  11   Thread 0x7ffff615d640 (LWP 1026929) "cbHigh"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  12   Thread 0x7ffff5f5c640 (LWP 1026930) "dbCaLink"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  13   Thread 0x7ffff5d5b640 (LWP 1026931) "ESP300_motor" 0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  14   Thread 0x7ffff5c5a640 (LWP 1026932) "scanOnce"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  15   Thread 0x7ffff5a59640 (LWP 1026933) "scan-10"      0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  16   Thread 0x7ffff5858640 (LWP 1026934) "scan-5"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  17   Thread 0x7ffff5657640 (LWP 1026935) "scan-2"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  18   Thread 0x7ffff5456640 (LWP 1026936) "scan-1"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  19   Thread 0x7ffff5255640 (LWP 1026937) "scan-0.5"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  20   Thread 0x7ffff5054640 (LWP 1026938) "scan-0.2"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  21   Thread 0x7ffff4e53640 (LWP 1026939) "scan-0.1"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  22   Thread 0x7ffff4c52640 (LWP 1026940) "CAS-TCP"      0x00007ffff788029f in accept () from /lib64/libc.so.6
  23   Thread 0x7ffff4b51640 (LWP 1026941) "CAS-UDP"      0x00007ffff7880504 in recvfrom () from /lib64/libc.so.6
  24   Thread 0x7ffff4a50640 (LWP 1026942) "CAS-beacon"   0x00007ffff7844845 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
(gdb) 
(gdb) cont
Continuing.

epics> 
epics> exit
[Switching to Thread 0x7ffff5f5c640 (LWP 1026930)] <----- dbCaLink

Thread 12 "dbCaLink" hit Breakpoint 1, epicsMutexDestroy (pmutexNode=0x7fffd8000dc0) at ../osi/epicsMutex.cpp:129
(gdb) cont
Continuing.

Thread 12 "dbCaLink" hit Breakpoint 1, epicsMutexDestroy (pmutexNode=0x7fffd8000d50) at ../osi/epicsMutex.cpp:129
(gdb) cont
Continuing.
[Thread 0x7ffff5f5c640 (LWP 1026930) exited] <----- dbCaLink
[Thread 0x7ffff64e0640 (LWP 1026924) exited] <----- taskwd
[Switching to Thread 0x7ffff7069500 (LWP 1026833)] <----- newport

Thread 1 "newport" hit Breakpoint 1, epicsMutexDestroy (pmutexNode=0x436990) at ../osi/epicsMutex.cpp:129
(gdb) cont
Continuing.

Thread 1 "newport" hit Breakpoint 1, epicsMutexDestroy (pmutexNode=0x4363e0) at ../osi/epicsMutex.cpp:129
(gdb) cont
Continuing.

Thread 1 "newport" hit Breakpoint 1, epicsMutexDestroy (pmutexNode=0x430c10) at ../osi/epicsMutex.cpp:129
(gdb) cont
Continuing.
[Thread 0x7ffff6864640 (LWP 1026838) exited] <----- errlog
epicsEventDestroy: pthread_mutex_destroy failed: Device or resource busy

@kmpeters
Copy link
Contributor

The ESP300_motor thread is still running after the error occurs:

^C
Thread 1 "newport" received signal SIGINT, Interrupt.
0x00007ffff77cf335 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libc.so.6
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff7cb3540 in epicsMutexDestroy(epicsMutexId) at ../osi/epicsMutex.cpp:129
	breakpoint already hit 6 times
2       breakpoint     keep y   0x00007ffff7ca2fc0 in errlogMessage at ../error/errlog.c:183
(gdb) 
(gdb) info thread
  Id   Target Id                                          Frame 
* 1    Thread 0x7ffff7069500 (LWP 1034897) "newport"      0x00007ffff77cf335 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libc.so.6
  4    Thread 0x7ffff67e3640 (LWP 1034942) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  5    Thread 0x7ffff66e2640 (LWP 1034943) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  6    Thread 0x7ffff65e1640 (LWP 1034944) "serial4"      0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  8    Thread 0x7ffff645f640 (LWP 1034946) "timerQueue"   0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  9    Thread 0x7ffff7065640 (LWP 1034947) "cbLow"        0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  10   Thread 0x7ffff635e640 (LWP 1034948) "cbMedium"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  11   Thread 0x7ffff615d640 (LWP 1034949) "cbHigh"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  13   Thread 0x7ffff5d5b640 (LWP 1034951) "ESP300_motor" 0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  14   Thread 0x7ffff5c5a640 (LWP 1034952) "scanOnce"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  15   Thread 0x7ffff5a59640 (LWP 1034953) "scan-10"      0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  16   Thread 0x7ffff5858640 (LWP 1034954) "scan-5"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  17   Thread 0x7ffff5657640 (LWP 1034955) "scan-2"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  18   Thread 0x7ffff5456640 (LWP 1034956) "scan-1"       0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  19   Thread 0x7ffff5255640 (LWP 1034957) "scan-0.5"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  20   Thread 0x7ffff5054640 (LWP 1034958) "scan-0.2"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  21   Thread 0x7ffff4e53640 (LWP 1034959) "scan-0.1"     0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
  22   Thread 0x7ffff4c52640 (LWP 1034961) "CAS-TCP"      0x00007ffff788029f in accept () from /lib64/libc.so.6
  23   Thread 0x7ffff4b51640 (LWP 1034962) "CAS-UDP"      0x00007ffff7880504 in recvfrom () from /lib64/libc.so.6
  24   Thread 0x7ffff4a50640 (LWP 1034963) "CAS-beacon"   0x00007ffff7844845 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
(gdb) thread 13
[Switching to thread 13 (Thread 0x7ffff5d5b640 (LWP 1034951))]
#0  0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
(gdb) list
124	    return(id );
125	}
126	
127	void epicsStdCall epicsMutexDestroy(epicsMutexId pmutexNode)
128	{
129	    epicsMutexLockStatus lockStat =
130	        epicsMutexOsdLock(epicsMutexGlobalLock);
131	    assert ( lockStat == epicsMutexLockOK );
132	    ellDelete(&mutexList,&pmutexNode->node);
133	    epicsMutexOsdDestroy(pmutexNode->id);
(gdb) bt
#0  0x00007ffff77cd39a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007ffff77cfea4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007ffff7cbae5d in epicsEventWaitWithTimeout (pevent=0x423730, timeout=<optimized out>) at ../osi/os/posix/osdEvent.c:130
#3  0x00007ffff7cb3bfc in epicsEvent::wait (this=<optimized out>, timeOut=<optimized out>) at ../osi/epicsEvent.cpp:75
#4  0x00007ffff7f0b7be in motor_task (args=<optimized out>) at ../motordrvCom.cc:180
#5  0x00007ffff7cb8013 in start_routine (arg=0x4e6a90) at ../osi/os/posix/osdThread.c:411
#6  0x00007ffff77d0802 in start_thread () from /lib64/libc.so.6
#7  0x00007ffff7770450 in clone3 () from /lib64/libc.so.6
(gdb) frame 4
#4  0x00007ffff7f0b7be in motor_task (args=<optimized out>) at ../motordrvCom.cc:180
(gdb) list
175	        }
176	
177	        Debug(5, "motor_task: wait_time = %f\n", wait_time);
178	
179	        if (wait_time != 0.0)
180	            sem_ret = tabptr->semptr->wait(wait_time);
181	        previous_time = epicsTime::getCurrent();
182	
183	        if (*tabptr->any_inmotion_ptr)
184	        {
(gdb) 

@kmpeters
Copy link
Contributor

kmpeters commented May 10, 2023

I'm failing to see the connection between the ESP300_motor thread and the errlog thread. The common dev/drv code in the motor module (model-1) doesn't call errlog functions directly. The ESP300 driver does call errlogMessage and errlogPrintf:

$ grep errlog *ESP300*
devESP300.cc:#include <errlog.h>
devESP300.cc:   errlogMessage("ESP300_build_trans(): buffer overflow.\n");
drvESP300.cc:#include <errlog.h>
drvESP300.cc:        errlogMessage("drvESP300:send_mess(); message size violation.\n");
drvESP300.cc:        errlogPrintf("drvESP300:send_mess() - invalid card #%d\n", card);
drvESP300.cc:        errlogPrintf("drvESP300:send_mess() - invalid argument = %s\n", name);
drvESP300.cc:            errlogPrintf("%s connect error = %d\n", errmsg, (int) success_rtn);
drvESP300.cc:            errlogPrintf("%s setInputEos error = %d\n", errmsg, (int) success_rtn);

But there shouldn't be any messages sent to the controller (the ESP300Config was never called and motor records were not loaded).

@kmpeters
Copy link
Contributor

kmpeters commented May 10, 2023

The problem can be recreated using example IOC from motorNewport and the following st.cmd:

#!../../bin/linux-x86_64/newport

< envPaths

## Register all support components
dbLoadDatabase "../../dbd/newport.dbd"
newport_registerRecordDeviceDriver pdbbase

ESP300Setup(1, 10)

iocInit

# Boot complete

And then typing "exit"

@kmpeters
Copy link
Contributor

It is unsurprising that taskwd exits nicely before the error occurs, as it is designed to exit:

https://github.com/epics-base/epics-base/blob/7.0/modules/libcom/src/taskwd/taskwd.c#L89-L129

Whereas motor_task runs forever:

https://github.com/epics-modules/motor/blob/master/motorApp/MotorSrc/motordrvCom.cc#L131-L198

@kmpeters
Copy link
Contributor

kmpeters commented May 11, 2023

I was able to recreate the epicsEventDestroy: pthread_mutex_destroy failed: Device or resource busy error using the New Focus example IOC. I'm fairly confident this is a general issue.

@kmpeters
Copy link
Contributor

I'll keep this issue open until epics-modules/motor#199 is resolved, but I'll post progress to the motor issue going forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants