Stack overflow crashes since update to 0.42 #327

Closed
mbuesch opened this issue Oct 22, 2023 · 3 comments

@mbuesch
Contributor

mbuesch commented Oct 22, 2023

Hi,

I have a strange problem with stack corruption since I upgraded from 0.41 to 0.42.
I can pinpoint the behavior to the update step. Reverting to 0.41 fixes it.

I upgraded esp-idf-hal from 0.41 to 0.42
and esp-idf-svc from 0.46 to 0.47
esp-idf-sys is 0.33 in both cases.

I get the following crash randomly after a couple of seconds or minutes of runtime of an otherwise correctly working application:

***ERROR*** A stack overflow in task main has been detected.             
                                                                                                    
                                                                                                    
Backtrace: 0x400825c6:0x3ffc05f0 0x40086291:0x3ffc0610 0x400883b2:0x3ffc0630 0x40087151:0x3ffc06b0 0x400884e0:0x3ffc06e0 0x40088492:0xa5a5a5a5 |<-CORRUPTED
0x400825c6 - panic_abort                                                                            
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/esp_system/panic.c:452
0x3ffc05f0 - _bss_end                                                                               
    at ??:??                                                                                        
0x40086291 - esp_system_abort                                                                       
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/esp_system/port/esp_system_chip.c:84
0x3ffc0610 - _bss_end                                                                               
    at ??:??                    
0x400883b2 - vApplicationStackOverflowHook
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:581
0x3ffc0630 - _bss_end       
    at ??:??                                                                                                                                                                                             
0x40087151 - vTaskSwitchContext
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/freertos/FreeRTOS-Kernel/tasks.c:3729
0x3ffc06b0 - _bss_end                    
    at ??:??                               
0x400884e0 - _frxt_dispatch                   
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:450
0x3ffc06e0 - _bss_end                                                                               
    at ??:??                                                                                        
0x40088492 - _frxt_int_exit                
    at /.../.embuild/espressif/esp-idf/v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:245
0xa5a5a5a5 - _rtc_slow_reserved_end        
    at ??:??  

My application is structured as follows:

  • I use the target xtensa-esp32-espidf.
  • main() sets up all things and then returns to the OS into the idle loop.
  • All work is done in several std::threads that are distributed across the two CPUs.
  • These threads wait on per-thread std::sync::Condvar for work to be done.
  • The Condvars are triggered from a single periodic EspTimerService timer whose registered callback fires every 10 milliseconds. (Not all threads are triggered on every timer invocation. The CPU load is less than 50% on both cores.)
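
For readers unfamiliar with this pattern, here is a minimal host-side sketch of one worker thread that sleeps on its own Condvar and is woken by a periodic tick. It is not the application's actual code: the `Signal` type and `run_worker` function are hypothetical names, and a plain loop with a 10 ms sleep stands in for the EspTimerService callback so that the example runs with std alone.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Per-thread wake-up signal (hypothetical helper, not from the issue):
/// a counter of pending ticks guarded by a mutex, plus a Condvar.
struct Signal {
    pending: Mutex<u32>,
    cond: Condvar,
}

impl Signal {
    fn new() -> Self {
        Signal { pending: Mutex::new(0), cond: Condvar::new() }
    }

    /// Called from the periodic callback: record one tick and wake the worker.
    fn notify(&self) {
        *self.pending.lock().unwrap() += 1;
        self.cond.notify_one();
    }

    /// Called by the worker: block until at least one tick is pending,
    /// then consume it. The loop guards against spurious wake-ups.
    fn wait(&self) {
        let mut pending = self.pending.lock().unwrap();
        while *pending == 0 {
            pending = self.cond.wait(pending).unwrap();
        }
        *pending -= 1;
    }
}

/// Runs one worker for `ticks` wake-ups and returns how many it handled.
fn run_worker(ticks: u32) -> u32 {
    let signal = Arc::new(Signal::new());
    let worker_signal = Arc::clone(&signal);

    let worker = thread::spawn(move || {
        let mut handled: u32 = 0;
        for _ in 0..ticks {
            worker_signal.wait();
            handled += 1; // the real work would happen here
        }
        handled
    });

    // Stand-in for the 10 ms timer callback (assumption: a plain loop
    // replaces EspTimerService so the sketch runs on a host).
    for _ in 0..ticks {
        signal.notify();
        thread::sleep(Duration::from_millis(10));
    }

    worker.join().unwrap()
}
```

Calling `run_worker(3)` spawns the worker, delivers three ticks, and returns the number of wake-ups handled. Counting pending ticks instead of using a boolean flag means a tick is not lost if the worker is still busy when the next one arrives.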

I tried doubling the configured stack size from 32k to 64k, which did not help. Therefore, I don't think the stack is simply too small.

I'm currently unable to really pinpoint the problem to any part of my app. The app runs correctly for some time, until it suddenly crashes with the stack overflow.

Do you have any suggestion as to where to look?
How could upgrading esp-idf-hal and esp-idf-svc trigger such a behavior?

Thanks a lot for your help.

@mbuesch
Contributor Author

mbuesch commented Oct 22, 2023

Ok, I think I found the root cause.

Although the error message talks about a stack overflow in the main task, it seems to be a stack overflow in one of the threads.
The stack usage seems to have been borderline full before the update and the update seems to have slightly increased stack usage.

I have now increased the thread stack size with the sdkconfig variable CONFIG_PTHREAD_TASK_STACK_SIZE_DEFAULT.
(The main task stack is controlled by CONFIG_ESP_MAIN_TASK_STACK_SIZE)

Now it doesn't crash anymore.

The error message talking about the main task made me search in the wrong direction.

@mbuesch mbuesch closed this as completed Oct 22, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in esp-rs Oct 22, 2023
@Vollbrecht
Collaborator

Vollbrecht commented Oct 22, 2023

Since, from your description, you are using a lot of threads, you should consider setting the stack size explicitly when building them. You can use https://doc.rust-lang.org/std/thread/struct.Builder.html#method.stack_size to configure the stack size of each thread you spawn. That may be a better approach than relying on one default stack size for every thread.

That said, keep #233 in mind when using this.
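
A minimal sketch of the per-thread approach, using only `std::thread::Builder`. The helper name `spawn_with_stack`, the thread name, and the 64 KiB figure in the usage note are illustrative assumptions, not values recommended anywhere in this thread. The caveat tracked in #233 is not covered here.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Spawns a named worker with an explicit stack size.
/// `stack_bytes` is a hypothetical per-thread budget chosen by the caller.
fn spawn_with_stack(name: &str, stack_bytes: usize) -> io::Result<JoinHandle<usize>> {
    thread::Builder::new()
        .name(name.to_string())
        .stack_size(stack_bytes) // requested stack size for this thread only
        .spawn(|| {
            // Placeholder workload: a small buffer that lives on this
            // thread's stack, summed to produce a result.
            let buf = [1u8; 1024];
            buf.iter().map(|&b| b as usize).sum()
        })
}
```

For example, `spawn_with_stack("worker", 64 * 1024)?.join()` runs the placeholder workload on a thread with a 64 KiB stack request. Unlike `thread::spawn`, `Builder::spawn` returns an `io::Result`, so a failure to create the thread can be handled instead of panicking.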

@mbuesch
Contributor Author

mbuesch commented Oct 22, 2023

Hi @Vollbrecht,

thanks a lot for the suggestion.
Indeed, setting the thread stack size dynamically at spawn time is the better solution most of the time.
I think in my case the global configuration might be fine, because I don't actually spawn any small (sub-)threads. I only use a couple of long-running main threads.

However, I will check your solution. It looks cleaner. Thanks for the #233 hint. It would certainly have bitten me.
