Ubuntu Core Kernel Panics on Intel NUC: BUG: unable to handle page fault for address #3174

Open
benfrancis opened this issue Oct 17, 2024 · 6 comments
Labels
bug snap Issues relating to the snap package

@benfrancis
Member

Note: I don't think this is actually a WebThings Gateway bug, it's an issue with Ubuntu Core. I'm filing it here for now because I don't yet have an upstream issue to track.


I'm testing WebThings Gateway on an Intel NUC 11. Specifically the NUC 11 CMCR1ABA i3 (Austin Beach rugged chassis + Elk Bay compute module). I don't think this particular version of the Intel NUC is officially certified by Canonical, but it does run Ubuntu Core (ubuntu-core-22-amd64.img).

However, I've recently noticed that when I leave it running for long periods (e.g. overnight), Ubuntu Core freezes with what I think is a memory-related kernel panic.

I've attached screenshots of two separate occasions when this happened, with slightly different errors (sorry for the photos, but because the machine freezes it's difficult for me to capture the text):

BUG: unable to handle page fault for address: ffffffffc121f1c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0009) - reserved bit violation

PXL_20241011_091155155(3)

BUG: unable to handle page fault for address: fffffffffffffff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page

PXL_20241010_154119004(1)
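It isn't obvious what the "#PF:" lines mean, so here is a minimal Python sketch that decodes an x86 page-fault error code into the same two lines the kernel prints. It assumes the bit layout documented in the Intel SDM (the same bits the Linux kernel names X86_PF_PROT, X86_PF_WRITE, etc.) and mirrors, as far as I understand it, the message logic in the kernel's arch/x86/mm/fault.c. `describe_pf` is a hypothetical helper for illustration, not a kernel or Ubuntu tool.

```python
# Bits of the x86 page-fault error code (names follow the kernel's X86_PF_* flags).
X86_PF_PROT = 1 << 0   # 0: page not present, 1: protection fault on a present page
X86_PF_WRITE = 1 << 1  # 0: read access, 1: write access
X86_PF_USER = 1 << 2   # 0: supervisor (kernel) access, 1: user access
X86_PF_RSVD = 1 << 3   # 1: a reserved bit was set in a paging-structure entry
X86_PF_INSTR = 1 << 4  # 1: the fault was an instruction fetch
X86_PF_PK = 1 << 5     # 1: protection-keys violation


def describe_pf(error_code: int, kernel_mode: bool = True) -> tuple[str, str]:
    """Return the two '#PF:' lines printed for this error code.

    kernel_mode reflects the CPU mode at the time of the fault; the kernel
    takes that from the saved registers, not from the error code.
    """
    who = "user" if error_code & X86_PF_USER else "supervisor"
    if error_code & X86_PF_INSTR:
        what = "instruction fetch"
    elif error_code & X86_PF_WRITE:
        what = "write access"
    else:
        what = "read access"
    mode = "kernel" if kernel_mode else "user"

    # The reason is chosen in priority order, as in the kernel's oops output.
    if not (error_code & X86_PF_PROT):
        reason = "not-present page"
    elif error_code & X86_PF_RSVD:
        reason = "reserved bit violation"
    elif error_code & X86_PF_PK:
        reason = "protection keys violation"
    else:
        reason = "permissions violation"

    return (f"#PF: {who} {what} in {mode} mode",
            f"#PF: error_code(0x{error_code:04x}) - {reason}")


# The two error codes from the crashes above.
for code in (0x0009, 0x0000):
    for line in describe_pf(code):
        print(line)
```

Running it reproduces the "#PF:" lines from both crashes. For 0x0009 (bits 0 and 3 set), the page was present but a reserved bit was set in a page-table entry, which suggests a corrupted page-table entry rather than an ordinary bad pointer. For 0x0000 (no bits set), it was a kernel-mode read of a page that wasn't mapped.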

My first thought was that it could be a hardware problem, but at least on a first pass memtest hasn't found any faults (see attached screenshot). (Update: no failures after 12 passes.)

PXL_20241011_115433093 MP(1)

I've seen quite a few open bugs of this nature on Launchpad with no resolution so I've contacted Canonical for support.

@benfrancis benfrancis added bug snap Issues relating to the snap package labels Oct 17, 2024
@benfrancis benfrancis added this to the 2.0 milestone Oct 17, 2024
@benfrancis benfrancis changed the title Ubuntu Core Kernal Panics on Intel NUC: BUG: unable to handle page fault for address Ubuntu Core Kernel Panics on Intel NUC: BUG: unable to handle page fault for address Oct 17, 2024
@ogra1
Contributor

ogra1 commented Nov 22, 2024

Could you please open a bug on Launchpad against the linux package? This requires the kernel team to look at it...

@benfrancis
Member Author

benfrancis commented Nov 22, 2024

@ogra1 I've noticed quite a few open bugs of this nature on Launchpad with no resolution.

I'm happy to file another bug mentioning this specific hardware, but can you advise where best to file the bug in order to get it noticed?

Edit: Do you mean here? https://bugs.launchpad.net/ubuntu/+source/linux

BTW I noticed that Ubuntu Server 24.04 is much more stable on this hardware than Ubuntu Core 22 (it has stayed running for several days). I tried installing Ubuntu Core 24 to see whether that's more stable (hoping it uses a similar kernel to Ubuntu Server 24.04), but sadly the system froze after a few hours. I haven't been able to capture a useful error message on Ubuntu Core 24 yet.

@benfrancis
Member Author

Looks like the same error on Ubuntu Core 24:
PXL_20241122_214758996

@benfrancis
Member Author

benfrancis commented Dec 10, 2024

By way of an update, when I booted up the Intel NUC today after being switched off for a while, Ubuntu Core 24 automatically updated its kernel from 6.8.0-48-generic to 6.8.0-50-generic.

Shortly afterwards the system became unresponsive again, but this time showing what look like different errors:
PXL_20241210_154702685
PXL_20241210_155409119

Ubuntu Server 24.04 continues to be very stable on the same hardware* running the 6.8.0-49-generic kernel.


* One other difference I have noticed is that the Intel NUC running Ubuntu Server is using an Intel Core i7 compute module, whereas the one running Ubuntu Core is using the Intel Core i3 compute module. The hardware is otherwise the same. Unfortunately it's tricky for me to swap the modules over to rule this out, because the i7 box is currently being used as a production server.

@benfrancis
Member Author

Ubuntu Core 24 just froze again, this time with what seem to be different errors. Unfortunately I have no idea what any of these errors mean, so it's difficult to know what would be useful to file as a kernel bug.

PXL_20241210_172923580

@benfrancis
Member Author

Another update:

I installed Ubuntu Server on the Intel Core i3 machine to try to test whether it's Ubuntu Core vs. Ubuntu Server that's the issue or Intel Core i3 vs. Intel Core i7 (not sure why I didn't think of that before!).

Interestingly I have been able to reproduce a kernel panic and a page fault on the Intel Core i3 machine running Ubuntu Server 24.04 with the 6.8.0-50-generic kernel.

Screenshot of crash 1 (kernel panic):
PXL_20241212_180046671

Screenshot of crash 2 (page fault):
PXL_20241212_182509210

So the question now is whether it's a hardware fault or a software fault specific to the Core i3 compute module.

P.S. I've now upgraded the Intel Core i7 box to the 6.8.0-50-generic kernel too, just to verify that it's not just the 6.8.0-49-generic kernel that works well on the i7 (which seems unlikely). So far it seems to be running smoothly.

Projects
Status: Sprint Backlog
Development

No branches or pull requests

2 participants