v2.4.0 grub error out of memory #1842
Comments
same for kairos-standard-opensuse-tumbleweed-amd64-generic-v2.4.0-k3sv1.26.6+k3s1.iso |
Hmm, this could be related to the gfx mode set by grub. You may need to lower it manually, as we now set the gfxterm terminal to auto and it tries to pick the highest mode available. Could you check with different gfxmode values? |
Seems like elementary also hit this at one point, which seems to confirm that this is a gfx issue: grub sets a very high gfx mode but the framebuffer is not big enough to display it. See elementary/installer#542 |
@Itxaka any news on this, any chance to be fixed in 2.4.1? |
@Ognian unfortunately no. As this requires a change to grub default values, we needed to push 2.4.1 to fix some issues before getting to work on this, as it requires extensive testing to find a good default. |
Wait, so this means you are able to boot by manually setting the gfxmode, right? But then on reboot it ignores it unless you set it manually? Seems like we need to look for a safe default for the resolution. Those are just warnings being exposed. It happened before but we were not logging them properly. It should not affect much; it's just nicer to have those fonts bundled :) |
I'll describe the process from the beginning:
The grub.cfg on the USB stick is much shorter than the one written by the installer on the eMMC (= SD card). The grub configuration on the USB stick always works; the one on the SD card never does. I tried to modify the one on the SD card by inserting |
Also faced |
Enabling debugging with |
Yep, this makes sense. Our grub.cfg for the livecd does not have the gfxmode set, so it makes sense that in livecd/usb/live mode you do not hit this. It's only once you restart from the installed system that you hit this issue, as we set the Let me test this somehow. Maybe I can make virtualbox reproduce it by setting the video card to a very low amount of RAM or something similar.... |
Very weird, 4GB of RAM should be more than enough for everything to load with no issues; after all, the kernel and initrd can't be more than 200MB in any of the flavors.... Wondering if it's due to the modules or the gfx stuff in your case as well.... |
So I disabled TPM from BIOS (Thanks @AndreyNikiforov !) |
Some comments found going through the grub bug tracker:
Looks like TPM module is indeed involved! rhboot/grub2#102
So it makes sense that disabling TPM makes it work, as grub then doesn't try to fully load the initrd into memory for measurement. So it seems to be a mix of several things:
Have to think about this and check further in upstream grub to see if this has been fixed somewhere, but good catch folks. Thanks @Ognian for reporting this and @AndreyNikiforov for the hint with the TPM. This would have been a nightmare to track down otherwise! |
Our kernel on core images is around 13MB. It kind of makes sense that we go over that mentioned 100MB by setting the gfx mode to auto if it chooses a very high resolution.... |
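For a rough sense of scale, here is a minimal sketch of the memory a single framebuffer needs at a given mode. The helper name `fb_bytes` is hypothetical, the mode string follows the WIDTHxHEIGHTxDEPTH form used for gfxmode in this thread, and it assumes a packed layout with no row padding. GRUB may allocate more than one such buffer, so treat the result as a lower bound, not as what gfxterm actually uses.

```shell
#!/bin/sh
# fb_bytes MODE: print the bytes needed for one framebuffer of MODE.
# MODE must have all three parts: WIDTHxHEIGHTxDEPTH, DEPTH in bits per pixel.
# (Hypothetical helper for illustration; assumes no row padding.)
fb_bytes() {
    mode=$1
    width=${mode%%x*}     # text before the first "x"
    rest=${mode#*x}       # text after the first "x"
    height=${rest%%x*}
    depth=${rest#*x}
    echo $(( width * height * depth / 8 ))
}

fb_bytes 640x480x32     # 1228800 bytes, about 1.2 MiB
fb_bytes 3840x2160x32   # 33177600 bytes, about 31.6 MiB
```

So a 4k mode costs roughly 27 times more per buffer than 640x480 at the same depth.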
By moving to compressing the initramfs with zstd we would gain 4 extra MB, which is not much, but it's good enough to breathe I guess.
@Ognian does this happen with a non-k3s build? If it also happens, are you able to build a custom image with the --zstd flag on initrd creation to see if it alleviates the issue? The patch is as follows, it's just 1 line:
diff --git a/Earthfile b/Earthfile
index b22b8c8..61eb545 100644
--- a/Earthfile
+++ b/Earthfile
@@ -441,7 +441,7 @@ base-image:
     IF [ -e "/usr/bin/dracut" ]
         # Regenerate initrd if necessary
         RUN --no-cache kernel=$(ls /lib/modules | head -n1) && depmod -a "${kernel}"
-        RUN --no-cache kernel=$(ls /lib/modules | head -n1) && dracut -f "/boot/initrd-${kernel}" "${kernel}" && ln -sf "initrd-${kernel}" /boot/initrd
+        RUN --no-cache kernel=$(ls /lib/modules | head -n1) && dracut --zstd -f "/boot/initrd-${kernel}" "${kernel}" && ln -sf "initrd-${kernel}" /boot/initrd
     END
 END
And then simply run |
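To check whether a rebuilt initrd actually ended up zstd-compressed, one option is to look at its leading magic bytes. This is a minimal sketch, not part of the Kairos tooling: `initrd_format` is a hypothetical helper that only inspects the first six bytes of the file. An initrd that starts with an uncompressed cpio segment (for example early microcode) is reported as cpio, and the main archive behind it is not inspected.

```shell
#!/bin/sh
# initrd_format FILE: guess the compression of FILE from its first bytes.
# (Hypothetical helper; only looks at the start of the file.)
initrd_format() {
    # Read six bytes as lowercase hex with no separators.
    magic=$(od -An -tx1 -N6 "$1" 2>/dev/null | tr -d ' \n')
    case $magic in
        28b52ffd*)                   echo zstd ;;
        1f8b*)                       echo gzip ;;
        fd377a585a00*)               echo xz ;;
        303730373031*|303730373032*) echo cpio ;;   # ASCII "070701"/"070702"
        *)                           echo unknown ;;
    esac
}

# Example, using the symlink created in the patch above:
#   initrd_format /boot/initrd
```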
Hmm, booting from master in 4k doesn't reproduce the issue, even with TPM. I'm wondering if it's a TPM implementation issue rather than a grub one. We don't ship the tpm module with grub as a module, so I'm not sure if it's integrated into grub directly. I think we need to rework the grub.cfg to not load gfxterm for now unless it's needed, as it's giving us a lot of headaches. |
We dropped gfxterm here: kairos-io/packages#473 . Please give it a try; if the problem still occurs, feel free to re-open. |
I'm running into the same problem using Edit: Disabling TPM and reinstalling gave me the same results as @Ognian (can't find regexp, boots after pressing a key). Anyway I'd definitely like to see this issue resolved (ideally without disabling TPM) so let me know if there's anything I can do to help. |
Up to now it seems that to reproduce this issue one needs:
and we are still missing something, because @Itxaka tried the above combination and couldn't reproduce it. His test was on qemu with virtual monitors though, so maybe that's the reason (but grub thought the resolution was 4k) |
I've looked for a way to disable TPM on the Surface Pro, but I don't think that is an available setting in its boot menu. What's the best way to test setting the gfxmode to a lower resolution in Kairos? |
I would try this (warning: not tested):
Hopefully that should set the gfxmode on the installed system's grub. You can of course check after installation by editing the grub menu again and looking for that option. |
I know you said to use the live CD but I rebooted a node and tried running |
I noticed that Unfortunately, lowering the resolution didn't work for me either =/ |
@santhoshdaivajna told me on Slack that they are seeing the same issue on an Intel NUC with 8 cpu/32G mem/>500G disk. We may be able to get access to a NUC to debug. |
This reminds me of https://bugs.launchpad.net/oem-priority/+bug/1842320/comments/125 - did we try setting gfxmode to 640x480? |
maybe it's just the GRUB version causing issues here? @Ognian is that new to 2.4? we could cross check the GRUB versions to see if that's causing it |
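For that cross-check, the version can be pulled out of the `grub2-install --version` output on each release and compared. A minimal sketch, assuming the output is a single line ending in the version number, as in the `grub2-install (GRUB2) 2.06` line reported in this thread; `grub_version` is a hypothetical helper name. Note that this reports the version of the installed grub2 tooling, which may not be the same as the EFI binary that actually boots the machine.

```shell
#!/bin/sh
# grub_version LINE: print the last space-separated field of LINE.
# (Hypothetical helper; assumes the version is the final field.)
grub_version() {
    # Strip everything up to and including the last space.
    printf '%s\n' "${1##* }"
}

grub_version "grub2-install (GRUB2) 2.06"   # prints 2.06

# On a node this could be fed the real output:
#   grub_version "$(grub2-install --version)"
```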
We think the tumbleweed grub efi binary is responsible for this and have switched back to the leap one in kairos-io/packages#553 |
@Itxaka Thanks for looking into this! Will this also help the ubuntu flavors, or is this specific to opensuse? |
Should be for all, as we use the same grub artifacts for all of them |
Yes this was new with
yes, this was newly introduced with 2.4. Hope this helps. |
Unfortunately, the Surface Pro 7+ doesn't allow disabling TPM 😕 Is my next
option to switch dracut to `hostonly=yes` maybe? @Ognian are you running
grub2-install inside the new container there?
Thanks!
…On Sat, Dec 2, 2023, 11:26 AM Ognian ***@***.***> wrote:
Yes this was new with
maybe it's just the GRUB version causing issues here? @Ognian
<https://github.com/Ognian> is that new to 2.4? we could cross check the
GRUB versions to see if that's causing it
yes, this was newly introduced with 2.4.
I just tested with 2.4.1 and upgraded to 2.4.2 and the result is with
2.4.2 as it was with 2.4.1 and 2.4.0:
with TPM -> out of memory error; without TPM -> boots OK
The last version I tested where it worked was v2.2.1
The version I have now is:
KAIROS_PRETTY_NAME="kairos-standard-opensuse-leap-15.5
v2.4.2-k3sv1.28.2+k3s1"
and
sudo grub2-install --version
grub2-install (GRUB2) 2.06
Hope this helps.
Ognian
|
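Since the result above hinges on whether TPM is on or off, it may help to confirm from a running system (live or installed) that the firmware setting took effect. A minimal sketch, assuming the kernel exposes TPM devices under `/sys/class/tpm`; `tpm_status` is a hypothetical helper, and the `tpm_version_major` attribute is only present on newer kernels, so the helper falls back to a plain presence check. It says nothing about whether grub itself measures the boot.

```shell
#!/bin/sh
# tpm_status [SYSFS_ROOT]: report whether the kernel exposes a TPM device.
# SYSFS_ROOT defaults to /sys; it is a parameter only to make testing easy.
# (Hypothetical helper for illustration.)
tpm_status() {
    root=${1:-/sys}
    dev="$root/class/tpm/tpm0"
    if [ ! -e "$dev" ]; then
        echo "no TPM device"
    elif [ -r "$dev/tpm_version_major" ]; then
        echo "TPM $(cat "$dev/tpm_version_major").x present"
    else
        echo "TPM present"
    fi
}

tpm_status
```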
Yes |
@mudler I've tried gfxmode=640x480x32 and gfxpayload=640x480x32, but unfortunately it didn't alleviate the OOM errors. I've also tried building from source with @Itxaka's recommendation of zstd, which apparently wasn't enough either. However, my builds from source + Auroraboot do not seem to change the resolution when I adjust grub settings via cloud_init, unlike the official Kairos images. So, maybe a combination will work if I can get the source builds working 🤔 |
Just tested @alexander-bauer 's workaround of |
@alexander-bauer I found an option that is a bit more robust to remove the tpm module from the
Create a Dockerfile: Pick your favorite Kairos image (e.g., ubuntu:20.04).
Build the image:
Deploy with auroraboot: For example, generate an ISO:
docker run --rm -ti \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/config.yaml:/config.yaml \
  -v $(pwd)/build:/tmp/auroraboot \
  quay.io/kairos/auroraboot \
  --set "container_image=docker://tpm2workaround" \
  --set "disable_http_server=true" \
  --set "disable_netboot=true" \
  --set "state_dir=/tmp/auroraboot" \
  --cloud-config /config.yaml
@Itxaka or @mudler might know of an easier way to override this using one of the cloud-init stages, I tried
Hope that helps until we get a more permanent fix! |
Could also try the rc3 that we released yesterday to see if it fixes it, as we reverted the grub.efi to a different one which used to work! |
I tested with |
After install from
kairos-standard-opensuse-leap-amd64-generic-v2.4.0-k3sv1.26.6+k3s1.iso
on /dev/mmcblk1
on an x86_64 (latte panda 3 d) I immediately get the following grub error: