testmap: Enable fedora-39 for cockpit, disable fedora-37 #5171

Merged
martinpitt merged 4 commits into cockpit-project:main from cockpit-f39 on Aug 30, 2023

Conversation

martinpitt
Member

@martinpitt martinpitt commented Aug 29, 2023

I initially thought that we need to do the usual "adjust test special cases", but it seems we replaced them all with globs now 💪 So let's see how far they get!

Trigger command for iterating:

./tests-trigger 5171 fedora-39/{networking,storage,expensive,other}@cockpit-project/cockpit
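The braces in that command are plain shell brace expansion, so a single invocation requests four test contexts. A minimal sketch of the expansion, assuming the context format `IMAGE/SCENARIO@REPO` that the command itself suggests (`expand_contexts` is a hypothetical helper, not part of the bots repository):

```python
def expand_contexts(image, scenarios, repo):
    """Build one test context string per scenario, as the shell brace expansion would."""
    return [f"{image}/{scenario}@{repo}" for scenario in scenarios]


contexts = expand_contexts(
    "fedora-39",
    ["networking", "storage", "expensive", "other"],
    "cockpit-project/cockpit",
)

# Print the four contexts that the single command line expands to.
for context in contexts:
    print(context)
```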

@martinpitt changed the title from "testmap: Enable fedora-39 for cockpit" to "testmap: Enable fedora-39 for cockpit, disable fedora-37" on Aug 29, 2023
@martinpitt
Member Author

A large number of failures are due to missing netcat. Plus, a few naughties to copy.

@martinpitt
Member Author

martinpitt commented Aug 29, 2023

OK, most of that works. There are three remaining storage failures which look a bit dubious to me, and I second-guess the claim that these are really known issue #5090. Perhaps testPoolResize is, but testResizeNtfs most likely isn't -- it seems Fedora 39 can't mount the "ntfs" type any more. I have a faint memory of lwn.net mentioning that ntfs got dropped from the kernel; that might be related. The pattern for 5090 is unfortunately very imprecise and catches too many unrelated errors.

@mvollmer can you please have a closer look at these two? Thanks!

@martinpitt
Member Author

Ah, one more, this is also fallout from the FIPS breakage. I'll update the naughty.

@mvollmer
Member

TestStorageStratis.testPoolResize does indeed look like #5090, but not TestStorageResize.testResizeNtfs. That seems to be missing support for NTFS in the kernel.

@martinpitt
Member Author

I suppose I remembered https://lwn.net/Articles/866112/#ntfs . So that may indeed be an intended change.

@martinpitt
Member Author

testRootReboot gets stuck during shutdown:

[  OK  ] Stopped target local-fs.target - Local File Systems.
         Unmounting boot-efi.mount - /boot/efi...
         Unmounting home.mount - /home...
         Unmounting new\x2droot.mount - /new-root...
         Unmounting run-stratisd-ns_mounts.mount - /run/stratisd/ns_mounts...
         Unmounting tmp.mount - Temporary Directory /tmp...
[  OK  ] Unmounted home.mount - /home.
[  OK  ] Unmounted run-stratisd-ns_mounts.mount - /run/stratisd/ns_mounts.
[  OK  ] Unmounted tmp.mount - Temporary Directory /tmp.
[  OK  ] Stopped target swap.target - Swaps.
         Deactivating swap dev-zram0.swap - Compressed Swap on /dev/zram0...
[  OK  ] Deactivated swap dev-zram0.swap - Compressed Swap on /dev/zram0.
         Stopping systemd-zram-setup@zram0.…vice - Create swap on /dev/zram0...
[   78.799019] zram0: detected capacity change from 3606528 to 0
[  OK  ] Stopped systemd-zram-setup@zram0.service - Create swap on /dev/zram0.
[  OK  ] Removed slice system-systemd\x2dzr…- Slice /system/systemd-zram-setup.
[   78.928668] EXT4-fs (dm-1): unmounting filesystem e7fa5069-3a9e-40da-bf36-a54dd0129231.
[  OK  ] Unmounted new\x2droot.mount - /new-root.
[*     ] Job boot-efi.mount/stop running (17s / no limit)

It's not clear from this what happens, so I applied

--- test/common/storagelib.py
+++ test/common/storagelib.py
@@ -20,7 +20,7 @@ import os.path
 import re
 import textwrap
 
-from testlib import Error, MachineCase, wait
+from testlib import Error, MachineCase, sit, wait
 
 
 def from_udisks_ascii(codepoints):
@@ -633,7 +633,10 @@ grub2-install {dev}
 grubby --update-kernel=ALL --args="root=UUID=$uuid rootflags=defaults rd.luks.uuid=$luks_uuid rd.lvm.lv=root/root"
 ! test -f /etc/kernel/cmdline || cp /etc/kernel/cmdline /new-root/etc/kernel/cmdline
 """, timeout=300)
-        m.spawn("dd if=/dev/zero of=/dev/vda bs=1M count=100; reboot", "reboot", check=False)
+        m.execute("dd if=/dev/zero of=/dev/vda bs=1M count=100")
+        m.execute("systemctl start debug-shell")
+        sit()
+        m.spawn("sleep 5; reboot", "reboot", check=False)
         m.wait_reboot(300)
         self.assertEqual(m.execute("findmnt -n -o SOURCE /").strip(), "/dev/mapper/root-root")

then attached virt-viewer, switched to VT 11 (the debug shell), and continued the test. Indeed systemctl list-jobs shows boot-efi.mount, but the journal doesn't show anything unusual, dmesg is empty, and top shows no runaway process. There is no overflowing file system either. There aren't many non-kernel processes around any more, just udevd, journald, and three systemd-userworkd.

Curiously, /boot/efi is still mounted in /proc/self/mounts, but umount /boot/efi says "not mounted". This smells like the kernel lost track of itself?

Curiously/helpfully, this weird state already happens during the sit(), without the rebooting. So that whole /boot shiftery somehow destroys or overlays the /boot/efi mount. What does work is to run umount /boot; umount /boot/efi first: after that, rebooting works, and the test eventually succeeds.
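One way to probe the "overlays" half of that guess is to look for mount table entries that have a later mount on the same path or on a parent directory. The sketch below is only an illustration: it assumes that /proc/self/mounts lists entries in mount order, the helper names are hypothetical, and the sample table is made up rather than taken from the failing VM. The thread does not confirm that this is what actually happened.

```python
import re


def decode_mount_path(field):
    """Undo the octal escapes (such as \\040 for a space) used in /proc/self/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points(mounts_text):
    """Return the mount points in the order in which they appear in the table."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.append(decode_mount_path(fields[1]))
    return points


def shadowed_mounts(points):
    """Return entries that have a later mount on the same path or on a parent directory."""
    hidden = []
    for index, point in enumerate(points):
        for later in points[index + 1:]:
            prefix = later.rstrip("/") + "/"
            if point == later or point.startswith(prefix):
                hidden.append(point)
                break
    return hidden


# Made-up sample table (assumption), NOT captured from the test machine.
SAMPLE = """\
/dev/vda3 / ext4 rw 0 0
/dev/vda1 /boot/efi vfat rw 0 0
/dev/vda2 /boot ext4 rw 0 0
"""

print(shadowed_mounts(mount_points(SAMPLE)))
```

On a live system the same functions could be fed `open("/proc/self/mounts").read()` from the debug shell; an entry reported here is still listed in the table but no longer reachable by its path, which would be consistent with umount claiming "not mounted".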

@martinpitt
Member Author

I sent a cockpit PR to fix testRootReboot, blocking on that. I also copied the testPoolResize half of the 5090 naughty. So the only remaining one is testResizeNtfs; I updated the todo list.

@martinpitt
Member Author

testResizeNtfs is much simpler: While fedora-38 has the ntfs-3g package pre-installed, fedora-39 does not any more. Installing it fixes it in the sense that it gets further, and then fails with

warning: Error resizing logical volume: Process reported exit code 5: File system device usage is not available from libblkid.

which is the other half of #5090. I'll add that remaining naughty here.
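To illustrate the earlier worry about imprecise patterns, here is a small sketch comparing a broad and a narrower glob against the warning quoted above. Both patterns and the "unrelated" line are invented for this example; they are not the actual #5090 naughty, and the sketch does not claim to reproduce how the bots match naughty files.

```python
from fnmatch import fnmatchcase

# The warning quoted in the comment above.
LOG_LINE = (
    "warning: Error resizing logical volume: Process reported exit code 5: "
    "File system device usage is not available from libblkid."
)

# Made-up line standing in for an unrelated resize failure (assumption).
UNRELATED = (
    "warning: Error resizing logical volume: Process reported exit code 5: "
    "some other failure"
)

# Hypothetical patterns: one too broad, one tied to the libblkid message.
BROAD = "*Error resizing*"
PRECISE = "*Error resizing logical volume:*not available from libblkid*"

for name, pattern in (("broad", BROAD), ("precise", PRECISE)):
    print(name, fnmatchcase(LOG_LINE, pattern), fnmatchcase(UNRELATED, pattern))
```

The broad pattern accepts both lines, while the narrower one only accepts the libblkid message, which is the kind of precision the 5090 pattern is said to lack.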

I'll send a bots PR to add ntfs-3g to the image, blocking on that. That should cover everything.

This also produces systemd-coredump traces. Add a minimal match to
ensure that the login page really fails due to the crash, not due to
some other reason. This applies to both testCryptoPolicies and
testInconsistentCryptoPolicy, so use a slightly funny fnmatch to catch
both.

Downstream report: https://bugzilla.redhat.com/show_bug.cgi?id=2235589
Known issue cockpit-project#5174
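
As an illustration of the kind of glob that commit message describes, the sketch below uses a hypothetical pattern that covers both test names; the pattern actually committed is not shown in this conversation and may well differ.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern (assumption): "Polic*" covers both "Policies" and "Policy".
PATTERN = "test*CryptoPolic*"

NAMES = ["testCryptoPolicies", "testInconsistentCryptoPolicy", "testResizeNtfs"]

# Map each test name to whether the glob matches it.
matches = {name: fnmatchcase(name, PATTERN) for name in NAMES}

for name, matched in matches.items():
    print(f"{name}: {matched}")
```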
@martinpitt martinpitt marked this pull request as ready for review August 30, 2023 04:19
@martinpitt martinpitt requested a review from mvollmer August 30, 2023 04:19
@martinpitt martinpitt requested a review from tomasmatus August 30, 2023 04:19
Member

@mvollmer mvollmer left a comment

Thänks!

@martinpitt martinpitt merged commit 27c69f0 into cockpit-project:main Aug 30, 2023
@martinpitt martinpitt deleted the cockpit-f39 branch August 30, 2023 06:17
martinpitt added a commit to martinpitt/cockpit that referenced this pull request Aug 30, 2023
Commit 47bc4c5 dropped F37 from releasing, and
cockpit-project/bots#5171 dropped it from CI.
martinpitt added a commit to cockpit-project/cockpit that referenced this pull request Aug 30, 2023
Commit 47bc4c5 dropped F37 from releasing, and
cockpit-project/bots#5171 dropped it from CI.