cmdlib: bump supermin VM memory to 3G #2940

jlebon · 2022-06-23T12:56:16Z

Something has changed recently which causes us to hit the ENOMEM issue
more easily now: openshift/os#594 (comment)

Mid-term, we could rework the compose so that only the OCI archive is
pulled through 9p rather than a full pull-local. Long-term, the fix is
to stop using 9p.

But for now to unblock CI, let's just bump the VM memory to 3G which
should help.

jmarrero

lgtm

jlebon · 2022-06-23T12:58:37Z

> Mid-term, we could rework the compose so that only the OCI archive is
> pulled through 9p rather than a full pull-local.

Started working on a patch for this.

dustymabe · 2022-06-23T13:15:45Z

> Mid-term, we could rework the compose so that only the OCI archive is
> pulled through 9p rather than a full pull-local. Long-term, the fix is
> to stop using 9p.

Could we attach the ociarchive file directly as a block device (read-only) with a fixed size to the VM and access it directly (i.e. cat/pipe it to stdin of a process that will extract it)?
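
As a rough sketch of that idea (not code from this PR): a tar-format archive written onto a fixed-size, zero-padded device can be read as a plain sequential stream, because the tar reader stops at the end-of-archive marker and never interprets the padding. The Python below assumes the OCI archive is an uncompressed tar. The helper names `make_padded_image` and `read_members_from_device` and the 1 MiB size are hypothetical, and a regular file stands in for the read-only block device.

```python
import io
import os
import tarfile
import tempfile


def make_padded_image(payload: dict, image_path: str, image_size: int) -> None:
    """Write a tar archive into a fixed-size, zero-padded file.

    The file is a stand-in for a block device of a fixed size.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in payload.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()
    if len(archive) > image_size:
        raise ValueError("archive does not fit in the fixed-size image")
    with open(image_path, "wb") as img:
        img.write(archive)
        # Pad with zeros up to the fixed device size.
        img.write(b"\0" * (image_size - len(archive)))


def read_members_from_device(device_path: str) -> dict:
    """Stream the archive off the device in order and return {name: bytes}."""
    contents = {}
    with open(device_path, "rb") as dev:
        # "r|" is tarfile's non-seeking stream mode, the same access
        # pattern as piping the device into an extractor's stdin.
        with tarfile.open(fileobj=dev, mode="r|") as tar:
            for member in tar:
                if member.isfile():
                    contents[member.name] = tar.extractfile(member).read()
    return contents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")
        make_padded_image({"index.json": b"{}"}, image, 1024 * 1024)
        print(sorted(read_members_from_device(image)))
```

Whether this helps the compose path is a separate question; the sketch only shows that padding to a fixed size does not get in the way of streaming extraction.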

jlebon · 2022-06-23T13:22:28Z

> > Mid-term, we could rework the compose so that only the OCI archive is
> > pulled through 9p rather than a full pull-local. Long-term, the fix is
> > to stop using 9p.
>
> Could we attach the ociarchive file directly as a block device (read-only) with a fixed size to the VM and access it directly (i.e. cat/pipe it to stdin of a process that will extract it)?

You're talking about the image creation part, right? The issue we're hitting here is rpm-ostree compose time. Image creation actually already works that way (pushing the container only through 9p and extracting inside of there). The path I'm proposing is applying the same idea but in reverse for pulling out the OSTree commit.

Something has changed recently which causes us to hit the ENOMEM issue more easily now: openshift/os#594 (comment)

Mid-term, we could rework the compose so that only the OCI archive is pulled through 9p rather than a full `pull-local`. Long-term, the fix is to stop using 9p.

But for now to unblock CI, let's just bump the VM memory to 4G which should help.

jlebon · 2022-06-23T14:05:46Z

OK this is useful. Prow failed with ENOMEM, so clearly it's not just a matter of throwing more RAM at it. Unless this regression we're facing has really made 9p more memory hungry. Anyway, I bumped it to 4G now just to see if that works, but let's not merge this yet. I'll try to get the OCI archive-based patch working.

jlebon · 2022-06-23T14:35:04Z

golangci/golangci-lint info checking GitHub for latest tag
golangci/golangci-lint crit unable to find '' - use 'latest' or see https://github.com/golangci/golangci-lint/releases for details

Hmm, not sure what's going on here.

/retest

openshift-ci · 2022-06-23T14:35:33Z

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/images | `83ce8d9` | link | true | `/test images` |
| ci/prow/rhcos | `83ce8d9` | link | true | `/test rhcos` |

cgwalters · 2022-06-23T15:21:01Z

> OK this is useful. Prow failed with ENOMEM, so clearly it's not just a matter of throwing more RAM at it.

Yeah I should have highlighted this comment more. It's not that we're running out of memory exactly, it's that 9p specifically is trying to do something broken. We're provoking that bug by doing lots of little files.

Adding more memory only papers over it in that it makes it less likely for the kernel to try reclaiming inodes.

Only real fixes are:

- Use virtiofs and hope that's better (I think it is)
- Actually fix 9p in the kernel to stop trying to allocate more memory in a reclaim path
- Stop doing lots of little files over 9p
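
To illustrate the last option, here is a minimal sketch (an illustration under stated assumptions, not code from this PR or its follow-up): rather than copying a tree of many small files across the shared mount one by one, pack it into a single archive on one side and unpack it on the other, so the shared filesystem only carries one large sequential file. `pack_tree` and `unpack_tree` are hypothetical helper names, and empty directories are not preserved in this simplified version.

```python
import os
import tarfile
import tempfile


def pack_tree(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into one uncompressed tar.

    Returns the number of files packed.
    """
    count = 0
    with tarfile.open(archive_path, mode="w") as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def unpack_tree(archive_path: str, dest_dir: str) -> None:
    """Unpack the single archive into dest_dir on the receiving side."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r") as tar:
        # Use the "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        for i in range(1000):
            with open(os.path.join(src, f"{i}.filez"), "wb") as f:
                f.write(b"x")
        archive = os.path.join(tmp, "tree.tar")
        print(pack_tree(src, archive), "files packed into 1 archive")
```

In the scenario discussed here, only the archive path would live on the 9p mount; the source and destination trees would stay on local storage on each side.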

jlebon · 2022-06-24T21:05:54Z

Closing in favour of #2946.

jmarrero previously approved these changes Jun 23, 2022

jlebon mentioned this pull request Jun 23, 2022

Composes sometimes hitting error: fstatat(<checksum>.filez): Cannot allocate memory or error: openat(<checksum>.filez): Invalid argument openshift/os#594

Closed

travier previously approved these changes Jun 23, 2022

jlebon enabled auto-merge (rebase) June 23, 2022 13:23

jlebon disabled auto-merge June 23, 2022 13:59

jlebon dismissed stale reviews from travier and jmarrero via 83ce8d9 June 23, 2022 14:05

jlebon force-pushed the pr/bump-supermin-mem branch from 3593869 to 83ce8d9 Compare June 23, 2022 14:05

jlebon closed this Jun 24, 2022

cgwalters mentioned this pull request Aug 30, 2022

OCPBUGS-595: overlay: Add rhcos-selinux-policy-upgrade.service openshift/os#962

Merged

jlebon deleted the pr/bump-supermin-mem branch April 24, 2023 01:33
