Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Build Reproducibility for Core Admin Client #154

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

egbedo
Copy link

@egbedo egbedo commented Oct 15, 2024

Description of Changes
This pull request aims to address reproducibility issues identified in the core-admin-client builds for Qubes OS. The key changes involve normalizing timestamps and other environment-dependent values that impacted the checksum and metadata consistency between builds.

Modifications Made
Environment Variable for Timestamps
Added $SOURCE_DATE_EPOCH to replace direct date calls across several files:

Files Updated:
plugins/build_rpm/scripts/rpmbuildinfo
plugins/installer/Makefile
plugins/template/Makefile

Description:
Replaced all instances of direct date commands with $SOURCE_DATE_EPOCH to ensure the builds use a consistent timestamp. This prevents new timestamps from being generated each time the build runs.

RPM Metadata Consistency
Updates to RPM metadata generation scripts to ensure deterministic checksums:

Files Updated:
_plugins/source_rpm/scripts/generate-changelog
plugins/source_rpm/scripts/clamp-changelog-entry-dat_e

Description:
These changes reset the metadata timestamps to $SOURCE_DATE_EPOCH, ensuring identical metadata across builds.
File Permission Normalization

Files Updated:
plugins/build_rpm/scripts/rpmbuildinfo
Description:
Standardized file permission handling within the scripts to ensure permissions are consistent between builds.

Testing Results
After these changes, two separate builds were created and compared using diffoscope. The results confirmed:

Identical Checksums for files in the RPM packages.
Consistent Metadata without varying timestamps or permissions across builds.

@marmarek
Copy link
Member

Please don't add your temporary files (backup-rpm-build dirs, html files etc)

@egbedo egbedo force-pushed the reproducibility-fixes branch 2 times, most recently from e897d97 to 33552cf Compare October 16, 2024 10:23
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.62%. Comparing base (c302e71) to head (dfae88a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #154   +/-   ##
=======================================
  Coverage   78.61%   78.62%           
=======================================
  Files          47       47           
  Lines        5382     5384    +2     
=======================================
+ Hits         4231     4233    +2     
  Misses       1151     1151           
Flag Coverage Δ
?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@marmarek marmarek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the few comments below, I'll also run full test suite (default one runs only subset of tests). It seems like template and iso builds are broken by this change, but lets check.

new_src_header.txt Outdated Show resolved Hide resolved
new_rpm_header.txt Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
qubesbuilder/plugins/installer/Makefile Outdated Show resolved Hide resolved
qubesbuilder/plugins/template/Makefile Outdated Show resolved Hide resolved
Comment on lines 33 to 38
# Check if SOURCE_DATE_EPOCH is set; if not, use the current date
if [ -z "$SOURCE_DATE_EPOCH" ]; then
CURRENT_DATE=$(date -u +"%Y-%m-%d %H:%M:%S")
else
CURRENT_DATE=$(date -u -d @"$SOURCE_DATE_EPOCH" +"%Y-%m-%d %H:%M:%S")
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is redundant now, you set CURRENT_DATE again below.

.gitignore Outdated
Comment on lines 11 to 13



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add empty lines either.

@marmarek
Copy link
Member

I've started full tests here: https://gitlab.com/QubesOS/qubes-builderv2/-/pipelines/1500114091 it will take few hours to complete

@@ -71,7 +71,7 @@ Version: $(rpm -qp --queryformat '%{version}-%{release}' "$SRPM")
Architecture: $(rpm -qp --queryformat '%{arch}' "$SRPM")
Binary: $(printf '%s' "$RPMS" | xargs rpm -qp --qf "%{name} ")
Build-Origin: $(getos)
Build-Date: $(date -R)
Build-Date: $(date -d @"$SOURCE_DATE_EPOCH" -R)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .buildinfo file is intended to document (to some reasonable extend) the build environment that was used for a specific build attempt. As currently implemented it is not limited to those parts of the build environment that we consider to be necessary to reproduce a build. As such recording a fake build date doesn't make sense.

If we want to move to only record parts that are necessary then the right action would be to drop this field completely. SOURCE_DATE_EPOCH is already recorded in the Environment section.

(I prefer the current approach to .buildinfo but other projects do things differently, so I'm open to arguments)

@@ -71,7 +71,8 @@ mkdir -p "dists/$SUITE/main/binary-amd64"
dpkg-scanpackages --multiversion . > "dists/$SUITE/main/binary-amd64/Packages"
gzip -9c "dists/$SUITE/main/binary-amd64/Packages" > "dists/$SUITE/main/binary-amd64/Packages.gz"

DATE=$(LC_ALL=C date -u +"%a, %d %b %Y %H:%M:%S %Z")
DATE=$(LC_ALL=C date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" +"%a, %d %b %Y %H:%M:%S %Z")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this local repo is only used internally by qubes-builderv2 so the included date shouldn't matter. If it should be made reproducible for some reason then this is not enough. For example the gzip above records the file modification time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for generating on the fly local build repository.

@marmarek
Copy link
Member

Pipeline with all tests enabled: https://gitlab.com/QubesOS/qubes-builderv2/-/pipelines/1501302493

@egbedo egbedo force-pushed the reproducibility-fixes branch 14 times, most recently from 0bd1bb9 to 717642f Compare October 21, 2024 19:26
ISO_VERSION ?= $(shell date +%Y%m%d)
ISO_VERSION ?= $(shell date -d @"$(SOURCE_DATE_EPOCH)" +%Y%m%d 2>/dev/null || date +%Y%m%d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISO_VERSION is set already on the line just above. I think you wanted to replace that one, not add another one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment than template applies here too. We cannot reproduce a given ISO or template based on timestamp. We don't hold any snapshot at a given point of time.

@@ -22,6 +22,9 @@

# Clamp the topmost changelog entry date ("Build for ...") to the previous one
# (actual meaningful entry)
#Reverted clamp-changelog-entry-date

CHANGELOG_PATH="$1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It's set to the same value just below.

# PREVIOUS_DATE=$(grep '^ --' "$CHANGELOG_PATH" | head -n 2 | tail -n 1 | grep -o ' .*')

# Replace the topmost date with CURRENT_DATE
sed -e "0,/^ --/s/^\( --.*\)\( .*\)/\1$CURRENT_DATE/" -i "$CHANGELOG_PATH"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CURRENT_DATE is not set

I think what you need is to set CURRENT_DATE like this:

if [ -n "$SOURCE_DATE_EPOCH" ]; then
    CURRENT_DATE=$(date -R -u -d "@$SOURCE_DATE_EPOCH")
else
    CURRENT_DATE=$(grep '^ --' "$CHANGELOG_PATH" | head -n 2 | tail -n 1 | grep -o '  .*')
fi

@egbedo egbedo force-pushed the reproducibility-fixes branch from fbebd15 to d21c9ec Compare October 21, 2024 20:37
Final commit

Final commit

Final commit

Final commit

final commit

final one

made changes to clamp-changelog-entry-date

made more changes

removed the spurious files and made appropriate changes to scripts

edited script

edited script

retrieved .txt files

edited files

ammended the file

added files

added and removed files

Removed unnecessary files and updated .gitignore

edited the script

edited the script

Replaced date with 1727261746 for reproducibility

final changes

Last commit

final changes

shellcheck-error fixes

shellcheck change

fixed clamp-changelog-entry-date

Reverting to fixed-clamp-change

final changes

final changes

shellcheck-error fixes

shellcheck change

fixed clamp-changelog-entry-date

retrieved .txt files

retrieved the .txt files and gitignore

shellcheck-error fixes

shellcheck change

fixed clamp-changelog-entry-date

Reverting to fixed-clamp-change

final changes

final changes

shellcheck-error fixes

shellcheck change

fixed clamp-changelog-entry-date

retrieved .txt files

retrieved the .txt files and gitignore

Delete new_rpm_header.txt

Delete new_src_header.txt

Delete e -i HEAD~5

Update .gitignore

Delete qubesbuilder/.gitignore

Delete qubesbuilder/dependencies-debian.txt

Delete qubesbuilder/dependencies-fedora-qubes-executor.txt

Delete qubesbuilder/dependencies-fedora.txt

Delete e -i HEAD~5

Delete qubesbuilder/dependencies-debian.txt

Delete qubesbuilder/.gitignore

Delete qubesbuilder/dependencies-fedora-qubes-executor.txt

Delete qubesbuilder/dependencies-fedora.txt

Delete new_rpm_header.txt

Update modify-changelog-for-build

Update clamp-changelog-entry-date

Update clamp-changelog-entry-date

Update clamp-changelog-entry-date

Update clamp-changelog-entry-date

Update clamp-changelog-entry-date

Update Makefile

Update clamp-changelog-entry-date

Update clamp-changelog-entry-date
@egbedo egbedo force-pushed the reproducibility-fixes branch from d1cc15c to 936e43f Compare October 21, 2024 21:02
@@ -45,7 +45,11 @@ fix_up := $(shell TEMPLATE_NAME=$(TEMPLATE_NAME) TEMPLATE_LABEL="$(TEMPLATE_LABE
$(BUILDER_DIR)/scripts/builder-fix-filenames)

TEMPLATE_NAME := $(word 1,$(fix_up))
TEMPLATE_TIMESTAMP ?= $(shell date -u +%Y%m%d%H%MZ)

TEMPLATE_TIMESTAMP ?= $(shell date -u -d "@$(SOURCE_DATE_EPOCH)" +%Y%m%d%H%MZ 2>/dev/null || date -u +%Y%m%d%H%MZ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense here because we cannot reproduce from a given timestamp everything that follows after the build process. It's just a marker, not used at all as a reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now understand that the TEMPLATE_TIMESTAMP is just a marker and not critical for reproducibility, especially since it doesn't affect the rest of the build process. Given this, I'll revert the change and keep the original date logic.

else
# Get previous date from changelog
CURRENT_DATE=$(grep '^ --' "$CHANGELOG_PATH" | head -n 2 | tail -n 1 | grep -o ' .*')
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't think this is a good idea. The reference should be the date provided by the CHANGELOG which means that SOURCE_DATE_EPOCH should somehow match that on Debian side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about using the SOURCE_DATE_EPOCH as the reference for Debian reproducibility. I've look adjusted the script so that SOURCE_DATE_EPOCH is used more effectively, ensuring consistency with Debian's side rather than relying on the previous changelog entry date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants