Scripts for managing Debian and RPM package repositories containing many files.
See concept, metadata, and presentations sections below.
- Overview
- Deliverables
- Demo
- Usage
- Prerequisites
- Concept
- Metadata
- Presentations
- Related
- See also
- Contributing
This repo contains scripts to generate repository metadata for Debian and RPM packages:
apt
-based distros (Debian or Ubuntu)
- Packages
- Packages.gz
- Release
- Release.gpg
dnf
-based distros (RHEL >= 8 or Fedora >= 29)
note: modules.yaml requires
genmodules.py
- repodata/repomd.xml
- repodata/repomd.xml.asc
- repodata/repomd.xml.key
- repodata/${sha256}-modules.yaml.gz
- repodata/${sha256}-primary.xml.gz
- repodata/${sha256}-primary.sqlite.bz2
- repodata/${sha256}-filelists.xml.gz
- repodata/${sha256}-filelists.sqlite.bz2
- repodata/${sha256}-other.xml.gz
- repodata/${sha256}-other.sqlite.bz2
yum
-based distros (RHEL == 7) andzypper
-based distros (openSUSE or SLES)
- repodata/repomd.xml
- repodata/repomd.xml.asc
- repodata/repomd.xml.key
- repodata/${sha256}-primary.xml.gz
- repodata/${sha256}-primary.sqlite.bz2
- repodata/${sha256}-filelists.xml.gz
- repodata/${sha256}-filelists.sqlite.bz2
- repodata/${sha256}-other.xml.gz
- repodata/${sha256}-other.sqlite.bz2
git clone https://github.com/NVIDIA/cuda-repo-management
cd cuda-repo-management
note: genmodules.py is needed for generating modularity streams for NVIDIA driver packages
wget https://raw.githubusercontent.com/NVIDIA/yum-packaging-precompiled-kmod/main/genmodules.py
# RPM repos
yum install createrepo_c python3
# Debian repos
yum install dpkg-dev
# OverlayFS
yum install e2fsprogs rsync
# Misc that should already be installed
yum install bash coreutils util-linux gawk sed findutils file gzip
Update multiples repos at once using OverlayFS to layer directories
./repo-overlay.sh --mirror=path/to/snapshot (--output=path/to/save) (--tempdir=path/to/workdir) path/to/repos
> ex: time ./repo-overlay.sh --mirror=/data/snapshot
Generate Debian package repository metadata using bash
and dpkg
./repo-debian.sh --mirror=path/to/snapshot --input=path/to/repos --repo=$distro/$arch
> ex: time ./repo-debian.sh --mirror=/data/snapshot --input=$HOME/repos --repo=ubuntu1804/x86_64
Generate RPM package repository metadata using createrepo_c
./repo-rpm.sh --mirror=path/to/snapshot --input=path/to/repos --repo=$distro/$arch
> ex: time ./repo-rpm.sh --mirror=/data/snapshot --input=$HOME/repos --repos=rhel8/sbsa
Download Debian or RPM packages from an existing repository
./repo-mirror.sh --distro=$distro --arch=$arch (--version=$version) (--url=$repository) (--dryrun)
> ex: ./repo-mirror.sh --output=/data/snapshot --distro=sles15 --arch=x86_64
Verify that metadata matches RPM and Debian packages (filesize and SHA256 checksum)
./repo-validate.sh --mirror=path/to/snapshot (--distro=$distro) (--arch=$arch) (--dryrun)
> ex: ./repo-validate.sh --mirror=/data/snapshot --distro=rhel8 --arch=x86_64
Package managers do not scan repository directories (often served via HTTP/HTTPS), so from apt
/dnf
/yum
/zypper
's perspective whether a package exists or not, is defined by pre-generated metadata manifests. These metadata files provide Debian or RPM package availability information, which is used to resolve dependencies and for the selected files, determine the URLs to download.
Adding new packages to a repository requires re-generating this metadata. Using tools such as apt-ftparchive
or createrepo_c
, the full set of packages must be present in a single directory. This presents logistical challenges that involve copying thousands of large files around. Moreover, it lacks an elegant "undo" mechanism for iterative software development processes such as CI/CD pipelines.
Union file-systems such as OverlayFS, allow temporary overlapping of directory hierarchies using mount points. These mounts can cross file-system boundaries and utilize copy-on-write (COW). In other words, non-destructive repository merges that can be rolled back with umount
without the need for redundant file copying.
The mount syntax is layered right-to-left, with read-only (RO) lower layers, and one read-write (RW) upper layer.
sudo mount -t overlay -o \
lowerdir=${layer3}:${layer2}:${layer1},\
upperdir=${layer4}/upper,\
workdir=${layer4}/workdir \
none \
/mnt/overlayfs
Generating repository metadata for thousands of packages is CPU intensive and thus can take a very long time. Rather than building from scratch each time, old metadata can be re-used to reduce the workload.
The repo-debian.sh
script scans each Debian package in the directory, recording the key-value pair of filename and size (in bytes), to a variable. Then it parses Packages.gz
for filename and size (in bytes) key-value pair and saves it to another variable. The two variables are compared using the comm
command, which eliminates the duplicate entries. The remaining packages are processed using a combination of dpkg -I
, du
, and md5sum
/sha1sum
/sha256sum
/sha512sum
.
The createrepo_c
command has an --update
parameter to utilize existing RPM repository metadata in the repodata/
directory. This prints "CACHE HIT" when cache is successfully applies and "metadata are obsolete" when it detects a package mismatch requiring regenerating this data.
A wrinkle when this is used with OverlayFS is the repodata/
directory will only be available in the lower layer, as indicated by the error message "invalid cross-device link". The solution is to create a "write" operation, by moving the directory and then passing --update-md-path
to specify the new location to scan for repodata/
.
A brief explanation of the format and contents of these repository metadata files.
The apt
/apt-get
package manager looks for ${repo}/InRelease
and ${repo}/Release
files as its entry-point. This contains a timestamp, repository flag options, and a list of package manifest files. For the latter filename, size in bytes, MD5sum, SHA1sum, and SHA256sum hashes are provided.
The package manager compares the information found in Release
with Packages
to determine if there is corruption or tampering. Additionally the Release
file is signed with a public-private GPG key pair and signature is detached as Release.gpg
. This effectively signs the entire Debian package repository and requires clients to import the public GPG key to validate authenticity.
Origin: <Organization>
Label: <Repository Name>
Architecture: <x86_64|ppc64el|sbsa|cross-linux-sbsa>
Date: $(date -R --utc)
MD5Sum:
$md5 $bytes Packages
$md5 $bytes Packages.gz
SHA1:
$sha1 $bytes Packages
$sha1 $bytes Packages.gz
SHA256:
$sha256 $bytes Packages
$sha256 $bytes Packages.gz
Acquire-By-Hash: <no|yes>
The Release
file references Packages
and a gzipped copy Packages.gz
Gzipped text file, containing blocks representing each package in the Debian repository. Blocks contain key-value pairs, such as Package:
, Version:
, and Depends:
. For multi-line values (common for Depends
and Description
) start line with a space. Blocks are separated with an empty line.
Package: $name
Version: ${version}-${revision}
Architecture: <amd64|ppc64el|sbsa|all>
Priority: optional
Section: <$repotype>/<$category>
Maintainer: $user <$email>
Installed-Size: $extract_bytes
Depends: $packageA (>= $versionA), $packageB (>= $versionB), $packageC (>= $versionC)
Filename: ./${name}_${version}-${revision}_${arch}.deb
Size: $download_bytes
MD5sum: $md5
SHA1: $sha1
SHA256: $sha256
SHA512: $sha512
Description: What this package is used for.
Prefix each additional line with a space. Longer description for search with `apt-cache`.
Separate each package block with an empty line.
Package: ${packageA}
Version: ${versionA}-1
...
Some gotchas apply, for example the order of the package blocks affects dependency resolution. Chronological order (oldest → newest) release is recommended. Therefore if repo-debian.sh
locates an existing Packages.gz
manifest, it appends new blocks to the end.
Additionally, there is a convention to the order of the key-values, as generated by dpkg
. Some unofficial package tools such as CMake
do not follow this convention, which in some cases has lead to undocumented behavior. Again repo-debian.sh
tries to follow this convention as closely as possible.
The yum
/dnf
and zypper
package managers look for a ${repo}/repodata/repomd.xml
file as its entry-point. This contains a list of package manifest files (SHA256-prefixed). Each <data>
XML tag contains the unique filename, size in bytes, timestamp, and SHA256sum hash (both compressed and extracted).
The package manager uses this XML file to determine the locations of primary.{xml.gz,sqlite.bz2}
, filelists.{xml.gz,sqlite.bz2}
, other.{xml.gz,sqlite.bz2}
, and optionally modules.yaml.gz
. To be more CDN friendly, these RPM metadata files are prefixed with their SHA256
hash, such that the filename is unique, each time its contents have been modified.
The repomd.xml
file is signed with a public-private GPG key pair and signature is detached as repomd.xml.asc
. Individual RPM packages must be signed but this is an additional layer to sign the RPM package repository metadata and requires clients to import the public GPG key (repomd.xml.key
) to validate authenticity.
<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns=".../metadata/repo" xmlns:rpm=".../metadata/rpm">
<revision>$epoch</revision>
<data type="primary">
<checksum type="sha256">$SHA256</checksum>
<open-checksum type="sha256">$extract_SHA256</open-checksum>
<location href="repodata/${SHA256}-primary.xml.gz"/>
<timestamp>$epoch</timestamp>
<size>$bytes</size>
<open-size>$extract_bytes</open-size>
</data>
<data type="filelists">...</data>
<data type="other">...</data>
<data type="primary_db">...</data>
<data type="filelists_db">...</data>
<data type="other_db">
<checksum type="sha256">$SHA256</checksum>
<open-checksum type="sha256">$extract_SHA256</open-checksum>
<location href="repodata/${SHA256}-other.sqlite.bz2"/>
<timestamp>$epoch</timestamp>
<size>$bytes</size>
<open-size>$extract_bytes</open-size>
<database_version>10</database_version>
</data>
<data type="modules">
<checksum type="sha256">$SHA256</checksum>
<open-checksum type="sha256">$extract_SHA256</open-checksum>
<location href="repodata/${SHA256}-modules.yaml.gz"/>
<timestamp>$epoch</timestamp>
<size>$bytes</size>
<open-size>$extract_bytes</open-size>
</data>
</repomd>
Gzipped XML file, containing <package>
tags representing each package in the RPM repository. Contains package name, version, description, dependencies, SHA256 checksum, timestamp, size in bytes, etc. It is recommended to use createrepo_c
to generate these files.
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns=".../metadata/common" xmlns:rpm=".../metadata/rpm" packages="$pkgCount">
<package type="rpm">
<name>$name</name>
<arch>$arch</arch>
<version epoch="1" ver="$version" rel="$release"/>
<checksum type="sha256" pkgid="YES">$SHA256</checksum>
<summary>Short description</summary>
<description>Longer multi-line description</description>
<time file="$epoch" build="$epoch"/>
<size package="$bytes" installed="$bytes" archive="$bytes"/>
<location href="${name}-${version}-${release}.${arch}.rpm"/>
<format>
<rpm:provides>
<rpm:entry name="$SONAME()(64bit)"/>
</rpm:provides>
<rpm:requires>
<rpm:entry name="$packageA"/>
</rpm:requires>
</format>
</package>
<package type="rpm">
<name>$packageA</name>
...
</package>
</metadata>
Multi-document YAML text file, with one document per modularity stream. Each stream contains timestamp, unique context
identifier, list of package names available for that stream, and one or more modularity profiles. Streams are designated by branch (choose one) and profiles represent bundle use cases (install one or more sets). The genmodules.py
script generates this metadata file if any packages containing the string nvidia-driver
are present in the repository. It is then injected into repomd.xml
using modifyrepo_c
command.
document: modulemd
version: 2
data:
name: $module
stream: $stream
version: $(date +%Y%m%d%H%M%S)
context: $(echo $name $stream $version $distro | md5sum | cut -c -10)
arch: $arch
summary: Short description
description: >-
Long multi-line description.
artifacts:
rpms:
- $packageA-$epoch:$version-$release.$arch
- $packageB-$epoch:$version-$release.$arch
- $packageC-$epoch:$version-$release.$arch
profiles:
$profile:
description: Profile description
rpms:
- $packageA
- $packageB
- $packageC
...
---
document: modulemd
version: 2
data:
name: $module
stream: $streamB
Hi, PackagingCon attendees! Thank you very much, grateful for the opportunity to participate, please keep in touch 😄
See CONTRIBUTING.md