- Background
- Overview
- Pre-requisites
- The scripts
- Example
- Configuring the user on the host
- Configuring Jenkins
- Node specific modifications
This directory contains a set of configuration files and helper scripts to aid in setting up a Jenkins CI system to run the Kata Containers metrics tests inside Virtual Machines (VMs).
Ideally the metrics are run on bare metal machines, in order to achieve the most repeatable results. However, occasionally a Pull Request can have unwanted side effects that then require the server to be rebooted, partially reconfigured, or re-installed. Sometimes such a workflow is not practical. In that situation the Pull Request metrics runs can be isolated from the bare metal machine by executing a "clean" Virtual Machine on the system for each Pull Request.
The downside is that running inside a VM will likely introduce more "noise" into the test results.
The scripts in this directory help construct a Virtual Machine (QEMU qcow2) image
suitable for running the metrics tests. The scripts are based around the use of
libvirt and the `virsh` tool.
Briefly, the scripts do the following:
- Create an RSA keypair for SSH interaction between the host machine and the VMs.
- Build a master VM image using `virt-install` and `cloud-init`.
- Install all components needed for the metrics run into the VM image.
- Enable the master image to be cloned:
  - to ensure a "clean" environment for each PR run.
  - to preserve the master image from any corruption by a PR run.
- Provide scripts for use from Jenkins to launch a cloned VM with a Jenkins agent and to delete a cloned VM.
The scripts in this directory rely upon certain tools being available on the host system. The following lists those tools, and the Ubuntu packages or alternative installation methods that can be used to install them. Package names may be different for other distributions.
Dependency | Ubuntu package or method |
---|---|
go | Install latest from golang.org |
jre | package `default-jre` |
libvirt | package `libvirt-bin` |
qemu-kvm | package `qemu-kvm` |
virtinst | package `virtinst` |
yq | Install latest with `go get` from mikefarah/yq |
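A quick pre-flight check can confirm that the tools listed above are present before any of the scripts are run. The following is a minimal sketch and not part of this repository: the `missing_tools` helper is hypothetical, and the command names in the usage comment are the binaries these packages usually provide, which is an assumption. The QEMU binary name varies between distributions, so it is left out.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that is not found in $PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (assumed binary names): list any expected host tools that are missing.
# missing_tools go java virsh virt-install yq
```

An empty result means every named command was found.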
The scripts are numbered roughly in the order you might expect to run them. Some scripts might only be required to be run once for initialization. Some scripts might be required to recover or debug a master image. Some scripts are helpers for diagnostics and debug. Some scripts are designed to be invoked by Jenkins or a Jenkins helper script.
The majority of scripts require a single `config dir` argument so they can locate
information about the VM setup or VM machine names. Keeping one configuration
directory per setup makes it possible to maintain VMs with different
configurations (such as different distributions).
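The argument handling described above could look like the following. This is a sketch under assumptions: `require_config_dir` is a hypothetical helper, and the real scripts may validate their argument differently.

```shell
#!/bin/bash
# Hypothetical helper: accept exactly one argument, which must be a directory.
require_config_dir() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: <script> <config_dir>" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "Config dir not found: $1" >&2
        return 1
    fi
}

# Example: require_config_dir "$@" || exit 1
```

Failing early keeps a mistyped directory name from reaching the VM tooling.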
This script creates an RSA key pair using `ssh-keygen`. This key pair is used to
`ssh` between the host system and the VMs.
By default a single keypair is used for all VMs. You should not have to re-generate the keypair unless you accidentally delete it or have some other issue.
If you re-generate the keypair you will need to re-generate or patch any existing VMs to use the new keys.
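Because re-generating the keypair invalidates existing VMs, a key creation step benefits from being idempotent. The sketch below illustrates that idea only. The `ensure_keypair` helper, the 4096-bit key size, and the empty passphrase are assumptions, not a description of what `1_create_keys.sh` does.

```shell
#!/bin/bash
# Hypothetical helper: create an RSA keypair only if one is not already present.
# Argument: private key path (the public key is written next to it as <path>.pub).
ensure_keypair() {
    local keyfile="$1"
    if [ -f "$keyfile" ] && [ -f "${keyfile}.pub" ]; then
        echo "Keeping existing keypair: $keyfile"
        return 0
    fi
    # -N "" sets an empty passphrase so the key can be used unattended (assumption).
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$keyfile"
}
```

Running it twice against the same path leaves the first keypair untouched.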
This is the main master VM creation script. It uses `virt-install` along with
the `cloud-init` configuration files found in the `config_dir`.
`cloud-init` runs in the VM, in parallel with the first VM boot. You might see a
login prompt appear at the VM, but initialization is not complete until
you see a `cloud-init` line similar to:
[ 123.456789] cloud-init[1234]: Cloud-init v. 18.2 finished at Tue, 29 May 2018 15:42:43 +000
At this point detach from the VM shell by pressing `^]` three times. This should drop you
back to your host system, but leaves the VM running for the next step.
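If the console output is captured to a file, the completion line can also be detected mechanically instead of by eye. This is a minimal sketch: `cloud_init_finished` is a hypothetical helper, and the pattern is derived only from the sample line shown above, so other cloud-init versions may print something different.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the console text on stdin contains the
# cloud-init completion line.
cloud_init_finished() {
    grep -q 'cloud-init\[[0-9]*]: Cloud-init v\. .* finished at '
}

# Example (console.log is an assumed capture file):
# cloud_init_finished < console.log && echo "master VM initialized"
```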
This step completes the installation of the master VM by copying the final few files over and then shutting the master VM down.
At this point the master VM is fully initialized and shut down and you should not need to modify the master VM for this configuration further.
This script is mainly for testing or for invocation by the CI system. It clones the master VM image in readiness to run it.
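One common way to clone a shut-down libvirt domain is `virt-clone` from the `virtinst` package. Whether `4_clone_vm.sh` uses it is an assumption. The sketch below only illustrates the idea, and `clone_vm`, the `VIRT_CLONE` override, and the clone name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: clone a shut-down master domain under a new name.
# VIRT_CLONE can be set to "echo" to preview the arguments without running anything.
clone_vm() {
    local master="$1" clone="$2"
    # --auto-clone lets virt-clone pick the storage paths for the new domain.
    "${VIRT_CLONE:-virt-clone}" --original "$master" --name "$clone" --auto-clone
}

# Example: clone_vm ubnt16.04_master ubnt16.04_clone
```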
This script is mainly for testing. It runs a VM clone (which should have been created with `4_clone_vm.sh`), and uses `ssh` to log in to it.
This script is designed to be launched by Jenkins. It clones and runs a VM from the
master image, copies the Jenkins `agent.jar` file across to the image, and executes the
agent over SSH. See the section on configuring Jenkins for
details on how to integrate this script into the Jenkins config.
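The copy-and-launch step can be pictured as follows. This is a sketch under assumptions, not the contents of `6_start_clone_agent.sh`: `start_agent`, its arguments, the `SCP`/`SSH` overrides, and the remote file location are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy agent.jar into the VM, then run it there over SSH.
# Arguments: private key file, user@host of the VM, local path to agent.jar.
start_agent() {
    local key="$1" target="$2" jar="$3"
    # Keep the copy silent so nothing but agent traffic reaches Jenkins.
    "${SCP:-scp}" -q -i "$key" "$jar" "${target}:agent.jar" \
        </dev/null >/dev/null 2>&1 || return 1
    # The agent's stdin/stdout pass through ssh to whatever invoked this function.
    "${SSH:-ssh}" -i "$key" "$target" java -jar agent.jar
}
```

Setting `SCP=echo SSH=echo` prints the ssh arguments instead of connecting, which is useful for checking the command line.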
Note: This script is written to avoid generating or consuming data from the invoking terminal - Jenkins expects to communicate solely with the `agent.jar`. Other extraneous data generation or consumption can make the agent connection fail.
To aid with any potential debug, the majority of the code is invoked from a `main()`
function that has its input/output redirected to `/dev/null`. To debug, remove this
redirection, but do not expect the `agent.jar` to connect to Jenkins successfully.
This script will shut down and remove a clone VM and its relevant storage items. This script is expected to be invoked either by Jenkins upon agent disconnect, or by hand if testing VMs.
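With `virsh`, stopping a domain and removing it together with its storage takes two commands. The sketch below illustrates that sequence. `delete_clone` and the `VIRSH` override are hypothetical, and the real `7_delete_clone.sh` may do more or use different options.

```shell
#!/bin/bash
# Hypothetical helper: stop a clone domain and remove it with its storage.
# VIRSH can be set to "echo" to preview the commands without running them.
delete_clone() {
    local clone="$1"
    # Force the domain off; ignore the error if it is already stopped.
    "${VIRSH:-virsh}" destroy "$clone" || true
    # Remove the domain definition together with its storage volumes.
    "${VIRSH:-virsh}" undefine "$clone" --remove-all-storage
}

# Example: delete_clone ubnt16.04_clone
```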
# Run this once ever normally, unless you need to renew your keys.
$ ./1_create_keys.sh
# Now start to build the master VM image.
$ ./2_build_baseimage.sh ubuntu16.04
# And wait for the cloud-init 'finished' line.
# Now press ^]^]^] to get back to the host, and leave the master VM running.
# And complete the master VM initialization.
$ ./3_complete_baseimage.sh ubuntu16.04
# At this point you have a master VM image installed, which you can view with virsh:
$ virsh list --all
Id Name State
----------------------------------------------------
- ubnt16.04_master shut off
# If you wish to hand-debug a VM at this point, then you might also want to:
$ ./4_clone_vm.sh ubuntu16.04
$ ./5_login_clone.sh ubuntu16.04
# You will now be SSH'd into the clone VM.
$ ./7_delete_clone.sh ubuntu16.04
# Otherwise, now you have a master VM set up you are ready to configure Jenkins.
A user needs to be chosen on the host system that will execute the VMs. That user
requires some configuration. The following steps document choosing and configuring
a user called `jenkins` to execute the VMs on the host system.
Add the Go binary directories to the `$PATH` variable by adding the following
to `.profile`:
# set PATH so it includes golang
if [ -d "/usr/local/go/bin" ] ; then
    PATH="/usr/local/go/bin:$PATH"
fi
export GOPATH=${HOME}/go
# set PATH so it includes local go bin
if [ -d "${GOPATH}/bin" ] ; then
    PATH="${GOPATH}/bin:$PATH"
fi
To launch VMs via `virsh`, add the user to the required groups:
$ adduser jenkins libvirt
$ adduser jenkins libvirt-qemu
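Group changes only apply to new login sessions, so it can be worth confirming membership before launching a VM. A minimal sketch, assuming the hypothetical helper `has_group` and the `jenkins` user from the steps above:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first argument appears among the rest.
has_group() {
    local wanted="$1" group
    shift
    for group in "$@"; do
        [ "$group" = "$wanted" ] && return 0
    done
    return 1
}

# Example: the unquoted substitution splits the group list into separate arguments.
# has_group libvirt $(id -nG jenkins) || echo "jenkins is not in libvirt"
```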
The scripts require the Jenkins `agent.jar` file to be located in the `${HOME}/bin`
directory of the user. This file can be obtained from your Jenkins master:
$ mkdir -p ${HOME}/bin
$ cd ${HOME}/bin
$ curl -LO http://jenkins.katacontainers.io/jnlpJars/agent.jar
Note: This agent file may change when you update your Jenkins master. This is rare, but if slave problems are encountered after a Jenkins master update, consider refreshing the `agent.jar` on your slave machine.
For Jenkins to be able to launch the slave VM/agents over SSH, it will need some form of SSH authentication method to be configured. Consult the Jenkins SSH slave plugin documentation for guidance on setting up authentication. Details on precise slave setup are below.
To use these scripts we manipulate the Jenkins build node configuration somewhat. Ideally, we would use the Jenkins Libvirt slaves plugin to manage our slave VMs, but upon testing, the "revert" to the base clean VM function does not perform as expected.
To that end, we manipulate the Jenkins `ssh launch` pre/post command configuration to
actually launch and remove our VMs.
We also add a small manipulation to prevent Jenkins from trying to launch the agent.jar itself on the host system (as our scripts now handle that launch inside the VM).
To try and ensure we get a fresh, clean VM per-PR/build we also configure the nodes to be as short lived as we can. Again, we would prefer to use something like the Single Use Slave Plugin - but that provides a single use slave node, which will not launch again.
Our configuration is not perfect (the shortest time we can set to kill off an inactive node is one minute). Because of this, if another job is in the queue or arrives within one minute of a previous job completing, it is scheduled to build on the same VM instance. The expectation is that we have enough gaps that we acquire sufficient new VMs to vastly reduce the "dirty node" instabilities we see with pure bare-metal builds.
In the event of getting a dirty VM node, executing a node `offline/online` in the
Jenkins UI runs a VM shutdown/clone/run cycle. This gets the system back to having
a stable, clean VM on that node.
The following are examples of the configuration dialogs:
On your slave VM node host machine, create a pair of scripts specific to your host to allow Jenkins to invoke the sub-scripts. For example:
start_gitvm.sh
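#!/bin/bash
ROOTDIR="${HOME}/kata-containers-ci/VMs/metrics"
# Expect exactly one argument: the VM config name.
if [ $# -ne 1 ]; then
    echo "Require VM config name as only parameter" >&2
    exit 1
fi
# Stop if the scripts directory is missing rather than running from the wrong place.
cd "${ROOTDIR}" || exit 1
./6_start_clone_agent.sh "$1"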
stop_gitvm.sh
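#!/bin/bash
ROOTDIR="${HOME}/kata-containers-ci/VMs/metrics"
# Expect exactly one argument: the VM config name.
if [ $# -ne 1 ]; then
    echo "Require VM config name as only parameter" >&2
    exit 1
fi
# Stop if the scripts directory is missing rather than running from the wrong place.
cd "${ROOTDIR}" || exit 1
./7_delete_clone.sh "$1"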
Add those scripts to the Jenkins node agent launch dialog, along with an `echo` hack
to prevent Jenkins from launching the agent jar itself, such as the following:
Note the passing of the `config_dir` parameter to the scripts from the Jenkins node
config dialog. This allows us to set up separate nodes for different configurations
to test.
It is recommended that you only schedule a single VM build on each bare metal host at a time. This can be achieved through use of labels, resources, and locks in the Jenkins UI to enforce exclusivity and serialisation.
To try and get a clean VM clone for each PR, we set the node to take itself offline as soon as it is idle, which shuts down and removes the previous cloned VM.
Jenkins does not currently allow us to configure the Idle delay to less than one minute.
You may need to make node specific modifications, such as:
- Modify the `checkmetrics` TOML file to take into account the performance of your node.
- Add in site specific configs, such as PROXY settings.
- Modify paths, such as the `Jenkins` user HOME path or path to the scripts.
The recommended procedure for any of the previous modifications is to make a fork of this repository and keep your local modifications in a branch that tracks this main repository.
Configuring proxies in the cloud-init `user-data` file is complex, as we need
to set up the proxies for a number of programs, including `apt`, `git`, and `docker`. To
make this process simpler, an example `user-data.proxy` file
is included in the `ubuntu16.04` subdirectory.
The default `checkmetrics.toml` file provided in each distro subdirectory with the VM
scripts will probably not match the results produced on your particular system. You will
need to tune the `checkmetrics.toml` file by doing a few test runs and determining the
values you require.
1. Get your Jenkins master/slave connection up and running and build jobs.
2. Perform a few runs (>= 3). Expect them to fail the `checkmetrics` check.
3. Analyse the logs or the JSON files from those logs to determine the values your system produces.
4. Edit the `checkmetrics.toml` file in the distro subdirectory on the agent machine.
5. Disable the slave in your Jenkins Master, and wait for all builds to finish. One trick is to change any relevant slave label from say `metrics` to `metricsX` to stop new jobs being scheduled on that slave.
6. Re-run scripts `2` and `3` to rebuild the master VM image.
7. Re-enable the slave.
8. Iterate from step (2) until you are satisfied with the results.
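When analysing the test runs, it helps to reduce each metric to a mean and the spread around it. The following sketch does that for a list of values typed on the command line. `summarise_runs` is a hypothetical helper, the input values are made up for illustration, and mapping the result onto entries in `checkmetrics.toml` is left to you.

```shell
#!/bin/bash
# Hypothetical helper: print the mean of the given values and how far the
# lowest and highest values sit below and above it, as percentages of the mean.
summarise_runs() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1; max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END {
            mean = sum / NR
            if (mean == 0) exit 1
            printf "mean=%.2f below=%.1f%% above=%.1f%%\n",
                mean, (mean - min) / mean * 100, (max - mean) / mean * 100
        }'
}

# Example with made-up results from three runs:
# summarise_runs 10 12 14
```

A wide spread across runs suggests the node needs looser bounds, or more runs before settling on values.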