diff --git a/xml/book_administration.xml b/xml/book_administration.xml index 65c22ac4..a4718845 100644 --- a/xml/book_administration.xml +++ b/xml/book_administration.xml @@ -66,6 +66,7 @@ + diff --git a/xml/ha_virtualization.xml b/xml/ha_virtualization.xml new file mode 100644 index 00000000..30782501 --- /dev/null +++ b/xml/ha_virtualization.xml @@ -0,0 +1,512 @@ + + + + %entities; +]> + + + &ha; for virtualization + + + + This chapter explains how to configure virtual machines as highly available cluster resources. + + + + + yes + + + + + Overview + + Virtual machines can take different roles in a &ha; cluster: + + + + + A virtual machine can be managed by the cluster as a resource, without the cluster + managing the services that run on the virtual machine. In this case, the VM is opaque + to the cluster. This is the scenario described in this document. + + + + + A virtual machine can be a cluster resource and run &pmremote;, + which allows the cluster to manage services running on the virtual machine. + In this case, the VM is a guest node and is transparent to the cluster. + For this scenario, see . + + + + + A virtual machine can run a full cluster stack. In this case, the VM is a regular + cluster node and is not managed by the cluster as a resource. For this scenario, + see . + + + + + The following procedures describe how to set up highly available virtual machines on + block storage, with another block device used as an &ocfs; volume to store the VM lock files + and XML configuration files. The virtual machines and the &ocfs; volume are configured as + resources managed by the cluster, with resource constraints to ensure that the + lock file directory is always available before a virtual machine starts on any node. + This prevents the virtual machines from starting on multiple nodes. + + + + + Requirements + + + + A running &ha; cluster with at least two nodes and a fencing device such as SBD. + + + + + Passwordless &rootuser; SSH login between the cluster nodes. + + + + + A network bridge on each cluster node, to be used for installing and running the VMs. + This must be separate from the network used for cluster communication and management. + + + + + Two or more shared storage devices (or partitions on a single shared device), + so that all cluster nodes can access the files and storage required by the VMs: + + + + + A device to use as an &ocfs; volume, which will store the VM lock files and XML configuration files. + Creating and mounting the &ocfs; volume is explained in the following procedure. + + + + + A device containing the VM installation source (such as an ISO file or disk image). + + + + + Depending on the installation source, you might also need another device for the + VM storage disks. + + + + + To avoid I/O starvation, these devices must be separate from the shared device used for SBD. + + + + + Stable device names for all storage paths, for example, + /dev/disk/by-id/DEVICE_ID. + A shared storage device might have mismatched /dev/sdX names on + different nodes, which will cause VM migration to fail. + + + + + + + Configuring cluster resources to manage the lock files + + Use this procedure to configure the cluster to manage the virtual machine lock files. + The lock file directory must be available on all nodes so that the cluster is aware of the + lock files no matter which node the VMs are running on. + + + You only need to run the following commands on one of the cluster nodes. 
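Before creating the &ocfs; volume, you can optionally confirm that the requirements listed above are met. The following commands are a minimal sketch, assuming the node name &node2; and shared devices listed under /dev/disk/by-id/; adjust them for your environment:

&prompt.root;ls -l /dev/disk/by-id/         # stable device names; the by-id names are the same on all nodes
&prompt.root;ssh root@&node2; hostname       # passwordless &rootuser; SSH login to the other node(s)
&prompt.root;crm status                     # the cluster is running and all nodes are online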
+ + + Configuring cluster resources to manage the lock files + + + Create an &ocfs; volume on one of the shared storage devices: + +&prompt.root;mkfs.ocfs2 /dev/disk/by-id/DEVICE_ID + + + + Run crm configure to start the crm interactive shell. + + + + + Create a primitive resource for DLM: + +&prompt.crm.conf;primitive dlm ocf:pacemaker:controld \ + op monitor interval=60 timeout=60 + + + + Create a primitive resource for the &ocfs; volume: + +&prompt.crm.conf;primitive ocfs2 Filesystem \ + params device="/dev/disk/by-id/DEVICE_ID" directory="/mnt/shared" fstype=ocfs2 \ + op monitor interval=20 timeout=40 + + + + Create a group for the DLM and &ocfs; resources: + +&prompt.crm.conf;group g-virt-lock dlm ocfs2 + + + + Clone the group so that it runs on all nodes: + +&prompt.crm.conf;clone cl-virt-lock g-virt-lock \ + meta interleave=true + + + + Review your changes with show. + + + + + If everything is correct, submit your changes with commit + and leave the crm live configuration with quit. + + + + + Check the status of the group clone. It should be running on all nodes: + +&prompt.root;crm status +[...] +Full List of Resources: +[...] + * Clone Set: cl-virt-lock [g-virt-lock]: + * Started: [ &node1; &node2; ] + + + + + + + + + Preparing the cluster nodes to host virtual machines + + Use this procedure to install and start the required virtualization services, and to + configure the nodes to store the VM lock files on the shared &ocfs; volume. + + + This procedure uses crm cluster run to run commands on all + nodes at once. If you prefer to manage each node individually, you can omit the + crm cluster run portion of the commands. + + + Preparing the cluster nodes to host virtual machines + + + Install the virtualization packages on all nodes in the cluster: + +&prompt.root;crm cluster run "zypper install -y -t pattern kvm_server kvm_tools" + + + + On one node, find and enable the lock_manager setting in the file + /etc/libvirt/qemu.conf: + +lock_manager = "lockd" + + + + On the same node, find and enable the file_lockspace_dir setting in the + file /etc/libvirt/qemu-lockd.conf, and change the value to point to + a directory on the &ocfs; volume: + +file_lockspace_dir = "/mnt/shared/lockd" + + + + Copy these files to the other nodes in the cluster: + +&prompt.root;crm cluster copy /etc/libvirt/qemu.conf +&prompt.root;crm cluster copy /etc/libvirt/qemu-lockd.conf + + + + Enable and start the libvirtd service on all nodes in the cluster: + +&prompt.root;crm cluster run "systemctl enable --now libvirtd" + + This also starts the virtlockd service. + + + + + + + Adding virtual machines as cluster resources + + Use this procedure to add virtual machines to the cluster as cluster resources, with + resource constraints to ensure the VMs can always access the lock files. The lock files are + managed by the resources in the group g-virt-lock, which is available on + all nodes via the clone cl-virt-lock. + + + Adding virtual machines as cluster resources + + + Install your virtual machines on one of the cluster nodes, with the following restrictions: + + + + + The installation source and storage must be on shared devices. + + + + + Do not configure the VMs to start on host boot. + + + + + For more information, see + + &virtual; for &sles;. + + + + + If the virtual machines are running, shut them down. The cluster will start the VMs + after you add them as resources. + + + + + Dump the XML configuration to the &ocfs; volume. 
Repeat this step for each VM: + +&prompt.root;virsh dumpxml VM1 > /mnt/shared/VM1.xml + + + Make sure the XML files do not contain any references to unshared local paths. + + + + + + Run crm configure to start the crm interactive shell. + + + + + Create primitive resources to manage the virtual machines. Repeat this step for each VM: + +&prompt.crm.conf;primitive VM1 VirtualDomain \ + params config="/mnt/shared/VM1.xml" remoteuri="qemu+ssh://%n/system" \ + meta allow-migrate=true \ + op monitor timeout=30s interval=10s + + The option allow-migrate=true enables live migration. If the value is + set to false, the cluster migrates the VM by shutting it down on + one node and restarting it on another node. + + + If you need to set utilization attributes to help place VMs based on their load impact, + see . + + + + + Create a colocation constraint so that the virtual machines can only start on nodes where + cl-virt-lock is running: + +&prompt.crm.conf;colocation col-fs-virt inf: ( VM1 VM2 VMX ) cl-virt-lock + + + + Create an ordering constraint so that cl-virt-lock always starts before + the virtual machines: + +&prompt.crm.conf;order o-fs-virt Mandatory: cl-virt-lock ( VM1 VM2 VMX ) + + + + Review your changes with show. + + + + + If everything is correct, submit your changes with commit + and leave the crm live configuration with quit. + + + + + Check the status of the virtual machines: + +&prompt.root;crm status +[...] +Full List of Resources: +[...] + * Clone Set: cl-virt-lock [g-virt-lock]: + * Started: [ &node1; &node2; ] + * VM1 (ocf::heartbeat:VirtualDomain): Started &node1; + * VM2 (ocf::heartbeat:VirtualDomain): Started &node1; + * VMX (ocf::heartbeat:VirtualDomain): Started &node1; + + + + The virtual machines are now managed by the &ha; cluster, and can migrate between the cluster nodes. + + + Do not manually start or stop cluster-managed VMs + + After adding virtual machines as cluster resources, do not manage them manually. + Only use the cluster tools as described in . + + + To perform maintenance tasks on cluster-managed VMs, see . + + + + + + Testing the setup + + Use the following tests to confirm that the virtual machine &ha; setup works as expected. + + + + Perform these tests in a test environment, not a production environment. + + + + Verifying that the VM resource is protected across cluster nodes + + + The virtual machine VM1 is running on node &node1;. + + + + + On node &node2;, try to start the VM manually with + virsh start VM1. + + + + + Expected result: The virsh command + fails. VM1 cannot be started manually on &node2; + when it is running on &node1;. + + + + + Verifying that the VM resource can live migrate between cluster nodes + + + The virtual machine VM1 is running on node &node1;. + + + + + Open two terminals. + + + + + In the first terminal, connect to VM1 via SSH. + + + + + In the second terminal, try to migrate VM1 to node + &node2; with crm resource move VM1 bob. + + + + + Run crm_mon -r to monitor the cluster status until it + stabilizes. This might take a short time. + + + + + In the first terminal, check whether the SSH connection to VM1 + is still active. + + + + + Expected result: The cluster status shows that + VM1 has started on &node2;. The SSH connection + to VM1 remains active during the whole migration. + + + + + Verifying that the VM resource can migrate to another node when the current node reboots + + + The virtual machine VM1 is running on node &node2;. + + + + + Reboot &node2;. 
+ + + + + On node &node1;, run crm_mon -r to + monitor the cluster status until it stabilizes. This might take a short time. + + + + + Expected result: The cluster status shows that + VM1 has started on &node1;. + + + + + Verifying that the VM resource can fail over to another node when the current node crashes + + + The virtual machine VM1 is running on node &node1;. + + + + + Simulate a crash on &node1; by forcing the machine off or + unplugging the power cable. + + + + + On node &node2;, run crm_mon -r to + monitor the cluster status until it stabilizes. VM failover after a node crashes + usually takes longer than VM migration after a node reboots. + + + + + Expected result: After a short time, the cluster status + shows that VM1 has started on &node2;. + + + + + +
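Note that the crm resource move command used in the live migration test creates a location constraint that pins VM1 to the target node. The following commands are a minimal sketch for cleaning up after the test; depending on your crmsh version, the first command might be named unmigrate or unmove instead of clear:

&prompt.root;crm resource clear VM1         # remove the constraint left behind by "crm resource move"
&prompt.root;crm status                     # confirm that all VM resources and the clone set are started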