diff --git a/xml/ha_config_basics.xml b/xml/ha_config_basics.xml
index 6a6a16a7f..9e3792ed4 100644
--- a/xml/ha_config_basics.xml
+++ b/xml/ha_config_basics.xml
@@ -36,25 +36,256 @@
-
- Global Cluster Options
+
+ Use Case Scenarios
+ In general, clusters fall into one of two categories:
+
+
+ Two-node clusters
+
+
+      Clusters with more than two nodes. This usually means an odd number of nodes.
+
+
+
+      Combined with different topologies, different use cases can be derived.
+ The following use cases are the most common:
+
+
+
+
+
+ Two-node cluster in one location
+
+
+ Configuration:
+ FC SAN or similar shared storage, layer 2 network.
+
+
+ Usage scenario:
+      Embedded clusters that focus on high availability of services and not
+      on data redundancy for data replication.
+ Such a setup is used for radio stations or assembly line controllers,
+ for example.
+
+
+
+
+
+ Two-node clusters in two locations (most widely used)
+
+
+ Configuration:
+ Symmetrical stretched cluster, FC SAN, and layer 2 network
+ all across two locations.
+
+
+ Usage scenario:
+      Classical stretched clusters, with a focus on service high availability
+      and local data redundancy. Typically used for databases and enterprise
+      resource planning. This has been one of the most popular setups in
+      recent years.
+
+
+
+
+
+ Odd number of nodes in three locations
+
+
+ Configuration:
+      2×N+1 nodes, FC SAN across the two main locations. The auxiliary
+      third site has no FC SAN, but acts as a majority maker.
+ Layer 2 network at least across two main locations.
+
+
+
+ Usage scenario:
+      Classical stretched cluster, with a focus on service high availability
+      and data redundancy. For example, databases or enterprise resource planning.
+
+
+
+
+
+
+
+
+
+
+
+ Quorum Determination
- Global cluster options control how the cluster behaves when confronted
- with certain situations. They are grouped into sets and can be viewed and
- modified with the cluster management tools like &hawk2; and the
- crm shell.
+ Whenever communication fails between one or more nodes and the rest of the
+ cluster, a cluster partition occurs. The nodes can only communicate with
+ other nodes in the same partition and are unaware of the separated nodes.
+ A cluster partition is defined to have quorum (is quorate)
+ if it has the majority of nodes (or votes).
+      This is determined by quorum calculation.
+ Quorum is a requirement for fencing.
+
+
+ Quorum calculation has changed between &productname; 11 and
+ &productname; 15. For &productname; 11, quorum was calculated by
+ &pace;.
+ Starting with &productname; 12, &corosync; can handle quorum for
+ two-node clusters directly without changing the &pace; configuration.
-
- Overview
-
- For an overview of all global cluster options and their default values,
- see &paceex;, available from . Refer to section
- Available Cluster Options.
-
-
- The predefined values can usually be kept. However, to make key
+ How quorum is calculated is influenced by the following factors:
+
+
+ Number of Cluster Nodes
+
+ To keep services running, a cluster with more than two nodes
+ relies on quorum (majority vote) to resolve cluster partitions.
+ Based on the following formula, you can calculate the minimum
+ number of operational nodes required for the cluster to function:
+      N ≥ C/2 + 1 (N = minimum number of operational nodes, C = number of cluster nodes)
+ For example, a five-node cluster needs a minimum of three operational
+ nodes (or two nodes which can fail).
+
+      We strongly recommend using either a two-node cluster or an odd number
+      of cluster nodes.
+ Two-node clusters make sense for stretched setups across two sites.
+ Clusters with an odd number of nodes can be built on either one single
+ site or might be spread across three sites.
+
+
+
+
+ Corosync Configuration
+
+ &corosync; is a messaging and membership layer, see
+ and
+ .
+
+
+
+
+
+
+ Global Cluster Options
+ Global cluster options control how the cluster behaves when
+ confronted with certain situations. They are grouped into sets and can be
+ viewed and modified with the cluster management tools like &hawk2; and
+ the crm shell.
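+      For example, with the crm shell, the property set that
+      holds the global cluster options (usually named
+      cib-bootstrap-options) can be listed, and a single option
+      can be changed. The following commands are only a sketch; the option
+      name and value are illustrative:
+      # list the current global cluster options
+      crm configure show cib-bootstrap-options
+      # change a single global option
+      crm configure property no-quorum-policy=stop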
+ The predefined values can usually be kept. However, to make key
functions of your cluster work correctly, you need to adjust the
following parameters after basic cluster setup:
@@ -70,27 +301,10 @@
-
- Learn how to adjust those parameters with the cluster management tools
- of your choice:
-
-
-
-
- &hawk2;:
-
-
-
-
- Option no-quorum-policy
+ Global Option no-quorum-policy
This global option defines what to do when a cluster partition does not
have quorum (no majority of nodes is part of the partition).
@@ -103,31 +317,23 @@
ignore
+
- The quorum state does not influence the cluster behavior; resource
- management is continued.
+ Setting no-quorum-policy to ignore makes
+ the cluster behave like it has quorum. Resource management is
+ continued.
- This setting is useful for the following scenarios:
+      This was the default for a two-node cluster in &slsa; 11.
+      Starting with &slsa; 12, this option is obsolete.
+      Based on configuration and conditions, &corosync; decides whether
+      to grant quorum to the cluster nodes or to a single node.
+
+
+      For two-node clusters, the only meaningful behavior is to always
+      react in case of quorum loss. The first step should always be to
+      try to fence the lost node.
-
-
-
- Resource-driven clusters: For local clusters with redundant
- communication channels, a split brain scenario only has a certain
- probability. Thus, a loss of communication with a node most likely
- indicates that the node has crashed. The surviving nodes
- should recover and start serving the resources again.
-
-
- If no-quorum-policy is set to
- ignore, a 4-node cluster can sustain concurrent
- failure of three nodes before service is lost. With the other
- settings, it would lose quorum after concurrent failure of two
- nodes. For a two-node cluster this option and value is never set.
-
-
-
@@ -167,7 +373,8 @@
If quorum is lost, all nodes in the affected cluster partition are
- fenced.
+ fenced. This option works only in combination with SBD, see
+ .
@@ -175,7 +382,7 @@
- Option stonith-enabled
+ Global Option stonith-enabled
This global option defines whether to apply fencing, allowing &stonith;
devices to shoot failed nodes and nodes with resources that cannot be
@@ -191,7 +398,7 @@
aware that this has impact on the support status for your product.
Furthermore, with stonith-enabled="false", resources
like the Distributed Lock Manager (DLM) and all services depending on
- DLM (such as LVM2, GFS2, and OCFS2) will fail to start.
+ DLM (such as cLVM, GFS2, and OCFS2) will fail to start.
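+      For example, assuming the crm shell is used, fencing can be kept
+      enabled (or switched back on) with the following command. This is only
+      a sketch and does not replace configuring a working &stonith; device:
+      # keep fencing enabled in the global cluster options
+      crm configure property stonith-enabled=true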
No Support Without &stonith;
@@ -200,7 +407,91 @@
+
+
+ &corosync; Configuration for Two-Node Clusters
+
+ When using the bootstrap scripts, the &corosync; configuration contains
+ a quorum section with the following options:
+
+
+ Excerpt of &corosync; Configuration for a Two-Node Cluster
+ quorum {
+ # Enable and configure quorum subsystem (default: off)
+ # see also corosync.conf.5 and votequorum.5
+ provider: corosync_votequorum
+ expected_votes: 2
+ two_node: 1
+}
+
+
+ As opposed to &sle; 11, the votequorum subsystem in &sle; 12 is
+ powered by &corosync; version 2.x. This means that the
+ no-quorum-policy=ignore option must not be used.
+
+
+      By default, when two_node: 1 is set, the
+      wait_for_all option is automatically enabled.
+      If wait_for_all is not enabled, the cluster should be
+      started on both nodes in parallel. Otherwise the first node will perform
+      startup fencing on the missing second node.
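+      To verify the resulting quorum settings on a running node, the
+      corosync-quorumtool command can be used. The following is
+      just an example invocation; it prints the current quorum state, the
+      expected votes, and the active flags (such as the two-node and
+      wait-for-all flags):
+      # show quorum status of the running cluster
+      corosync-quorumtool -s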
+
+
+
+ &corosync; Configuration for N-Node Clusters
+ When not using a two-node cluster, we strongly recommend an odd
+      number of nodes for your N-node cluster. With regard to quorum
+ configuration, you have the following options:
+
+
+ Adding additional nodes with the ha-cluster-join
+ command, or
+
+
+ Adapting the &corosync; configuration manually.
+
+
+
+ If you adjust /etc/corosync/corosync.conf manually,
+ use the following settings:
+
+
+      Excerpt of &corosync; Configuration for an N-Node Cluster
+ quorum {
+ provider: corosync_votequorum
+ expected_votes: N
+ wait_for_all: 1
+}
+
+
+ Use the quorum service from &corosync;
+
+
+      The number of votes to expect. This parameter is either
+      provided inside the quorum section or
+      automatically calculated when a nodelist
+      section is available.
+
+
+
+      Enables the wait for all (WFA) feature.
+      When WFA is enabled, the cluster will be quorate for the first time
+      only after all nodes have been visible.
+      To avoid some startup race conditions, setting
+      wait_for_all to 1 may help.
+      For example, in a five-node cluster every node has one vote and thus
+      expected_votes is set to 5.
+      As soon as three or more nodes are visible to each other, the cluster
+      partition becomes quorate and can start operating.
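+      Putting this together for the five-node example, the
+      quorum section would look like the following
+      (a hypothetical excerpt, analogous to the generic one shown above):
+      quorum {
+        provider: corosync_votequorum
+        expected_votes: 5
+        wait_for_all: 1
+      }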
+
+
+
+
+
+
Cluster Resources
diff --git a/xml/ha_requirements.xml b/xml/ha_requirements.xml
index 36d45951a..3477b40fc 100644
--- a/xml/ha_requirements.xml
+++ b/xml/ha_requirements.xml
@@ -228,18 +228,10 @@
Number of Cluster Nodes
-
- Odd Number of Cluster Nodes
-
- For clusters with more than three nodes, it is strongly recommended to use
- an odd number of cluster nodes.
-
- To keep services running, a cluster with more than two nodes
- relies on quorum (majority vote) to resolve cluster partitions.
- A two- or three-node cluster can tolerate the failure of one node at a time,
- a five-node cluster can tolerate failures of two nodes, etc.
-
-
+   For clusters with more than two nodes, it is strongly recommended to use
+   an odd number of cluster nodes so that a majority (quorum) can be
+   determined. For more information
+ about quorum, see .
+