From 6dae4b3ea13c619d6a2b8ea1f53e5a7440100339 Mon Sep 17 00:00:00 2001
From: Tanja Roth
Date: Tue, 27 Nov 2018 17:29:43 +0100
Subject: [PATCH 1/2] Admin Guide proofing corrections part1

---
 xml/ha_concepts.xml | 7 +++----
 xml/ha_config_basics.xml | 22 +++++++++++-----------
 xml/ha_fencing.xml | 10 +++++-----
 xml/ha_glossary.xml | 2 +-
 xml/ha_maintenance.xml | 28 ++++++++++++++--------------
 xml/ha_requirements.xml | 5 ++---
 xml/ha_storage_protection.xml | 28 ++++++++++++++--------------
 7 files changed, 50 insertions(+), 52 deletions(-)

diff --git a/xml/ha_concepts.xml b/xml/ha_concepts.xml
index 488d35c32..6dbe2fbd9 100644
--- a/xml/ha_concepts.xml
+++ b/xml/ha_concepts.xml
@@ -12,7 +12,7 @@
 &productnamereg; is an integrated suite of open source clustering technologies that enables you to implement highly available physical and
- virtual Linux clusters, and to eliminate single point of failure. It
+ virtual Linux clusters, and to eliminate single points of failure. It
 ensures the high availability and manageability of critical resources including data, applications, and services. Thus, it helps you maintain business continuity, protect data integrity, and reduce
@@ -150,7 +149,6 @@
 &productname; supports the clustering of both physical and virtual Linux servers. Mixing both types of servers is supported as well.
- &sls; &productnumber; ships with &xen;,
 &sls; &productnumber; ships with Xen and KVM (Kernel-based Virtual Machine). Both are open source virtualization hypervisors. Virtualization guest systems (also known as VMs) can be managed as services by the cluster.
@@ -185,7 +184,7 @@
 centers. The cluster usually uses unicast for communication between the nodes and manages failover internally. Network latency is usually low (<5 ms for distances of approximately 20 miles). Storage
- preferably is connected by fibre channel. Data replication is done by
+ is preferably connected by fibre channel. Data replication is done by
 storage internally, or by host based mirror under control of the cluster.
@@ -749,7 +748,7 @@
 data or complete resource recovery. For this Pacemaker comes with a fencing subsystem, stonithd. &stonith; is an acronym for Shoot The Other Node In The Head.
- It usually is implemented with a &stonith; shared block device, remote
+ It is usually implemented with a &stonith; shared block device, remote
 management boards, or remote power switches. In &pace;, &stonith; devices are modeled as resources (and configured in the CIB) to enable them to be easily used.
diff --git a/xml/ha_config_basics.xml b/xml/ha_config_basics.xml
index 93295dc51..f8485928f 100644
--- a/xml/ha_config_basics.xml
+++ b/xml/ha_config_basics.xml
@@ -45,7 +45,7 @@ Two-node clusters
- clusters with more than two nodes. This means usually an odd number of nodes.
+ Clusters with more than two nodes. This usually means an odd number of nodes.
@@ -82,9 +82,9 @@ Usage scenario:
- Classical stretched clusters, focus on service high availability
+ Classic stretched clusters, focus on high availability of services
 and local data redundancy. For databases and enterprise
- resource planning. One of the most popular setup during the last
+ resource planning. One of the most popular setups during the last
 few years.
@@ -102,7 +102,7 @@ Usage scenario:
- Classical stretched cluster, focus on service high availability
+ Classic stretched cluster, focus on high availability of services
 and data redundancy. For example, databases, enterprise resource planning.
@@ -224,7 +224,7 @@
 Whenever communication fails between one or more nodes and the rest of the cluster, a cluster partition occurs. The nodes can only communicate with other nodes in the same partition and are unaware of the separated nodes.
- A cluster partition is defined to have quorum (is quorate)
+ A cluster partition is defined as having quorum (can quorate)
 if it has the majority of nodes (or votes). How this is achieved is done by quorum calculation. Quorum is a requirement for fencing.
@@ -256,7 +256,7 @@ C = number of cluster nodes
 of cluster nodes. Two-node clusters make sense for stretched setups across two sites. Clusters with an odd number of nodes can be built on either one single
- site or might be spread across three sites.
+ site or might be spread across three sites.
@@ -322,9 +322,9 @@ C = number of cluster nodes
 or a single node quorum—or not.
- For two node clusters the only meaningful behaviour is to always
- react in case of quorum loss. The first step always should be
- trying to fence the lost node.
+ For two-node clusters the only meaningful behavior is to always
+ react in case of quorum loss. The first step should always be
+ to try to fence the lost node.
@@ -450,7 +450,7 @@ C = number of cluster nodes
 use the following settings:
- Excerpt of &corosync; Configuration for a N-Node Cluster
+ Excerpt of &corosync; Configuration for an N-Node Cluster
 quorum {
   provider: corosync_votequorum
   expected_votes: N
   wait_for_all: 1
 }
@@ -470,7 +470,7 @@ C = number of cluster nodes
 Enables the wait for all (WFA) feature. When WFA is enabled, the cluster will be quorate for the first time
- only after all nodes have been visible.
+ only after all nodes have become visible.
 To avoid some start-up race conditions, setting to 1 may help. For example, in a five-node cluster every node has one vote and thus,
diff --git a/xml/ha_fencing.xml b/xml/ha_fencing.xml
index 3cb43f449..ca3b6ea7c 100644
--- a/xml/ha_fencing.xml
+++ b/xml/ha_fencing.xml
@@ -149,10 +149,10 @@
 increasingly popular and may even become standard in off-the-shelf computers. However, if they share a power supply with their host (a cluster node), they might not work when needed. If a node stays without
- power, the device supposed to control it would be useless. Therefor, it
- is highly recommended using battery backed Lights-out devices.
- Another aspect is this devices are accessed by network. This might
- imply single point of failure, or security concerns.
+ power, the device supposed to control it would be useless. Therefore, it
+ is highly recommended to use battery-backed lights-out devices.
+ Another aspect is that these devices are accessed by network. This might
+ imply a single point of failure, or security concerns.
@@ -434,7 +434,7 @@ hostlist
 The Kdump plug-in must be used in concert with another, real &stonith; device, for example, external/ipmi. The order of the fencing devices must be specified by crm configure
- fencing_topology. To achieve that Kdump is checked before
+ fencing_topology. For Kdump to be checked before
 triggering a real fencing mechanism (like external/ipmi), use a configuration similar to the following:
 fencing_topology \
diff --git a/xml/ha_glossary.xml b/xml/ha_glossary.xml
index c16fefc83..7f43ccaa7 100644
--- a/xml/ha_glossary.xml
+++ b/xml/ha_glossary.xml
@@ -453,7 +453,7 @@ performance will be met during a contractual measurement period.
 quorum
- In a cluster, a cluster partition is defined to have quorum (is
+ In a cluster, a cluster partition is defined to have quorum (can
 quorate) if it has the majority of nodes (or votes). Quorum distinguishes exactly one partition. It is part of the algorithm to prevent several disconnected partitions or nodes from proceeding and
diff --git a/xml/ha_maintenance.xml b/xml/ha_maintenance.xml
index 41e7473cc..b9dbadf91 100644
--- a/xml/ha_maintenance.xml
+++ b/xml/ha_maintenance.xml
@@ -147,7 +147,7 @@ Node &node2;: standby
-
+
@@ -158,7 +158,7 @@ Node &node2;: standby
-
+
@@ -186,7 +186,7 @@ Node &node2;: standby
-
+
@@ -201,7 +201,7 @@ Node &node2;: standby
-
+
@@ -266,7 +266,7 @@ Node &node2;: standby
- Putting the Cluster Into Maintenance Mode
+ Putting the Cluster into Maintenance Mode
 To put the cluster into maintenance mode on the &crmshell;, use the following command:
 &prompt.root;crm configure property maintenance-mode=true
@@ -275,7 +275,7 @@ Node &node2;: standby
 &prompt.root;crm configure property maintenance-mode=false
- Putting the Cluster Into Maintenance Mode with &hawk2;
+ Putting the Cluster into Maintenance Mode with &hawk2;
 Start a Web browser and log in to the cluster as described in
@@ -315,7 +315,7 @@ Node &node2;: standby
- Putting a Node Into Maintenance Mode
+ Putting a Node into Maintenance Mode
 To put a node into maintenance mode on the &crmshell;, use the following command:
 &prompt.root;crm node maintenance NODENAME
@@ -324,7 +324,7 @@ Node &node2;: standby
 &prompt.root;crm node ready NODENAME
- Putting a Node Into Maintenance Mode with &hawk2;
+ Putting a Node into Maintenance Mode with &hawk2;
 Start a Web browser and log in to the cluster as described in
@@ -352,7 +352,7 @@ Node &node2;: standby
- Putting a Node Into Standby Mode
+ Putting a Node into Standby Mode
 To put a node into standby mode on the &crmshell;, use the following command:
 &prompt.root;crm node standby NODENAME
@@ -361,7 +361,7 @@ Node &node2;: standby
 &prompt.root;crm node online NODENAME
- Putting a Node Into Standby Mode with &hawk2;
+ Putting a Node into Standby Mode with &hawk2;
 Start a Web browser and log in to the cluster as described in
@@ -394,7 +394,7 @@ Node &node2;: standby
- Putting a Resource Into Maintenance Mode
+ Putting a Resource into Maintenance Mode
 To put a resource into maintenance mode on the &crmshell;, use the following command:
 &prompt.root;crm resource maintenance RESOURCE_ID true
@@ -403,7 +403,7 @@ Node &node2;: standby
 &prompt.root;crm resource maintenance RESOURCE_ID false
- Putting a Resource Into Maintenance Mode with &hawk2;
+ Putting a Resource into Maintenance Mode with &hawk2;
 Start a Web browser and log in to the cluster as described in
@@ -459,7 +459,7 @@ Node &node2;: standby
- Putting a Resource Into Unmanaged Mode
+ Putting a Resource into Unmanaged Mode
 To put a resource into unmanaged mode on the &crmshell;, use the following command:
 &prompt.root;crm resource unmanage RESOURCE_ID
@@ -468,7 +468,7 @@ Node &node2;: standby
 &prompt.root;crm resource manage RESOURCE_ID
- Putting a Resource Into Unmanaged Mode with &hawk2;
+ Putting a Resource into Unmanaged Mode with &hawk2;
 Start a Web browser and log in to the cluster as described in
diff --git a/xml/ha_requirements.xml b/xml/ha_requirements.xml
index f7e6f8cc7..4e90c233c 100644
--- a/xml/ha_requirements.xml
+++ b/xml/ha_requirements.xml
@@ -137,9 +137,8 @@
 When using DRBD* to implement a mirroring RAID system that distributes data across two machines, make sure to only access the device provided
- by DRBD—never the backing device. Use bonded NICs. Same NICs as
- the rest of the cluster uses are possible to leverage the redundancy
- provided there.
+ by DRBD—never the backing device. Use bonded NICs. To leverage the
+ redundancy, it is possible to use the same NICs as the rest of the cluster.
diff --git a/xml/ha_storage_protection.xml b/xml/ha_storage_protection.xml
index 364bcafc5..6b3539ced 100644
--- a/xml/ha_storage_protection.xml
+++ b/xml/ha_storage_protection.xml
@@ -82,7 +82,7 @@
 In an environment where all nodes have access to shared storage, a small partition of the device is formatted for use with SBD. The size of the partition depends on the block size of the used disk (for example,
- 1 MB for standard SCSI disks with 512 Byte block size or
+ 1 MB for standard SCSI disks with 512 byte block size or
 4 MB for DASD disks with 4 kB block size). The initialization process creates a message layout on the device with slots for up to 255 nodes.
@@ -129,7 +129,7 @@
 feeds the watchdog by regularly writing a service pulse to the watchdog. If the daemon stops feeding the watchdog, the hardware will enforce a system restart. This protects against failures of the SBD
- process itself, such as dying, or becoming stuck on an IO error.
+ process itself, such as dying, or becoming stuck on an I/O error.
@@ -202,8 +202,8 @@
 The shared storage can be connected via Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), or even iSCSI. In virtualized environments, the hypervisor might provide shared block devices. In any
- case, content on that shared block device need to be consistent for
- all cluster nodes. It need to be made sure, caching does not brake that
+ case, content on that shared block device needs to be consistent for
+ all cluster nodes. Make sure that caching does not break that
 consistency.
@@ -597,7 +597,7 @@ stonith-timeout = Timeout (msgwait) + 1.2
 &prompt.root;sbd -d /dev/SBD create
 (Replace /dev/SBD with your actual path name, for example:
- /dev/disk/by-id/scsi-ST2000DM001-0123456_Wabcdefg).
+ /dev/disk/by-id/scsi-ST2000DM001-0123456_Wabcdefg.)
 To use more than one device for SBD, specify the option multiple times, for example:
 &prompt.root;sbd -d /dev/SBD1 -d /dev/SBD2 -d /dev/SBD3 create
@@ -804,10 +804,10 @@ Timeout (msgwait) : 180
 and different delay values are being used. The targeted node will lose in a fencing race. The parameter can be used to mark a specific node to survive
- in case of a split-brain scenario in a two-node cluster.
+ in case of a split brain scenario in a two-node cluster.
 To make this succeed, it is essential to create two primitive &stonith; devices for each node. In the following configuration, &node1; will win
- and survive in case of a split-brain scenario:
+ and survive in case of a split brain scenario:
 &prompt.crm.conf;primitive st-sbd-&node1; stonith:external/sbd params \
 pcmk_host_list=&node1; pcmk_delay_base=20
@@ -1019,12 +1019,12 @@ SBD_WATCHDOG_TIMEOUT=5
 Additional Mechanisms for Storage Protection
 toms 2018-09-27: (lpinne): the whole SBD chapter should go to the STONITH chapter.
- SBD is the recommened method for general node fencing. It is not
- particulraly related to storage protection. In opposite, sfex and
+ SBD is the recommended method for general node fencing. It is not
+ particularly related to storage protection. In contrast, sfex and
 sg_persist are purely storage protection. Apart from node fencing via &stonith; there are other methods to achieve storage protection at a resource level.
 For example, SCSI-3 and SCSI-4 use
- persistent reservations whereas sfex provides a locking
+ persistent reservations, whereas sfex provides a locking
 mechanism. Both methods are explained in the following subsections.
@@ -1032,7 +1032,7 @@ SBD_WATCHDOG_TIMEOUT=5
 toms 2018-04-20: I would like to see a little bit more background information.
- The SCSI specification 3 and 4 define persistent reservations.
+ The SCSI specifications 3 and 4 define persistent reservations.
 These are SCSI protocol features and can be used for I/O fencing and failover. This feature is implemented in the sg_persist Linux command.
@@ -1087,7 +1087,7 @@ Illegal request, Invalid opcode
 meta master-max=1 notify=true
- Do some tests. When the resource is in master/slave status, on the
+ Do some tests. When the resource is in master/slave status on the
 master server, you can mount and write on /dev/sdc1, while on the slave server you cannot write.
@@ -1222,12 +1222,12 @@ Illegal request, Invalid opcode
 To protect resources via an sfex lock, create mandatory ordering and
- placement constraints between the resources to protect and the sfex resource. If
+ placement constraints between the resources to protect and the sfex resource. If
 the resource to be protected has the ID filesystem1:
 &prompt.crm.conf;order order-sfex-1 inf: sfex_1 filesystem1
-&prompt.crm.conf;colocation colo-sfex-1 inf: filesystem1 sfex_1
+&prompt.crm.conf;colocation col-sfex-1 inf: filesystem1 sfex_1

From 3270406c3cfb9d9aa705bdafe5939202cca0db0e Mon Sep 17 00:00:00 2001
From: Tanja Roth
Date: Wed, 28 Nov 2018 13:24:59 +0100
Subject: [PATCH 2/2] Admin Guide proofing corrections part 2

- all proofing corrections are in now, the other 5 guides/articles did not see any proofing changes
---
 xml/ha_clvm.xml | 4 ++--
 xml/ha_docupdates.xml | 6 +++---
 xml/ha_maintenance.xml | 10 +++++-----
 xml/ha_rear.xml | 2 +-
 xml/ha_storage_basics.xml | 10 +++++-----
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/xml/ha_clvm.xml b/xml/ha_clvm.xml
index 71d1ce8b4..d3a77f1df 100644
--- a/xml/ha_clvm.xml
+++ b/xml/ha_clvm.xml
@@ -173,7 +173,7 @@ cLVM) for more information and details to integrate here - really helpful-->
 Check if the lvmetad daemon is
- disabled because it cannot work with cLVM. In /etc/lvm/lvm.conf,
+ disabled, because it cannot work with cLVM. In /etc/lvm/lvm.conf,
 the keyword use_lvmetad must be set to 0 (the default is 1). Copy the configuration to all nodes, if necessary.
@@ -516,7 +516,7 @@ cLVM) for more information and details to integrate here - really helpful-->
- Scenario: cLVM With iSCSI on SANs
+ Scenario: cLVM with iSCSI on SANs
 The following scenario uses two SAN boxes which export their iSCSI targets to several clients. The general idea is displayed in
diff --git a/xml/ha_docupdates.xml b/xml/ha_docupdates.xml
index eadc6dbae..efd266954 100644
--- a/xml/ha_docupdates.xml
+++ b/xml/ha_docupdates.xml
@@ -146,7 +146,7 @@ toms 2014-08-12:
 (, , ,
- .
+ ).
@@ -203,7 +203,7 @@ toms 2014-08-12:
- Added chapter . Moved respective
+ Added . Moved respective
 sections from , , and there. The new chapter gives an overview of different options the cluster stack
@@ -464,7 +464,7 @@ toms 2014-08-12:
 In , mentioned that each
- disk need to be accessible by Cluster MD on each node ().
diff --git a/xml/ha_maintenance.xml b/xml/ha_maintenance.xml
index b9dbadf91..068c9b343 100644
--- a/xml/ha_maintenance.xml
+++ b/xml/ha_maintenance.xml
@@ -20,7 +20,7 @@
 This chapter explains how to manually take down a cluster node without
- negative side-effects. It also gives an overview of different options the
+ negative side effects. It also gives an overview of different options the
 cluster stack provides for executing maintenance tasks.
@@ -175,7 +175,7 @@ Node &node2;: standby
 A node that is in standby mode can no longer run resources. Any resources running on the node will be moved away or stopped (in case no other node
- is eligible to run the resource). Also, all monitor operations will be
+ is eligible to run the resource). Also, all monitoring operations will be
 stopped on the node (except for those with role="Stopped").
@@ -190,7 +190,7 @@ Node &node2;: standby
- When this mode is enabled for a resource, no monitor operations will be
+ When this mode is enabled for a resource, no monitoring operations will be
 triggered for the resource.
@@ -551,7 +551,7 @@ Node &node2;: standby
 Check if you have resources of the type ocf:pacemaker:controld
- or any dependencies on this type of resources. Resources of the type
+ or any dependencies on this type of resource. Resources of the type
 ocf:pacemaker:controld are DLM resources.
@@ -564,7 +564,7 @@ Node &node2;: standby
 The reason is that stopping &pace; also stops the &corosync; service, on whose membership and messaging services DLM depends. If &corosync; stops,
- the DLM resource will assume a split-brain scenario and trigger a fencing
+ the DLM resource will assume a split brain scenario and trigger a fencing
 operation.
diff --git a/xml/ha_rear.xml b/xml/ha_rear.xml
index 0b345faef..a3bc48ef7 100644
--- a/xml/ha_rear.xml
+++ b/xml/ha_rear.xml
@@ -66,7 +66,7 @@
 Understanding &rear;'s complex functionality is essential for making the tool work as intended. Therefore, read this chapter carefully and
- familiarize with &rear; before a disaster strikes. You should also be
+ familiarize yourself with &rear; before a disaster strikes. You should also be
 aware of &rear;'s known limitations and test your system in advance.
diff --git a/xml/ha_storage_basics.xml b/xml/ha_storage_basics.xml
index 7d377b59b..ce13a6cfc 100644
--- a/xml/ha_storage_basics.xml
+++ b/xml/ha_storage_basics.xml
@@ -24,7 +24,7 @@
 Protocols for DLM Communication
- To avoid single point of failure, redundant communication paths are important
+ To avoid single points of failure, redundant communication paths are important
 for &ha; clusters. This is also true for DLM communication. If network bonding (Link Aggregation Control Protocol, LACP) cannot be used for any reason, we highly recommend to define a redundant communication channel (a second ring)
@@ -33,7 +33,7 @@
 Depending on the configuration in &corosync.conf;, DLM then decides
- if to use the TCP or SCTP protocol for its communication:
+ whether to use the TCP or SCTP protocol for its communication:
@@ -41,7 +41,7 @@
 If rrp_mode is set to none (which means redundant ring configuration is disabled), DLM automatically uses TCP. However, without a redundant communication channel, DLM communication
- will fail in case the TCP link is down.
+ will fail if the TCP link is down.
@@ -59,7 +59,7 @@
 DLM uses the cluster membership services from &pace; which run in user
- space. Therefore, DLM needs to be configured as clone resource that is
+ space. Therefore, DLM needs to be configured as a clone resource that is
 present on each node in the cluster.
@@ -82,7 +82,7 @@
 The configuration consists of a base group that includes several primitives and a base clone. Both base group and base clone can be used in various scenarios afterward (for both &ocfs; and cLVM, for example). You only need
- to extended the base group with the respective primitives as needed. As the
+ to extend the base group with the respective primitives as needed. As the
 base group has internal colocation and ordering, this simplifies the overall setup as you do not need to specify several individual groups, clones and their dependencies.
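
As a hedged companion to the two-node quorum discussion corrected in xml/ha_config_basics.xml above (the N-node excerpt with expected_votes and wait_for_all), a minimal corosync.conf votequorum section for the two-node case could look like the sketch below. This is an illustrative sketch, not part of either patch; it assumes a standard corosync votequorum setup where node addresses are defined in the nodelist section. The two_node option is regular votequorum syntax and implicitly enables wait_for_all; as the patched text points out, the surviving node should still fence the lost peer before taking over its resources.

  quorum {
    # Sketch for a two-node cluster; the nodes themselves are listed in the
    # nodelist section of corosync.conf.
    provider: corosync_votequorum
    # two_node: 1 lets the cluster remain quorate with only one of the two votes
    # and implicitly enables wait_for_all, so both nodes must have been seen
    # once at start-up before the cluster becomes quorate for the first time.
    two_node: 1
  }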