Improve resource cleanup section (#350)
* Update crm resource list to crm resource status

* Add more information about fail counts

bsc#1211019
jsc#DOCTEAM-977

* failcount -> fail count

* Improve resource cleanup procedure

bsc#1211019
jsc#DOCTEAM-977

* Apply suggestions from editorial review

Co-authored-by: Daria Vladykina <[email protected]>

---------

Co-authored-by: Daria Vladykina <[email protected]>
tahliar and dariavladykina authored Oct 11, 2023
1 parent d275162 commit 62bf34a
Showing 4 changed files with 47 additions and 29 deletions.
4 changes: 2 additions & 2 deletions xml/geo_booth_i.xml
@@ -327,7 +327,7 @@ ticket = "&ticket2;" <xref linkend="co-ha-geo-booth-config-ticket" xrefstyle="se
run on the current cluster site. That means, it checks if the cluster is
healthy enough to run the resource (all resource dependencies are
fulfilled, the cluster partition has quorum, no dirty nodes, etc.). For
-example, if a service in the dependency-chain has a failcount of
+example, if a service in the dependency-chain has a fail count of
<literal>INFINITY</literal> on all available nodes, the service cannot be
run on that site. In that case, it is of no use to claim the ticket.
</para>
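As an illustrative aside (not part of the changed files): the per-node fail count that this check relies on can be inspected with &crmsh;. A sketch, assuming a hypothetical resource <literal>rsc1</literal> and illustrative output:

<screen>&prompt.root;<command>crm resource failcount rsc1 show &node1;</command>
scope=status  name=fail-count-rsc1  value=INFINITY</screen>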
@@ -734,7 +734,7 @@ ticket = "tkt-sap-prod" <xref linkend="co-ha-geo-booth-config-ticket" xrefstyle=
cluster site. That means, it checks if the cluster is healthy enough to
run the resource (all resource dependencies are fulfilled, the cluster
partition has quorum, no dirty nodes, etc.). For example, if a service in
-the dependency-chain has a failcount of <literal>INFINITY</literal> on all
+the dependency chain has a fail count of <literal>INFINITY</literal> on all
available nodes, the service cannot be run on that site. In that case, it
is of no use to claim the ticket.
</para>
12 changes: 6 additions & 6 deletions xml/ha_management.xml
@@ -8,10 +8,10 @@
<!--taroth 2011-09-16: in accordance with kdupke, man pages are removed
from the book, except for a general overview-->
<appendix xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:id="app-ha-management">
-<!--
+<!--
The source for Pacemaker manual page can be found at the
Mercurial repository.
1. Install the package mercurial.
2. Clone the URL:
$ hg clone http://hg.clusterlabs.org/pacemaker/doc pacemaker-doc
@@ -155,7 +155,7 @@ from the book, except for a general overview-->
<para>
The <command>crm_failcount</command> command queries the number of
failures per resource on a given node. This tool can also be used to
-reset the failcount, allowing the resource to again run on nodes where
+reset the fail count, allowing the resource to again run on nodes where
it had failed too often. See the <command>crm_failcount</command> man
page for a detailed introduction to this tool's usage and command
syntax.
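As an illustrative aside (not part of the changed files), a query and a manual reset with this tool might look as follows, assuming a resource <literal>rsc1</literal> and node <literal>&node1;</literal>:

<screen>&prompt.root;<command>crm_failcount --query --resource rsc1 --node &node1;</command>
&prompt.root;<command>crm_failcount --delete --resource rsc1 --node &node1;</command></screen>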
@@ -178,10 +178,10 @@ from the book, except for a general overview-->
</listitem>
</varlistentry>
</variablelist>
-<!--
+<!--
The source for Pacemaker manual page can be found at the
Mercurial repository.
1. Install the package mercurial.
2. Clone the URL:
$ hg clone http://hg.clusterlabs.org/pacemaker/doc pacemaker-doc
@@ -200,4 +200,4 @@ from the book, except for a general overview-->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ha_crmshadow.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ha_crmstandby.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ha_crmverify.xml"/>-->
-</appendix>
+</appendix>
40 changes: 29 additions & 11 deletions xml/ha_managing_resources.xml
@@ -374,17 +374,17 @@ primitive admin_addr IPaddr2 \
<title>Cleaning up cluster resources</title>
<para>
A resource is automatically restarted if it fails, but each failure
-increases the resource's failcount.
+increases the resource's fail count.
</para>
<para>
If a <literal>migration-threshold</literal> has been set for the resource,
the node will no longer run the resource when the number of failures reaches
the migration threshold.
</para>
<para>
-A resource's failcount can either be reset automatically (by setting a
-<literal>failure-timeout</literal> option for the resource), or it can be
-reset manually using either &hawk2; or &crmsh;.
+By default, fail counts are not automatically reset. You can configure a fail count
+to be reset automatically by setting a <literal>failure-timeout</literal> option for the
+resource, or you can manually reset the fail count using either &hawk2; or &crmsh;.
</para>
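As a sketch (not part of the changed files), a <literal>failure-timeout</literal> could be set with &crmsh; roughly like this, assuming a resource named <literal>rsc1</literal> and a timeout of 120 seconds:

<screen>&prompt.root;<command>crm resource meta rsc1 set failure-timeout 120</command></screen>

With such a setting, failures older than the timeout no longer count toward the resource's fail count.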
<sect2 xml:id="sec-conf-hawk2-manage-cleanup">
<title>Cleaning up cluster resources with &hawk2;</title>
@@ -429,17 +429,35 @@ primitive admin_addr IPaddr2 \
<para>
Get a list of all your resources:
</para>
-<screen>&prompt.root;<command>crm resource list</command>
-...
-Resource Group: dlm-clvm:1
-dlm:1 (ocf:pacemaker:controld) Started
-clvm:1 (ocf:heartbeat:lvmlockd) Started</screen>
+<screen>&prompt.root;<command>crm resource status</command>
+Full List of Resources
+* admin-ip (ocf:heartbeat:IPaddr2): Started
+* stonith-sbd (stonith:external/sbd): Started
+* Resource Group: dlm-clvm:
+* dlm: (ocf:pacemaker:controld) Started
+* clvm: (ocf:heartbeat:lvmlockd) Started</screen>
</step>
+<step>
+<para>
+Show the fail count of a resource:
+</para>
+<screen>&prompt.root;<command>crm resource failcount <replaceable>RESOURCE</replaceable> show <replaceable>NODE</replaceable></command></screen>
+<para>
+For example, to show the fail count of the resource <literal>dlm</literal> on node
+<literal>&node1;</literal>:
+</para>
+<screen>&prompt.root;<command>crm resource failcount dlm show &node1;</command>
+scope=status name=fail-count-dlm value=2</screen>
+</step>
<step>
<para>
-To clean up the resource <literal>dlm</literal>, for example:
+Clean up the resource:
</para>
-<screen>&prompt.root;<command>crm resource cleanup dlm</command></screen>
+<screen>&prompt.root;<command>crm resource cleanup <replaceable>RESOURCE</replaceable></command></screen>
+<para>
+This command cleans up the resource on all nodes. If the resource is part of a group,
+&crmsh; also cleans up the other resources in the group.
+</para>
</step>
</procedure>
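As an illustrative aside (not part of the changed files): recent &crmsh; versions typically also accept a node name after the resource, which limits the cleanup to that node. For example, for the resource <literal>dlm</literal>:

<screen>&prompt.root;<command>crm resource cleanup dlm &node1;</command></screen>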
</sect2>
20 changes: 10 additions & 10 deletions xml/ha_resource_constraints.xml
@@ -817,7 +817,7 @@
A resource is automatically restarted if it fails. If that cannot
be achieved on the current node, or it fails <literal>N</literal> times
on the current node, it tries to fail over to another node. Each time
-the resource fails, its failcount is raised. You can define several
+the resource fails, its fail count is raised. You can define several
failures for resources (a <literal>migration-threshold</literal>), after
which they will migrate to a new node. If you have more than two nodes
in your cluster, the node a particular resource fails over to is chosen
@@ -838,12 +838,12 @@
for resource <literal>rsc1</literal> to preferably run on
<literal>&node1;</literal>. If it fails there,
<literal>migration-threshold</literal> is checked and compared to the
-failcount. If failcount &gt;= migration-threshold then the resource is
+fail count. If <literal>failcount</literal> &gt;= migration-threshold, then the resource is
migrated to the node with the next best preference.
</para>
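For illustration (not part of the changed files), such a setup might be configured along these lines, with a stand-in Dummy resource named <literal>rsc1</literal> and a threshold of 3 assumed:

<screen>&prompt.crm.conf;<command>primitive rsc1 ocf:heartbeat:Dummy \
  meta migration-threshold=3</command>
&prompt.crm.conf;<command>location rsc1-&node1; rsc1 100: &node1;</command></screen>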
<para>
After the threshold has been reached, the node will no longer be
-allowed to run the failed resource until the resource's failcount is
+allowed to run the failed resource until the resource's fail count is
reset. This can be done manually by the cluster administrator or by
setting a <literal>failure-timeout</literal> option for the resource.
</para>
@@ -862,7 +862,7 @@
<itemizedlist>
<listitem>
<para>
-Start failures set the failcount to <literal>INFINITY</literal> and
+Start failures set the fail count to <literal>INFINITY</literal> and
thus always cause an immediate migration.
</para>
</listitem>
@@ -907,7 +907,7 @@
</step>
<step>
<para>
-If you want to automatically expire the failcount for a resource, add the
+If you want to automatically expire the fail count for a resource, add the
<literal>failure-timeout</literal> meta attribute to the resource as
described in
<xref linkend="pro-conf-hawk2-primitive-add" xrefstyle="select:label title nopage"/>,
@@ -934,8 +934,8 @@
</step>
</procedure>
<para>
-Instead of letting the failcount for a resource expire automatically, you
-can also clean up failcounts for a resource manually at any time. Refer to
+Instead of letting the fail count for a resource expire automatically, you
+can also clean up fail counts for a resource manually at any time. Refer to
<xref linkend="sec-conf-hawk2-manage-cleanup"/> for details.
</para>
</sect2>
@@ -944,19 +944,19 @@
<title>Specifying resource failover nodes with &crmsh;</title>
<para>
To determine a resource failover, use the meta attribute
-<literal>migration-threshold</literal>. If failcount exceeds
+<literal>migration-threshold</literal>. If the fail count exceeds
<literal>migration-threshold</literal> on all nodes, the resource
remains stopped. For example:
</para>
<screen>&prompt.crm.conf;<command>location rsc1-&node1; rsc1 100: &node1;</command></screen>
<para>
Normally, <literal>rsc1</literal> prefers to run on <literal>&node1;</literal>.
If it fails there, <literal>migration-threshold</literal> is checked and compared
-to the failcount. If <literal>failcount</literal> &gt;= <literal>migration-threshold</literal>
+to the fail count. If <literal>failcount</literal> &gt;= <literal>migration-threshold</literal>,
then the resource is migrated to the node with the next best preference.
</para>
<para>
Whether start failures set the fail count to <literal>INFINITY</literal> depends on the
Start failures set the fail count to inf depend on the
<option>start-failure-is-fatal</option> option. Stop failures cause
fencing. If there is no STONITH defined, the resource will not migrate.
</para>
Expand Down

