
Abstract controller API from crm_resource #1981

Merged
kgaillot merged 10 commits into ClusterLabs:master from clbz5420 on Feb 24, 2020

Conversation

kgaillot
Contributor

This isolates the controller-related code of crm_resource into new files, crm_resource_controller.h and crm_resource_controller.c, creating an IPC API for the controller somewhat like what we have for other daemons. This is intended as a first step toward a common model for daemon IPC APIs, to be refined and eventually moved to libcrmcommon.

It is also hoped that this will fix CLBZ#5420, or at least make it easier to fix.
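
As a rough illustration of the kind of interface this separation enables, here is a minimal sketch of what such a controller IPC API header could look like. The type and function names below (controld_api_t, controld_api_connect(), and so on) are assumptions for illustration only, not necessarily the names used in this PR:

/* Illustrative sketch only; these names are hypothetical */
#include <libxml/tree.h>

typedef struct controld_api_s controld_api_t;

/* Callback invoked for each reply received from the controller */
typedef void (*controld_reply_fn_t)(controld_api_t *api, xmlNode *reply,
                                    void *user_data);

/* Allocate an API object for the given client name and reply callback */
controld_api_t *controld_api_new(const char *client_name,
                                 controld_reply_fn_t reply_cb, void *user_data);

/* Connect to, or disconnect from, the local controller via IPC */
int controld_api_connect(controld_api_t *api);
void controld_api_disconnect(controld_api_t *api);

/* Ask the controller to forget a resource's history (as --refresh does) */
int controld_api_delete_resource(controld_api_t *api, const char *node_name,
                                 const char *rsc_id);

/* Free the API object, disconnecting first if still connected */
void controld_api_free(controld_api_t *api);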

@kgaillot
Contributor Author

Hi @HideoYamauchi, can you test this and see if it helps the issue in CLBZ#5420?

@wenningerk and @clumens, this is a rough idea for a daemon IPC model, as discussed in #1957 for pacemakerd. I think it needs considerable work to be usable for all our use cases, but it should give some idea of a direction.

@HideoYamauchi
Contributor

Hi Ken,

> Hi @HideoYamauchi, can you test this and see if it helps the issue in CLBZ#5420?

Okay!
I will check your fix next week.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

HideoYamauchi commented Jan 29, 2020

Hi Ken,

I checked the operation with the modified version.
However, if crm_resource is run multiple times in succession, the problem's log message still appears (CLBZ#5420).

[root@rh80-test01 ~]# crm_mon -1 -Af
Cluster Summary:
  * Stack: corosync
  * Current DC: rh80-test01 (version 2.0.3-df6c286d96) - partition WITHOUT quorum
  * Last updated: Wed Jan 29 08:59:09 2020
  * Last change:  Wed Jan 29 08:59:03 2020 by root via cibadmin on rh80-test01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ rh80-test01 ]
  * OFFLINE: [ rh80-test02 ]

Active Resources:
  * Clone Set: prmPing-clone [prmPing]:
    * Started: [ rh80-test01 ]

Migration Summary:

[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK

[root@rh80-test01 pacemaker-master]# tail -f  /var/log/pacemaker/pacemaker.log | grep Bro
Jan 29 08:59:47 rh80-test01 pacemaker-controld  [3435] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-3435-3640-14-rsXj0P/qb): Broken pipe (32)
Jan 29 08:59:55 rh80-test01 pacemaker-controld  [3435] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-3435-3752-14-i2oDAU/qb): Broken pipe (32)

I will take a closer look at the details starting tomorrow.

Best Regards,
Hideo Yamauchi.

@kgaillot kgaillot force-pushed the clbz5420 branch 2 times, most recently from 7f6c592 to f1f43dc on January 29, 2020 at 15:03
@kgaillot
Contributor Author

> I checked the operation with the modified version.
> However, if crm_resource is run multiple times in succession, the problem's log message still appears (CLBZ#5420).

Hi Hideo,

Thanks for the test. I've added a commit that ensures the connection is always closed gracefully before exit; would you have a chance to test it again?
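
(For illustration only, here is a minimal sketch of the "always close gracefully before exit" pattern, using libcrmcommon's generic IPC client functions. This is not the actual commit; the "crmd" endpoint name and the helper below are assumptions.)

/* Sketch only: always tear down the controller IPC connection before exit.
 * Not the PR's actual code; the "crmd" endpoint name is an assumption. */
#include <stdlib.h>
#include <crm/common/ipc.h>

static crm_ipc_t *controller_ipc = NULL;

static void
cleanup_and_exit(int exit_code)
{
    if (controller_ipc != NULL) {
        crm_ipc_close(controller_ipc);   /* graceful disconnect */
        crm_ipc_destroy(controller_ipc); /* free the client object */
        controller_ipc = NULL;
    }
    exit(exit_code);
}

int
main(void)
{
    controller_ipc = crm_ipc_new("crmd", 0);
    if ((controller_ipc == NULL) || !crm_ipc_connect(controller_ipc)) {
        cleanup_and_exit(EXIT_FAILURE);
    }
    /* ... send requests and wait for replies here ... */
    cleanup_and_exit(EXIT_SUCCESS);
}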

@HideoYamauchi
Contributor

Hi Ken,

Okay!
I will confirm your fix.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

Hi Ken,

I tested the new fix.
But it seems that the problem's log message still appears.

[root@rh80-test01 ~]# crm_mon -1 -Af
Cluster Summary:
  * Stack: corosync
  * Current DC: rh80-test01 (version 2.0.3-df6c286d96) - partition WITHOUT quorum
  * Last updated: Thu Jan 30 09:06:43 2020
  * Last change:  Thu Jan 30 09:05:44 2020 by root via cibadmin on rh80-test01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ rh80-test01 ]
  * OFFLINE: [ rh80-test02 ]

Active Resources:
  * Clone Set: prmPing-clone [prmPing]:
    * Started: [ rh80-test01 ]

Node Attributes:
  * Node: rh80-test01:
    * default_ping_set                  : 100       

Migration Summary:

[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK

[root@rh80-test01 ~]# !tail
tail -f  /var/log/pacemaker/pacemaker.log | grep Bro
Jan 30 09:07:27 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7269-14-eTAQ5E/qb): Broken pipe (32)
Jan 30 09:07:29 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7271-14-Gvry7p/qb): Broken pipe (32)
Jan 30 09:07:30 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7273-14-ioLP0d/qb): Broken pipe (32)
Jan 30 09:07:31 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7275-14-cq4ML4/qb): Broken pipe (32)


Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

HideoYamauchi commented Feb 18, 2020

Hi Ken,

I have looked into the problem a little more.

When the problem occurs, the message seems to be sent after the client has already disconnected, at the point where crm_ipcs_flush_events() sends to the client.
In this case, crmd tries to send three messages to crm_resource.
As a result, a "Broken pipe" log is output.

If another message transfer happens in between, the problem does not occur, because the controller recognizes that the client has disconnected.
In this case, the third message is not sent.

(snip)
Feb 18 11:28:38 rh80-test01 pacemaker-controld  [899] (process_lrm_event)       notice: Not recording prmPing_monitor_0 result in CIB because resource information was removed since it was initiated
(snip)
Feb 18 11:28:38 rh80-test01 pacemaker-controld  [899] (send_msg_via_ipc)        debug: Unknown Sub-system (05a8927c-df2b-4a0c-b980-db8720a1ed17)... discarding message.
(snip)

In the first place, in the procedure I reported, the controller sends three responses, so crm_resource, which waits for only two, cannot handle them in many cases.


Since the response is returned by send_task_ok_ack(), suppressing the response to crm_resource in delete_rsc_entry() may solve the problem.

/*	 A_LRM_INVOKE	*/
void
do_lrm_invoke(long long action,
              enum crmd_fsa_cause cause,
              enum crmd_fsa_state cur_state,
              enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
(snip)
        } else if ((rc < 0) && !create_rsc) {
            /* Delete of malformed or nonexistent resource
             * (deleting something that does not exist is a success)
             */
            crm_notice("Not registering resource '%s' for a %s event "
                       CRM_XS " get-rc=%d (%s) transition-key=%s",
                       ID(xml_rsc), operation,
                       rc, pcmk_strerror(rc), ID(input->xml));
            delete_rsc_entry(lrm_state, input, ID(xml_rsc), NULL, pcmk_ok,
                             user_name);
            send_task_ok_ack(lrm_state, input, ID(xml_rsc), NULL, operation,
                             from_host, from_sys);
            return;


I will think a little more about how to fix the problem.

Best Regards,
Hideo Yamauchi.

This is an initial step towards a public API in libcrmcommon.

Previously, crm_resource would close open connections and free memory before exiting only in certain circumstances. Now, it always does.

For all of the daemons except the executor, the IPC post-connection handler did nothing but log a trace message with the libqb client pointer address. This is unnecessary since the daemons log a trace when first accepting the connection, and libqb logs another when creating the client.

Previously, some callers of pcmk__client_data2xml() trace-logged the resulting XML, and some didn't. Moving the log into the function itself reduces duplication and ensures the trace is always done (at the minor cost of not being able to trace just one daemon's incoming IPC).

Also remove an unused argument from the function.
@kgaillot
Contributor Author

Hi @HideoYamauchi ,

I am still investigating this, but I've added more commits to improve the logging on the controller side.

I now suspect the problem is in the controller and not crm_resource. I can reproduce the problem by running crm_resource --refresh three times in quick succession, but not if I wait a couple of seconds in between. That suggests one client connection is affecting how the controller handles the next connection if there's not some time in between. Another possibility is that the difference between a good run and a bad run has to do with whether there is a monitor operation in flight when the resource is cleaned up.

@kgaillot
Contributor Author

Hi @HideoYamauchi ,

I am getting further with the investigation. There appears to be a problem on each side (crm_resource and the controller). It occurs when refreshing a clone that hasn't been probed yet (whether at start-up or just after a refresh).

On the crm_resource side, the behavior is different depending on whether the clone has been probed. If the clone has been probed, crm_resource sends the controller one request per clone instance, with each instance properly assigned to a particular node. Here is an example of part of the IPC request:

<rsc_op transition-key="51349:0:0:xxxxxxxx-xrsc-opxx-xcrm-resourcexxxx" on_node="node1">
    <primitive id="pinger" long-id="pinger:0" class="ocf" provider="pacemaker" type="ping"/>

Note that the id is "pinger" with a long-id of "pinger:0", and in this case there will be only one request sent for pinger:0, on node1.

When the clone has not been probed, crm_resource sends a request for each clone instance and node combination, using only the numbered instance name:

<rsc_op transition-key="51352:0:0:xxxxxxxx-xrsc-opxx-xcrm-resourcexxxx" on_node="node1">
    <primitive id="pinger:0" class="ocf" provider="pacemaker" type="ping"/>

Note the id is "pinger:0" with no long-id, and in this case, this request will be repeated for each node, and then requests for pinger:1 will be sent for each node, etc.

My first thought is that for anonymous clones that aren't probed, crm_resource should only send one request per node, using the unnumbered name. For unique clones the current behavior makes sense. It's possible the current behavior is there to be able to clear the history in case a unique clone was converted to anonymous, but I doubt that's the right approach. (This fix would work around the problem, and is probably a good idea, but isn't necessary.)

On the controller side, when receiving the requests in the first case, everything proceeds reasonably, with one reply per request. However in the second case, for some reason I haven't figured out yet, the controller replies twice to each request for the local node. Since crm_resource is only counting replies (not matching replies to requests), this throws off its count and it exits early.

Fixing the controller so that it replies only once to each request would address the issue correctly, and would make the work-around on the crm_resource side unnecessary.

Finally, another good idea would be to make the controller and crm_resource match requests to replies. That would allow crm_resource to discard any duplicate replies for the same request, which would also work around the problem. However this would be a major change in the controller IPC protocol; it currently does not have a "call ID" concept like the CIB, fencing, and executor IPC protocols. This is definitely a good idea but is probably a longer term project.
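
(As a sketch of the matching idea, purely for illustration: the client would stamp each request with an increasing call ID and count a reply only the first time it sees each ID. The "call-id" attribute and the helpers below are hypothetical; the current controller protocol has nothing like them.)

/* Illustrative sketch of call-ID matching; the "call-id" attribute and these
 * helpers are hypothetical, not part of the current controller protocol. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <libxml/tree.h>

static int last_sent_call_id = 0;

/* Stamp an outgoing request with the next call ID */
static int
tag_request(xmlNode *request)
{
    char buf[16];

    last_sent_call_id++;
    snprintf(buf, sizeof(buf), "%d", last_sent_call_id);
    xmlSetProp(request, (const xmlChar *) "call-id", (const xmlChar *) buf);
    return last_sent_call_id;
}

/* Count a reply only if its call ID has not been seen before */
static bool
count_reply(xmlNode *reply, bool *seen, int n_requests)
{
    int call_id = 0;
    xmlChar *value = xmlGetProp(reply, (const xmlChar *) "call-id");

    if (value != NULL) {
        call_id = atoi((const char *) value);
        xmlFree(value);
    }
    if ((call_id < 1) || (call_id > n_requests) || seen[call_id - 1]) {
        return false; /* unknown or duplicate reply: discard */
    }
    seen[call_id - 1] = true;
    return true;
}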

@kgaillot
Contributor Author

Even though this PR does not fix the issue, I am inclined to merge it as-is once testing completes, since it is a good first step towards improving the controller IPC protocol, and it also provides better logs.

@kgaillot kgaillot merged commit 6bda5dc into ClusterLabs:master Feb 24, 2020
@HideoYamauchi
Contributor

Hi Ken,

> Even though this PR does not fix the issue, I am inclined to merge it as-is once testing completes, since it is a good first step towards improving the controller IPC protocol, and it also provides better logs.

Okay!

Let's consider this issue further to improve it in the future.

Best Regards,
Hideo Yamauchi.
