
Abstract controller API from crm_resource #1981

Merged
kgaillot merged 10 commits into ClusterLabs:master from clbz5420 on Feb 24, 2020

Conversation

kgaillot
Contributor

This isolates the controller-related code of crm_resource into new files, crm_resource_controller.h and crm_resource_controller.c, creating an IPC API for the controller somewhat like what we have for other daemons. This is intended as a first step toward a common model for daemon IPC APIs, to be refined and eventually moved to libcrmcommon.

It is also hoped that this will fix CLBZ#5420, or at least make it easier to fix.
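
As a rough illustration of the kind of interface this separation enables, here is a minimal sketch of what such a controller IPC API header could look like. The type and function names below (controld_api_t, controld_api_connect(), and so on) are assumptions for illustration only, not necessarily the names used in this PR:

/* Illustrative sketch only; these names are hypothetical */
#include <libxml/tree.h>

typedef struct controld_api_s controld_api_t;

/* Callback invoked for each reply received from the controller */
typedef void (*controld_reply_fn_t)(controld_api_t *api, xmlNode *reply,
                                    void *user_data);

/* Allocate an API object for the given client name and reply callback */
controld_api_t *controld_api_new(const char *client_name,
                                 controld_reply_fn_t reply_cb, void *user_data);

/* Connect to, or disconnect from, the local controller via IPC */
int controld_api_connect(controld_api_t *api);
void controld_api_disconnect(controld_api_t *api);

/* Ask the controller to forget a resource's history (as --refresh does) */
int controld_api_delete_resource(controld_api_t *api, const char *node_name,
                                 const char *rsc_id);

/* Free the API object, disconnecting first if still connected */
void controld_api_free(controld_api_t *api);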

@kgaillot
Contributor Author

Hi @HideoYamauchi, can you test this and see if it helps the issue in CLBZ#5420?

@wenningerk and @clumens, this is a rough idea for a daemon IPC model, as discussed in #1957 for pacemakerd. I think it needs considerable work to be usable for all our use cases, but it should give some idea of a direction.

@HideoYamauchi
Contributor

Hi Ken,

> Hi @HideoYamauchi, can you test this and see if it helps the issue in CLBZ#5420?

Okay!
I will check your fix next week.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

HideoYamauchi commented Jan 29, 2020

Hi Ken,

I checked the operation with the modified version.
However, if crm_resource is run multiple times in succession, the problem's log message still appears (CLBZ#5420).

[root@rh80-test01 ~]# crm_mon -1 -Af
Cluster Summary:
  * Stack: corosync
  * Current DC: rh80-test01 (version 2.0.3-df6c286d96) - partition WITHOUT quorum
  * Last updated: Wed Jan 29 08:59:09 2020
  * Last change:  Wed Jan 29 08:59:03 2020 by root via cibadmin on rh80-test01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ rh80-test01 ]
  * OFFLINE: [ rh80-test02 ]

Active Resources:
  * Clone Set: prmPing-clone [prmPing]:
    * Started: [ rh80-test01 ]

Migration Summary:

[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK

[root@rh80-test01 pacemaker-master]# tail -f  /var/log/pacemaker/pacemaker.log | grep Bro
Jan 29 08:59:47 rh80-test01 pacemaker-controld  [3435] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-3435-3640-14-rsXj0P/qb): Broken pipe (32)
Jan 29 08:59:55 rh80-test01 pacemaker-controld  [3435] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-3435-3752-14-i2oDAU/qb): Broken pipe (32)

I will take a closer look at the details starting tomorrow.

Best Regards,
Hideo Yamauchi.

@kgaillot kgaillot force-pushed the clbz5420 branch 2 times, most recently from 7f6c592 to f1f43dc on January 29, 2020 at 15:03
@kgaillot
Contributor Author

> I checked the operation with the modified version.
> However, if crm_resource is run multiple times in succession, the problem's log message still appears (CLBZ#5420).

Hi Hideo,

Thanks for the test. I've added a commit that ensures the connection is always closed gracefully before exit; would you have a chance to test it again?
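
(For illustration only, here is a minimal sketch of the "always close gracefully before exit" pattern, using libcrmcommon's generic IPC client functions. This is not the actual commit; the "crmd" endpoint name and the helper below are assumptions.)

/* Sketch only: always tear down the controller IPC connection before exit.
 * Not the PR's actual code; the "crmd" endpoint name is an assumption. */
#include <stdlib.h>
#include <crm/common/ipc.h>

static crm_ipc_t *controller_ipc = NULL;

static void
cleanup_and_exit(int exit_code)
{
    if (controller_ipc != NULL) {
        crm_ipc_close(controller_ipc);   /* graceful disconnect */
        crm_ipc_destroy(controller_ipc); /* free the client object */
        controller_ipc = NULL;
    }
    exit(exit_code);
}

int
main(void)
{
    controller_ipc = crm_ipc_new("crmd", 0);
    if ((controller_ipc == NULL) || !crm_ipc_connect(controller_ipc)) {
        cleanup_and_exit(EXIT_FAILURE);
    }
    /* ... send requests and wait for replies here ... */
    cleanup_and_exit(EXIT_SUCCESS);
}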

@HideoYamauchi
Contributor

Hi Ken,

Okay!
I will confirm your fix.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

Hi Ken,

I tested the new fix.
But it seems that the problem's log message still appears.

[root@rh80-test01 ~]# crm_mon -1 -Af
Cluster Summary:
  * Stack: corosync
  * Current DC: rh80-test01 (version 2.0.3-df6c286d96) - partition WITHOUT quorum
  * Last updated: Thu Jan 30 09:06:43 2020
  * Last change:  Thu Jan 30 09:05:44 2020 by root via cibadmin on rh80-test01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ rh80-test01 ]
  * OFFLINE: [ rh80-test02 ]

Active Resources:
  * Clone Set: prmPing-clone [prmPing]:
    * Started: [ rh80-test01 ]

Node Attributes:
  * Node: rh80-test01:
    * default_ping_set                  : 100       

Migration Summary:

[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK
[root@rh80-test01 ~]# /usr/sbin/crm_resource --refresh --resource prmPing-clone 
Cleaned up prmPing:0 on rh80-test01
Cleaned up prmPing:1 on rh80-test01
Waiting for 2 replies from the controller.. OK

[root@rh80-test01 ~]# !tail
tail -f  /var/log/pacemaker/pacemaker.log | grep Bro
Jan 30 09:07:27 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7269-14-eTAQ5E/qb): Broken pipe (32)
Jan 30 09:07:29 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7271-14-Gvry7p/qb): Broken pipe (32)
Jan 30 09:07:30 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7273-14-ioLP0d/qb): Broken pipe (32)
Jan 30 09:07:31 rh80-test01 pacemaker-controld  [6893] (qb_ipcs_event_sendv)    warning: new_event_notification (/dev/shm/qb-6893-7275-14-cq4ML4/qb): Broken pipe (32)


Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor

HideoYamauchi commented Feb 18, 2020

Hi Ken,

I have looked into the problem a little more.

When the problem occurs, the message seems to be sent after the client has already disconnected, at the point where crm_ipcs_flush_events() sends to the client.
In this case, crmd tries to send three messages to crm_resource.
As a result, a "Broken pipe" log is output.

If another message transfer happens in between, the problem does not occur, because the controller recognizes that the client has disconnected.
In this case, the third message is not sent.

(snip)
Feb 18 11:28:38 rh80-test01 pacemaker-controld  [899] (process_lrm_event)       notice: Not recording prmPing_monitor_0 result in CIB because resource information was removed since it was initiated
(snip)
Feb 18 11:28:38 rh80-test01 pacemaker-controld  [899] (send_msg_via_ipc)        debug: Unknown Sub-system (05a8927c-df2b-4a0c-b980-db8720a1ed17)... discarding message.
(snip)

In the first place, in the procedure I reported, the controller sends three responses, so crm_resource, which waits for only two, cannot handle them in many cases.


Since the response is returned by send_task_ok_ack(), suppressing the response to crm_resource in delete_rsc_entry() may solve the problem.

/*	 A_LRM_INVOKE	*/
void
do_lrm_invoke(long long action,
              enum crmd_fsa_cause cause,
              enum crmd_fsa_state cur_state,
              enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
(snip)
        } else if ((rc < 0) && !create_rsc) {
            /* Delete of malformed or nonexistent resource
             * (deleting something that does not exist is a success)
             */
            crm_notice("Not registering resource '%s' for a %s event "
                       CRM_XS " get-rc=%d (%s) transition-key=%s",
                       ID(xml_rsc), operation,
                       rc, pcmk_strerror(rc), ID(input->xml));
            delete_rsc_entry(lrm_state, input, ID(xml_rsc), NULL, pcmk_ok,
                             user_name);
            send_task_ok_ack(lrm_state, input, ID(xml_rsc), NULL, operation,
                             from_host, from_sys);
            return;


I will think a little more about how to fix the problem.

Best Regards,
Hideo Yamauchi.

This is an initial step towards a public API in libcrmcommon.

Previously, crm_resource would close open connections and free memory before exiting only in certain circumstances. Now, it always does.

For all of the daemons except the executor, the IPC post-connection handler did nothing but log a trace message with the libqb client pointer address. This is unnecessary since the daemons log a trace when first accepting the connection, and libqb logs another when creating the client.

Previously, some callers of pcmk__client_data2xml() trace-logged the resulting XML, and some didn't. Moving the log into the function itself reduces duplication and ensures the trace is always done (at the minor cost of not being able to trace just one daemon's incoming IPC).

Also remove an unused argument from the function.
@kgaillot
Contributor Author

Hi @HideoYamauchi ,

I am still investigating this, but I've added more commits to improve the logging on the controller side.

I now suspect the problem is in the controller and not crm_resource. I can reproduce the problem by running crm_resource --refresh three times in quick succession, but not if I wait a couple of seconds in between. That suggests one client connection is affecting how the controller handles the next connection if there's not some time in between. Another possibility is that the difference between a good run and a bad run has to do with whether there is a monitor operation in flight when the resource is cleaned up.

@kgaillot
Contributor Author

Hi @HideoYamauchi ,

I am getting further with the investigation. There appears to be a problem on each side (crm_resource and the controller). It occurs when refreshing a clone that hasn't been probed yet (whether at start-up or just after a refresh).

On the crm_resource side, the behavior is different depending on whether the clone has been probed. If the clone has been probed, crm_resource sends the controller one request per clone instance, with each instance properly assigned to a particular node. Here is an example of part of the IPC request:

<rsc_op transition-key="51349:0:0:xxxxxxxx-xrsc-opxx-xcrm-resourcexxxx" on_node="node1">
    <primitive id="pinger" long-id="pinger:0" class="ocf" provider="pacemaker" type="ping"/>

Note that the id is "pinger" with a long-id of "pinger:0", and in this case there will be only one request sent for pinger:0, on node1.

When the clone has not been probed, crm_resource sends a request for each clone instance and node combination, using only the numbered instance name:

<rsc_op transition-key="51352:0:0:xxxxxxxx-xrsc-opxx-xcrm-resourcexxxx" on_node="node1">
    <primitive id="pinger:0" class="ocf" provider="pacemaker" type="ping"/>

Note the id is "pinger:0" with no long-id, and in this case, this request will be repeated for each node, and then requests for pinger:1 will be sent for each node, etc.

My first thought is that for anonymous clones that aren't probed, crm_resource should only send one request per node, using the unnumbered name. For unique clones the current behavior makes sense. It's possible the current behavior is there to be able to clear the history in case a unique clone was converted to anonymous, but I doubt that's the right approach. (This fix would work around the problem, and is probably a good idea, but isn't necessary.)

On the controller side, when receiving the requests in the first case, everything proceeds reasonably, with one reply per request. However in the second case, for some reason I haven't figured out yet, the controller replies twice to each request for the local node. Since crm_resource is only counting replies (not matching replies to requests), this throws off its count and it exits early.

Fixing the controller so that it replies only once to each request would address the issue correctly, and would make the work-around on the crm_resource side unnecessary.

Finally, another good idea would be to make the controller and crm_resource match requests to replies. That would allow crm_resource to discard any duplicate replies for the same request, which would also work around the problem. However this would be a major change in the controller IPC protocol; it currently does not have a "call ID" concept like the CIB, fencing, and executor IPC protocols. This is definitely a good idea but is probably a longer term project.
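
(As a sketch of the matching idea, purely for illustration: the client would stamp each request with an increasing call ID and count a reply only the first time it sees each ID. The "call-id" attribute and the helpers below are hypothetical; the current controller protocol has nothing like them.)

/* Illustrative sketch of call-ID matching; the "call-id" attribute and these
 * helpers are hypothetical, not part of the current controller protocol. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <libxml/tree.h>

static int last_sent_call_id = 0;

/* Stamp an outgoing request with the next call ID */
static int
tag_request(xmlNode *request)
{
    char buf[16];

    last_sent_call_id++;
    snprintf(buf, sizeof(buf), "%d", last_sent_call_id);
    xmlSetProp(request, (const xmlChar *) "call-id", (const xmlChar *) buf);
    return last_sent_call_id;
}

/* Count a reply only if its call ID has not been seen before */
static bool
count_reply(xmlNode *reply, bool *seen, int n_requests)
{
    int call_id = 0;
    xmlChar *value = xmlGetProp(reply, (const xmlChar *) "call-id");

    if (value != NULL) {
        call_id = atoi((const char *) value);
        xmlFree(value);
    }
    if ((call_id < 1) || (call_id > n_requests) || seen[call_id - 1]) {
        return false; /* unknown or duplicate reply: discard */
    }
    seen[call_id - 1] = true;
    return true;
}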

@kgaillot
Contributor Author

Even though this PR does not fix the issue, I am inclined to merge it as-is once testing completes, since it is a good first step towards improving the controller IPC protocol, and it also provides better logs.

@kgaillot kgaillot merged commit 6bda5dc into ClusterLabs:master Feb 24, 2020
@HideoYamauchi
Contributor

Hi Ken,

> Even though this PR does not fix the issue, I am inclined to merge it as-is once testing completes, since it is a good first step towards improving the controller IPC protocol, and it also provides better logs.

Okay!

Let's consider this issue further to improve it in the future.

Best Regards,
Hideo Yamauchi.
