-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to send empty subscription and update version afterward #11264
Conversation
Signed-off-by: Megrez Lu <[email protected]>
Signed-off-by: Megrez Lu <[email protected]>
Signed-off-by: Megrez Lu <[email protected]>
@@ -291,7 +291,6 @@ public void run() { | |||
} | |||
if (resourceSubscribers.get(type).isEmpty()) { | |||
resourceSubscribers.remove(type); | |||
subscribedResourceTypeUrls.remove(type.typeUrl()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want to clean up the resource. Nonces are per-resource? Then it seems when the client starts watches again the server should notice the lack of nonce. The issue might be instead that we aren't cleaning up AdsStream.respNonces
?
Note that maybe we should do the new I/O you are causing in this PR, but maybe we allow sending the ACK even when subscribedResourceTypeUrls lacks the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing line 294, have it call a cleanup method on the subscriber.controlPlaneClient
(if it isn't null) to remove the nonce. You'll have to create the cleanup method that you'll be calling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me. @lujiajing1126, if it turns out to be annoying to make that change, tell us and we'll see how we can help. Also, if you think that wouldn't fully address what you noticed, say so. I don't fully understand "adjustResourceSubscription issue;" it just looks like the same nonce issue to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, if you are uncertain about the changes, you can send them out before you update/fix any tests. A sort of early review.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want to clean up the resource. Nonces are per-resource? Then it seems when the client starts watches again the server should notice the lack of nonce. The issue might be instead that we aren't cleaning up AdsStream.respNonces?
Instead of removing line 294, have it call a cleanup method on the
subscriber.controlPlaneClient
(if it isn't null) to remove the nonce. You'll have to create the cleanup method that you'll be calling.
I agree with both of you. Instead of creating a cleanup method, I've merged cleanup logic into the existing adjustResourceSubscription
method. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me. @lujiajing1126, if it turns out to be annoying to make that change, tell us and we'll see how we can help. Also, if you think that wouldn't fully address what you noticed, say so. I don't fully understand "adjustResourceSubscription issue;" it just looks like the same nonce issue to me.
Yes. Exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, if you are uncertain about the changes, you can send them out before you update/fix any tests. A sort of early review.)
I tried to fix this issue based on the comment (without modifying the test cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify the failing test case to expect the nonce to be reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify the failing test case to expect the nonce to be reset.
Test case has been fixed with some additional helper to access the underlying private/package-private fields
@@ -291,7 +291,6 @@ public void run() { | |||
} | |||
if (resourceSubscribers.get(type).isEmpty()) { | |||
resourceSubscribers.remove(type); | |||
subscribedResourceTypeUrls.remove(type.typeUrl()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing line 294, have it call a cleanup method on the subscriber.controlPlaneClient
(if it isn't null) to remove the nonce. You'll have to create the cleanup method that you'll be calling.
Signed-off-by: Megrez Lu <[email protected]>
Gentle Ping @larry-safran @ejona86 |
I'm not wild about the white box testing, especially since this issue was originally a problem for the control plane; it seems we should be able to test without exposing deep internals. I'd like to take a look and see if I can clean it up. |
@lujiajing1126, I changed up the test. But while doing so I realized your problem was probably not caused by the nonce; that's only used for ACK/NACK reporting. The problem was more likely that version wasn't being cleared. Take a look at my changes and see if they seem right to you. Re-requesting review from @larry-safran since clearing version is an important difference from before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaning up versions makes sense to me and I don't see any opportunity for harm.
Thanks! LGTM Version should be cleared as well. |
Looking at the fix, I'm still not certain it is right. I'm seeing that we don't send an unsubscription for the last resource. Looking at the Go implementation, it seems it does send such a request. I'm thinking maybe we should send one last request to fully unsubscribe, and then delete the version+nonce, simply because we don't care about them any more. But it seems it shouldn't matter for correctness. It was gnawing on me because we don't have a problem with keeping the version+nonce around when unsubscribing from a resource, if there are other resources still present. And that's because we send an updated request. We really need to ask around for this, I think. |
In my first commit, i.e. 037cdef, I did send an empty request to the xdsServer. As I understood, this is required by the xDS protocol, quoted,
Beside, according to the istio implementation, fully unsubscribing resources would definitely help reduce memory consumption at the server side. https://github.com/istio/istio/blob/6aa63cb62372ba9e79940df6895273725765b5bb/pkg/xds/server.go#L369 |
I found the thread where it was suggested to remove it. I think the reasoning was just a bit mistaken. I've added it back. Also, I verified this won't be considered a wildcard watch. All but one usage of getSubscribedResources() creates an empty collection when the method returns null. And the one that doesn't is impossible for it to be null. I purposefully chose not to change that as part of this CL to keep the bug fix to-the-point. |
Thanks! LGTM |
Otherwise, the server will continue sending updates and if we re-subscribe to the last resource, the server won't re-send it. Also completely remove the per-type state, as it could only add confusion.
Otherwise, the server will continue sending updates and if we re-subscribe to the last resource, the server won't re-send it. Also completely remove the per-type state, as it could only add confusion.
#11796 fixes nonce handling; it needs to be remembered for the life of the ADS stream. |
Closes #11232
As described in the original issue:
If a CDS update revoke all subscribed resources, all EDS subscriptions will be cancelled.
adjustResourceSubscription issue
The first observation of the
istiod
logs is,If EDS is empty,
adjustResourceSubscription
does not work.resourceVersion update issue
If all resources are removed from a type, e.g. EDS,
subscribedResourceTypeUrls
entry will be also removed. This will prevent nonce being updated from the upstream EDS PUSH.