-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcd: fix dereferencing issue in commit queue causing contention and change design to be more scalable #5513
Conversation
d4b5fbe
to
5745fa5
Compare
5745fa5
to
42ce12b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, LGTM 💯
Only nits and questions at this point.
bbd8e1a
to
f806408
Compare
func (c *commitQueue) Stop() { | ||
// Signal the queue's condition variable to ensure the mainLoop reliably | ||
// unblocks to check for the exit condition. | ||
c.queueCond.Signal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past we've found that at times this initial signal gets sort of "swallowed" causing us to add a loop around it like in this areas:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we also need to cancel the main context, or send on some quit channel here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wasn't aware of this. I suggested removing the loop in a previous review cycle 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to look up any references about this, but didn't succeed. Is this a know issue with Go conditional variables? Maybe we did the loops because of other circumstances on our side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a know issue with Go conditional variables? Maybe we did the loops because of other circumstances on our side?
Good question...we could just be using the API wrong, and then ended up settling on this workaround as it seemed to make a difference in practice. Otherwise, we would see behavior where things would never shutdown without this type of loop added. I guess I view it as more of a defensive thing, but we should def try to repro it in like golang playground or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, doesn't seem to be causing any issues in the itest
(which are actually all green on this PR!!!), so I guess lets just monitor it and see if it triggers any issue re shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎨
Just needs a rebase now to fix the release notes conflict. |
This commit fixes an issue where subsequent transaction retries may have changed the read/write sets inside the STM which in turn left junk references to these keys in the CommitQueue. The left keys potentially conflicted with subsequent transactions, queueing them up causing througput degradation.
This commit builds on the ideas of @cfromknecht in lnd/5153. The addition is that the design is now simpler and more robust by queueing up everything, but allowing maximal parallelism where txns don't block. Furthermore the commit makes CommitQueue.Done() private essentially removing the need to understand the queue externally.
f806408
to
e243be1
Compare
This commit splits the commit queue changes from #5392.
The original idea with the commit queue was to enable seamless concurrency for transactions that are non blocking, while queuing up all txns such that the number of individual retries is minimized. The old design failed to correctly implement the queuing up part which may have resulted in contention on certain keys and slowing things down considerably. This PR attempts to fix these issues by making the queue design simpler and more scalable.
Design changes heavily inspired by #5153
Rebased on: #5579