ai/live: Slow orchestrator detection #3308
I'm not sure I understand it correctly, but isn't 1 segment in flight a very strict requirement? I'm a little concerned about a case like this:
Detect 'slow' orchs by keeping track of in-flight segments. Count the difference between segments produced and segments completed. There should only be ~1 segment in-flight concurrently. Sometimes the beginning of the current segment may briefly overlap with the end of the previous segment, so accommodate that by checking for the second-to-last segment.
a8dbc25 to 5c3a5e0
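The counting described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `inFlightTracker`, `StartSegment`, `FinishSegment` and `maxInFlight` are hypothetical names. With `maxInFlight` set to 2, the previous segment may still be finishing while the current one starts, which corresponds to checking the second-to-last segment.

```go
package main

import "sync"

// inFlightTracker is a hypothetical helper (not taken from the PR) that
// counts segments produced versus segments completed.
type inFlightTracker struct {
	mu          sync.Mutex
	produced    int // segments handed to the orchestrator so far
	completed   int // segments the orchestrator has finished
	maxInFlight int // allowed overlap, e.g. 2 = current + previous segment
}

// StartSegment records a newly produced segment and reports whether the
// orchestrator now looks slow, i.e. more segments are in flight than allowed.
func (t *inFlightTracker) StartSegment() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.produced++
	return t.produced-t.completed > t.maxInFlight
}

// FinishSegment records that one in-flight segment has completed.
func (t *inFlightTracker) FinishSegment() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.completed++
}

// InFlight returns the current difference between produced and completed.
func (t *inFlightTracker) InFlight() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.produced - t.completed
}
```

A caller would invoke `StartSegment` when it begins publishing a segment and `FinishSegment` when the publish returns. What happens once `StartSegment` reports `true` is left open here.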
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3308 +/- ##
===================================================
- Coverage 33.94992% 33.92348% -0.02644%
===================================================
Files 141 141
Lines 37140 37166 +26
===================================================
- Hits 12609 12608 -1
- Misses 23811 23838 +27
Partials 720 720
... and 1 file with indirect coverage changes
@j0sh Yeah, maybe we can give it a couple of chances: if it's detected as slow 2 or 3 times in a row, then terminate. What's the effect on the output stream if there is more than 1 segment in flight? Would it just skip a segment that wasn't processed quickly enough, so the stream pauses and then jumps forward to live?
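For illustration, the "couple of chances" idea could look something like the sketch below. It is an assumption about one possible shape, not something in this PR: `slowStrikes`, `limit` and `Observe` are made-up names, and the type is not safe for concurrent use as written.

```go
package main

// slowStrikes is a hypothetical counter of consecutive "slow" detections.
type slowStrikes struct {
	limit  int // consecutive slow detections tolerated before terminating
	streak int // current run of consecutive slow detections
}

// Observe records one detection result and reports whether the limit of
// consecutive slow detections has been reached. Any non-slow observation
// resets the streak.
func (s *slowStrikes) Observe(slow bool) bool {
	if slow {
		s.streak++
	} else {
		s.streak = 0
	}
	return s.streak >= s.limit
}
```

With `limit` set to 3, two slow detections followed by a normal one start the count over, so only three slow detections in a row would trigger termination.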
Assuming things are just "slow", the subscriptions would still run in order. If segment A is slow, but segment B completes at a normal pace and C is in flight, then downstream would continue reading A until it's done, then fetch B (which would hopefully download more quickly), then C (which the server will 'catch up' on and then trickle out). The server currently retains the last 5 segments, so if the subscriber falls more than 5 behind, it will get a 404. That is not the best behavior right now; I will update that separately to indicate "this stream exists but this segment doesn't" so the client can handle that better, e.g. try the next segment or jump up to the leading edge.
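A rough sketch of the retention window and the "jump up to the leading edge" option, purely as an illustration. The names `classify`, `nextRequest` and the three states are hypothetical, sequence numbers are assumed to be consecutive integers, and treating a not-yet-produced segment as "pending" is an assumption rather than confirmed server behavior.

```go
package main

// segmentState is a hypothetical classification of a requested segment.
type segmentState int

const (
	segmentAvailable segmentState = iota // still inside the retention window
	segmentExpired                       // fell out of the window (a 404 today)
	segmentPending                       // not produced yet (assumed state)
)

// classify reports where the requested sequence number sits relative to the
// newest segment, given how many segments the server retains (5 per the
// comment above).
func classify(latest, requested, retained int) segmentState {
	switch {
	case requested > latest:
		return segmentPending
	case requested <= latest-retained:
		return segmentExpired
	default:
		return segmentAvailable
	}
}

// nextRequest picks the segment a subscriber could fetch next: the requested
// one if it is still usable, otherwise the leading edge.
func nextRequest(latest, requested, retained int) int {
	if classify(latest, requested, retained) == segmentExpired {
		return latest
	}
	return requested
}
```

For example, with the newest segment at 10 and 5 retained, segments 6 through 10 are still available, while a request for segment 5 has expired and would be redirected to 10 by `nextRequest`.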
Relaxed the check to allow up to 3 segments in flight, and reworked a few things in preparation for the next round of changes after this PR: 7c00bae
OK, as discussed in Discord, let's merge this one (without O's swap) and we'll think later about what to do with O's swaps.