Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ai/live: Slow orchestrator detection #3308

Merged
merged 7 commits into from
Dec 16, 2024
Merged

ai/live: Slow orchestrator detection #3308

merged 7 commits into from
Dec 16, 2024

Conversation

j0sh
Copy link
Collaborator

@j0sh j0sh commented Dec 11, 2024

Detect 'slow' orchs by keeping track of in-flight segments.
Count the difference between segments produced and segments completed.
There should only be ~1 segment in-flight concurrently.

Sometimes the beginning of the current segment may briefly overlap
with the end of the previous segment, so accommodate that by checking
for the second-to-last segment.

@j0sh j0sh requested review from victorges, leszko and mjh1 December 11, 2024 06:43
@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Dec 11, 2024
@leszko
Copy link
Contributor

leszko commented Dec 11, 2024

I'm not sure I understand it correctly, but isn't 1 segment in flight a very strict requirement? I'm a little concern about the case like this:

  • Gateway has some temp network delays
  • It results in detecting each Orchestartor as slow
  • We swap from O to O, effectively taking the whole network capacity

Detect 'slow' orchs by keeping track of in-flight segments.
Count the difference between segments produced and segments completed.
There should only be ~1 segment in-flight concurrently.

Sometimes the beginning of the current segment may briefly overlap
with the end of the previous segment, so accommodate that by
checking for the second-to-last segment.
@j0sh j0sh force-pushed the ja/check-slow-orchs branch from a8dbc25 to 5c3a5e0 Compare December 12, 2024 08:20
@j0sh j0sh marked this pull request as ready for review December 12, 2024 08:20
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.

Project coverage is 33.92348%. Comparing base (df2ed58) to head (ade1f86).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
server/ai_live_video.go 0.00000% 26 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3308         +/-   ##
===================================================
- Coverage   33.94992%   33.92348%   -0.02644%     
===================================================
  Files            141         141                 
  Lines          37140       37166         +26     
===================================================
- Hits           12609       12608          -1     
- Misses         23811       23838         +27     
  Partials         720         720                 
Files with missing lines Coverage Δ
server/ai_live_video.go 0.00000% <0.00000%> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df2ed58...ade1f86. Read the comment docs.

Files with missing lines Coverage Δ
server/ai_live_video.go 0.00000% <0.00000%> (ø)

... and 1 file with indirect coverage changes

@mjh1
Copy link
Contributor

mjh1 commented Dec 12, 2024

I'm not sure I understand it correctly, but isn't 1 segment in flight a very strict requirement? I'm a little concern about the case like this:

  • Gateway has some temp network delays
  • It results in detecting each Orchestartor as slow
  • We swap from O to O, effectively taking the whole network capacity

@j0sh Yeah maybe we can give a couple of chances, if it's detected as slow 2 or 3 times in a row then terminate. What's the effect on the output stream if there are more than 1 segment in flight? Would it just skip a segment which wasn't processed quickly enough and the stream pauses then jumps forward to live?

@j0sh
Copy link
Collaborator Author

j0sh commented Dec 12, 2024

What's the effect on the output stream if there are more than 1 segment in flight?

Assuming things are just "slow", the subscriptions would still run in order. If segment A is slow, but segments B completes at a normal pace and C is in-flight, then downstream would continue reading A until its done, then fetch B (which would hopefully download more quickly), then C (which the server will 'catch up' then trickle out)

The server currently retains the last 5 segments so if the subscriber falls more than 5 behind, it will 404 (which is not the best behavior right now; I will update that separately to indicate "this stream exists but this segment doesn't" so the client can handle that better, eg try the next segment or jump up to the leading edge)

@j0sh
Copy link
Collaborator Author

j0sh commented Dec 13, 2024

Relaxed the check to allow up to 3 segments inflight, and reworked a few things in preparation for the next round of changes after this PR: 7c00bae

Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, as discussed in Discord, let's merge this one (without O's swap) and we'll think later what to do with O's swaps.

@j0sh j0sh merged commit 7fe7949 into master Dec 16, 2024
18 checks passed
@j0sh j0sh deleted the ja/check-slow-orchs branch December 16, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Issues and PR related to the AI-video branch. go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants