Fix Orchestrator Swaps #2885
VID-430 Investigate why stream had highly variable transcode times
This caused the stream to have over 50 swaps in total. A portion of the logs can be viewed here:
The client is using 1s segments, which already makes the window for a successful transcode very tight, but we're also seeing transcode times vary wildly, between 1 and 4 seconds per segment.
VID-429 Understand Orchestrator swap
It looks like we swapped at segment 1621, but I can't spot an obvious reason why. If we're missing logging that would let us debug this, then let's add it.
This PR and #2837 are two separate things. I'll try to explain it here.
Codecov Report
@@               Coverage Diff               @@
##              master      #2885       +/-  ##
=============================================
+ Coverage   56.38155%  56.42426%  +0.04271%
=============================================
  Files             89         89
  Lines          19384      19403        +19
=============================================
+ Hits           10929      10948        +19
  Misses          7849       7849
  Partials         606        606
Continue to review full report in Codecov by Sentry.
@leszko gotcha, thanks for the explanation!
Related to two Linear Tickets:
fix https://linear.app/livepeer/issue/VID-430/investigate-why-stream-had-highly-variable-transcode-times
fix https://linear.app/livepeer/issue/VID-429/understand-orchestrator-swap
Explanation for https://linear.app/livepeer/issue/VID-430/investigate-why-stream-had-highly-variable-transcode-times
The segment is very short (`1s`), and we calculate the in-memory latency score as `RTT transcoding time / segment duration`. If this value is greater than `1`, we swap the Orchestrators. For such short segments, the RTT is almost always longer than the segment itself, so the score is almost always higher than 1. The quick fix is to enforce a minimum segment duration when computing the score. I've set it to `1.5s`.
Explanation for https://linear.app/livepeer/issue/VID-429/understand-orchestrator-swap
It's hard to tell why the Orchestrators were swapped, but I believe it's because a segment was in flight for longer than 1.5s, which triggers the Orchestrator swap. This PR does not fix anything with respect to that, but it adds logs that will help analyze similar cases in the future.