[AI] Add support for Object Detection pipeline #3228
base: master
Conversation
Did an initial look through and it looks pretty good. Will update tomorrow when I can pull and run it locally. Thank you for adding tests! One question: would it make sense to have an option to only return the detections text response? The transcoding of the frames is CPU transcoding, so it will be pretty slow. @rickstaa or @leszko, looking at doing NVIDIA transcoding is still a bit in the future, right?
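Something like this is what I have in mind, purely as an illustration; the type and field names here are assumptions, not the actual request schema:

```go
// Illustrative sketch of an opt-out flag for the annotated video;
// names are placeholders, not the real API.
type ObjectDetectionParams struct {
	// When false, the worker could skip the CPU-bound re-encoding of
	// annotated frames and return only the detections text/JSON.
	ReturnAnnotatedVideo bool `json:"return_annotated_video"`
}
```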
I don't have enough context to answer this. @rickstaa may know more.
If @rickstaa comments on the future of CPU transcoding, then I can push a commit adding
@leszko, @ad-astra-video I haven't planned to replace the CPU transcoding yet. I was considering holding off on that until it becomes necessary for the real-time version of this pipeline. I think for now @ad-astra-video's solution makes sense.
* Add Gateway ETH Address to Kafka events
* Uncaught error
* Typo
This commit ensures that the PR labeler action is working as expected again.
@RUFFY-369 I have looked through the code and ran it E2E (with the updates below) in Docker. You did a good job getting all the remote worker parts together! I sent some updates in a PR to your branch; feel free to merge them or use them as a guide to adjust your branch:
Please update to add:
Questions:
If you want to try it out, my docker builds are
e2b0f489.mp4
@@ -257,3 +259,5 @@ require (
	lukechampine.com/blake3 v1.2.1 // indirect
	rsc.io/tmplfunc v0.0.3 // indirect
)
Note to remove before merging.
Just give me a heads-up before we merge and I will get it removed in the last commit.
server/ai_mediaserver.go
Outdated
@@ -48,6 +48,31 @@ const (
	Complete ImageToVideoStatus = "complete"
)

type ObjectDetectionResponseAsync struct {
Are these being used?
Yes, the struct type is being used here.
I think we should not add another async result endpoint. I believe we will eventually add a universal endpoint to check for async results, rather than have async for some pipelines and not others. @rickstaa, what are your thoughts?
I agree with you that generalisation of the async result endpoints is better done across all the pipelines equally. But for this PR, should I keep the async functionality for now? @ad-astra-video, what would you suggest?
cc @rickstaa
I would prefer to remove the async part in this PR and put a more general way to make all requests async (if preferred by the user) on the roadmap.
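To make that roadmap idea concrete, here is a rough sketch of what a single pipeline-agnostic results endpoint could look like; the route, the store, and all names below are assumptions for illustration, not go-livepeer code:

```go
package server

import (
	"encoding/json"
	"net/http"
	"sync"
)

// results would map a request ID to its finished pipeline response.
var results sync.Map

// getAIResult serves results for any async pipeline from one route,
// e.g. GET /ai/results?request_id=<id>.
func getAIResult(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("request_id")
	v, ok := results.Load(id)
	if !ok {
		// Unknown ID or still processing; 202 tells clients to keep polling.
		w.WriteHeader(http.StatusAccepted)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(v)
}
```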
Done in the latest commit 👍
Thanks for the PR, as it was also a TODO for me to get all frames into one mp4. I have merged those changes already 👍
Can you elaborate a little? Are there any changes in
Regarding the delta in inference time with respect to input size, I did some inference runs for files of various sizes on a T4 GPU in Google Colab, using five files of 12.9 MB, 21.3 MB, 39.8 MB, 97.8 MB, and 116.1 MB. One thing to note: even if one video file is slightly larger than another because it has more frames, the smaller file can still take longer to infer if its resolution is higher. Can the initial pricing be done similar to SAM2:
Hmm, regarding rendering boxes client-side, I'll need to explore this further as it's an area I haven't worked on deeply yet. Having the choice available is the better option, since annotation of the frames can also be done outside the pipeline loop using the detected output data.
Also, @ad-astra-video, could you cross-check on your side as well with an E2E run of the recent commits that were pushed?
The
I think pricing based on pixels is the most accurate for compute difficulty, since it would incentivize users to send in lower-resolution samples (e.g. 720p or lower) to get a better price. That said, other services price the inference based on video seconds, so that would be easiest for users converting to the Livepeer network. I am fine with leaving pricing per pixel for now to be similar to other pipelines. Audio uses input-file time-length-based pricing only because there are no pixels to count.
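As a back-of-the-envelope sketch of how per-pixel pricing would scale with resolution (the price-per-pixel value below is made up purely for illustration, not a real network price):

```go
package main

import "fmt"

func main() {
	const pricePerPixel = 1e-9 // illustrative only
	const fps, seconds = 30, 10 // a 10-second clip at 30 fps

	for _, res := range []struct {
		name string
		w, h int64
	}{
		{"1080p", 1920, 1080},
		{"360p", 640, 360},
	} {
		// Total pixels processed = width * height * frames.
		pixels := res.w * res.h * fps * seconds
		fmt.Printf("%s: %d pixels -> %g price units\n",
			res.name, pixels, float64(pixels)*pricePerPixel)
	}
}
```

For the same clip length, 1080p costs 9× more than 360p under this scheme, which tracks the decode cost much better than file size does.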
I was not clear on what I was asking, sorry about that. I was curious about the inference time difference between, say, 1080p and 360p. Below are examples of the inference time difference at the two resolutions using the same input video. Inference is a little under 10% faster at 360p, but decoding is about 800% faster, so in my opinion it should cost less to process.
1080p
360p
Note: annotating the frames adds about 1 second to the detections time in this 10-second video.
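For context on that decode speedup: a 1080p frame has 1920×1080 = 2,073,600 pixels versus 640×360 = 230,400 at 360p, a 9× ratio. That lines up with decoding scaling roughly with pixel count, while inference barely changes, presumably because the model resizes frames to a fixed input size internally.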
I have changed the
I think that overall pricing should get an update for all the pipelines, using a simple combination of different metrics. But for now I am leaving the pricing based on pixels, similar to other pipelines, like you mentioned. 👍
Thank you for the clarification! The data you provided gives quite nice insights.
@ad-astra-video What are the final changes I need to make to get this PR ready for merge? Most of them have been addressed, I think.
…move object detection func to generic aiMediaServerHandle
@RUFFY-369 I put a PR up to remove the remaining async object detection route code and update go.mod and go.sum. This PR is in good shape, but the ai-worker PR needs to be completed before merging this one. Some changes will be needed here from the updates requested in the ai-worker PR for sending the re-encoded video back from the runner, but I expect them to be relatively minor.
Hi @ad-astra-video, thanks for the PR. I will take a look and get it merged.
Remove remaining async object detection code and update go.mod go.sum
@ad-astra-video I have made the requested changes in the
@RUFFY-369 I put up a PR on your repo with some changes I used to test end to end. They are relatively small changes, mostly incorporating the changes from the PR I put up on your ai-worker repo. Can you rebase this onto master? Then we can merge!
chore: updates for object detection pipeline
@ad-astra-video I have merged your PR. Thanks!
What does this pull request do? Explain your changes. (required)
Adds support for the `ai-worker` object-detection pipeline, which implements the (real-time) `RT-DETR` object-detection model (default).

Corresponding AI-Worker PR: livepeer/ai-worker#243
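For anyone trying the branch, a hypothetical end-to-end invocation in Go; the gateway address, route, and form-field name below are assumptions modeled on the other AI pipelines, not confirmed by this PR:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("input.mp4")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Build a multipart form with the input video.
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, _ := mw.CreateFormFile("video", "input.mp4")
	if _, err := io.Copy(part, f); err != nil {
		panic(err)
	}
	mw.Close()

	// Placeholder gateway address and route; adjust to your deployment.
	resp, err := http.Post("http://localhost:8935/object-detection",
		mw.FormDataContentType(), &body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // detections response from the worker
}
```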
Specific updates (required)
How did you test each of these updates (required)
Testing was performed by running the gateway + worker locally.
Does this pull request close any open issues?
Checklist:
- `make` runs successfully
- `./test.sh` tests pass

cc @rickstaa