Livepeer.Cloud SPE - Proposal #2 - Enable Single Orchestrator AI Job Testing Support for Gateway Nodes #3241

Open
wants to merge 1 commit into
base: master

Conversation

mikezupper
Contributor

What does this pull request do? Explain your changes. (required)
Provides features that enable AI Job Testing through gateway nodes. The gateway node has several hard-coded timeout and cache values that need to be configurable so that a gateway can send an AI job to a specific orchestrator for testing.

Specific updates (required)

  • Introduce several new startup flags to enable a gateway node to support AI Job testing.
    • aiTesterGateway - a boolean that enables the gateway node to bypass AI Session caching. Defaults to false to prevent any behavior changes to the default gateway node.
    • aiSessionTimeout - a duration value that allows the AI session timeouts to be configured to a desired value. The default is 600s to match the existing hard-coded value.
    • webhookRefreshInterval - a duration value that allows the orchWebhookUrl cached responses to be configured to a desired value. The default is 60s to match the existing hard-coded value.
    • LIVEPEER_OS_HTTP_TIMEOUT - an environment variable rather than a flag, because the code that reads it is standalone and cannot use the common Livepeer flags. It is a duration value that sets the download timeout for AI assets (.mp4 files, etc.). The default is 4s to match the existing hard-coded value.
  • A new HTTP endpoint was added to fetch the AI capabilities of each orchestrator (/getOrchestratorAICapabilities). This endpoint provides the AI Job Tester with information on the AI models available across all orchestrators.

How did you test each of these updates (required)
Each of the new flags and timeout values was manually tested in our development environments. They are also deployed to the testing and production Livepeer.Cloud SPE AI Gateway nodes.

Does this pull request close any open issues?
No

Checklist:


// Return the HTTP client with the calculated timeout
return &http.Client{
Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},

Check failure

Code scanning / CodeQL

Disabled TLS certificate check High

InsecureSkipVerify should not be used in production code.

@mikezupper Why are we skipping TLS here?

@@ -76,7 +103,7 @@
 func downloadDataHTTP(ctx context.Context, uri string) ([]byte, error) {
 	clog.V(common.VERBOSE).Infof(ctx, "Downloading uri=%s", uri)
 	started := time.Now()
-	resp, err := httpc.Get(uri)
+	resp, err := osHttpClient.Get(uri)

Check failure

Code scanning / CodeQL

Uncontrolled data used in network request Critical

The URL of this request depends on a user-provided value.
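
One common way to address a finding like this is to validate the URI before issuing the request. The sketch below assumes an HTTPS-only policy and a host allow-list; `validateDownloadURL`, `allowedHosts`, and the example host are hypothetical and do not appear in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// allowedHosts is a hypothetical allow-list of hosts that assets may be
// downloaded from.
var allowedHosts = map[string]bool{"storage.example.com": true}

// validateDownloadURL is a hypothetical check to run before calling Get on a
// user-supplied URI. It rejects non-HTTPS schemes and unknown hosts.
func validateDownloadURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse uri: %w", err)
	}
	if u.Scheme != "https" {
		return nil, errors.New("only https URIs are allowed")
	}
	host := strings.ToLower(u.Hostname())
	if !allowedHosts[host] {
		return nil, fmt.Errorf("host %q is not allow-listed", host)
	}
	return u, nil
}

func main() {
	_, err := validateDownloadURL("https://evil.example.net/a.mp4")
	fmt.Println(err)
}
```

Whether an allow-list suits a gateway that downloads assets from arbitrary user storage is a design question for the maintainers; the sketch only shows the shape of such a check.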
@rickstaa
Member

@thomshutt this pull request overlaps with #3246, which followed #3052. It can be merged once #3246 is merged and this pull request is rebased.

@leszko
Contributor

leszko commented Jan 2, 2025

@mikezupper @thomshutt what's the plan for this PR? Do we plan to review/merge/productionize it?

I can review and help with that, but I'd like to know what's the plan in the context of the AI Video work.


4 participants