Add kubernetes deployment for GenAIComps #1104
Some questions:
Is there any reason why there is no `failureThreshold` on the `readinessProbe` in the CPU and Gaudi values?
Why is no image specified for CPU? Is it in some common file? The same applies to all the min/max token lengths.
What about the model ID for Gaudi?
Thank you @yongfengdu for this PR.
Here is the link for default values.
I was concentrating more on when things actually start working, rather than when they start failing, so it's a rather rough value.
They could also be mentioned explicitly. For Gaudi, I think the readiness probe … For CPU, it's definitely better to keep it >1, because performance on CPU is so unpredictable (the underlying HW can differ, and pods do not have fine-tuned resource requests/limits).
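To make the `failureThreshold` trade-off concrete, the sketch below estimates roughly how long a stalled pod keeps receiving traffic before Kubernetes marks it unready: `failureThreshold` consecutive probes must fail, one every `periodSeconds`, and the last one may wait out its `timeoutSeconds`. This is a rough upper bound that ignores probe jitter; the defaults are the Kubernetes defaults, and the CPU-style values in the usage example are hypothetical, not taken from the PR's values files.

```python
from dataclasses import dataclass


@dataclass
class ReadinessProbe:
    # Field names mirror the Kubernetes probe spec (snake_case here);
    # defaults are the Kubernetes defaults.
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def approx_unready_window(probe: ReadinessProbe) -> int:
    """Rough upper bound, in seconds after the last successful probe,
    before the pod is marked unready."""
    return probe.period_seconds * probe.failure_threshold + probe.timeout_seconds


if __name__ == "__main__":
    # Hypothetical CPU settings: with failure_threshold=1, a single slow
    # response on a busy CPU node is enough to pull the pod out of rotation.
    strict = ReadinessProbe(period_seconds=5, failure_threshold=1)
    tolerant = ReadinessProbe(period_seconds=5, failure_threshold=3)
    print(approx_unready_window(strict), approx_unready_window(tolerant))
```

A threshold above 1 trades a slightly longer detection window for tolerance of occasional slow responses, which matches the argument for keeping it >1 on CPU.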
I've never seen OPEA pods deadlock, especially in a way that would be solved by restarting the pod. Liveness probe restarts therefore just make things worse, as the service may then never reach a ready/live state. IMHO liveness probes are just harmful and could be removed.
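If liveness probes were dropped as suggested, one way to check the effect locally is to strip them from a parsed values tree before rendering. The sketch below is a hypothetical helper, not part of this PR or the Helm charts; it assumes the values file has already been loaded into nested dicts/lists (e.g. by a YAML parser) and that probes appear under a key literally named `livenessProbe`. Readiness and startup probes are left untouched.

```python
from typing import Any


def drop_liveness_probes(values: Any) -> Any:
    """Return a copy of a parsed values tree with every 'livenessProbe' key removed."""
    if isinstance(values, dict):
        return {
            key: drop_liveness_probes(value)
            for key, value in values.items()
            if key != "livenessProbe"
        }
    if isinstance(values, list):
        return [drop_liveness_probes(item) for item in values]
    return values


if __name__ == "__main__":
    # Hypothetical values layout, for illustration only.
    values = {
        "tgi": {
            "livenessProbe": {"initialDelaySeconds": 5},
            "readinessProbe": {"failureThreshold": 3},
        }
    }
    print(drop_liveness_probes(values))
```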
Why do we have
It's `deployment/kubernetes`.
e19ecf2 to 353bc46
For dataprep and retriever, the current values files will be obsolete in a couple of days (a new PR in the infra repo will be submitted next week). Should we remove them from this PR?
I've verified that the current `*-values.yaml` files work fine with the latest released version (1.1), so I think it's reasonable to merge this change first and do a follow-up fix after the Helm charts change.
Let's land this first so the follow-up CI tasks can be added. We will update the relevant values files later, after GenAIInfra's Helm chart is refactored for v1.2.
Once the 0-latest on-push event is ready, we can enable CI for this repo.
Signed-off-by: Dolpher Du <[email protected]>
Description
A summary of the proposed changes, as well as the relevant motivation and context.
Issues
List the issue or RFC link this PR is working on. If there is no such link, please mark it as n/a.
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
List the newly introduced 3rd party dependency if exists.
Tests
Describe the tests that you ran to verify your changes.