Add Templating for Configmaps, Default Values file #647

Open
wants to merge 2 commits into
base: main

Collaborator

@ewkoch ewkoch commented Oct 23, 2024

Pull Request Summary

Add templating for configmaps so that they don't have to be built separately. Also aiming to add a commented default values.yaml file that clearly shows what needs to be configured for a working deployment.

Test Plan and Usage Guide

Need to test/deploy from scratch prior to merge.

@ewkoch ewkoch marked this pull request as draft October 23, 2024 20:30
secretKeyRef:
name: {{ .Values.redis.authSecret }}
key: auth_token
{{- end }}
Collaborator Author

@ewkoch ewkoch Oct 23, 2024

Need to refactor/replace the above; this should come from a secret. Also need to verify that this is actually used. I vaguely remember seeing that it was unnecessary (because we wind up putting the Redis auth token in the fully specified URL in AWS Secrets Manager instead).

Collaborator

Yes let's apply this change

@@ -20,6 +20,9 @@ data:
[profile {{ $profileName }}]
role_arn = {{ index $annotations "eks.amazonaws.com/role-arn" }}
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
[profile {{ $.Values.serviceAccount.sqsProfileName }}]
role_arn = {{ index $annotations "eks.amazonaws.com/role-arn" }}
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Collaborator Author

Ideally consolidate this into one profile. Need to verify that the SQS profile is assumable and check whether it has to match $profileName.

labels:
product: common
team: infra
annotations:
-"helm.sh/hook": pre-install
+"helm.sh/hook": pre-install,pre-upgrade
Collaborator Author

Without the pre-upgrade hook, you can't update the vllm tags on upgrades.

Collaborator

Ah, internally we have a separate upgrade process for vllm that doesn't require installing Helm charts. If you're interested, we can socialize that a bit better.

vllm_batch_v2: "latest"
vllm: "{{ .Values.vllmTag }}"
vllm_batch: "{{ .Values.vllmTag }}"
vllm_batch_v2: "{{ .Values.vllmTag }}"
Collaborator Author

@ewkoch ewkoch Oct 23, 2024

nice to specify / could probably default to latest if not set

Collaborator

these are different images so we would want vllmTag, vllmBatchTag, vllmBatchV2Tag; and yes let's default to latest

Collaborator Author

done
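
For illustration, the per-image tags requested in this thread could be templated roughly as below. This is a sketch, not the PR's actual diff: `vllmBatchTag` and `vllmBatchV2Tag` are the names proposed in the review, and using `default "latest"` for the fallback is an assumption about how the default would be implemented.

```yaml
# Sketch only: each image gets its own tag value and falls back to
# "latest" when the value is unset or empty.
vllm: {{ .Values.vllmTag | default "latest" | quote }}
vllm_batch: {{ .Values.vllmBatchTag | default "latest" | quote }}
vllm_batch_v2: {{ .Values.vllmBatchV2Tag | default "latest" | quote }}
```

An alternative would be to set the three defaults in values.yaml and drop the `default` calls, which keeps the fallback visible to anyone reading the values file.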

{{- range .Values.virtualservice.gateways }}
- {{ . | quote }}
{{- end }}
- {{ $.Values.global.networking.internalGateway }}
Collaborator Author

I'm open to scrapping this. It was part of a DNS refactor we did to make specifying virtual services simpler.

Collaborator

curious to see what the rest of the dns refactor changes look like

Collaborator Author

I made the rest of the refactor. It helps when Model Engine is a subchart, since it avoids having to change the core domain in a bunch of places.

env: {{ .Values.context | quote }}
{{- with .Values.config.values.infra }}
{{- range $key, $value := . }}
{{ $key }}: {{ $value | quote }}
Collaborator Author

This broke for some reason, I think related to how Helm / Go templating handles certain types. I also don't like it because it doesn't make clear what actually needs to be in the config.

Member

IMO we shouldn't hardcode the values that go into launch_service_config and infra_service_config; doing so introduces a coupling between the Helm chart and the Model Engine code, e.g. if the values that need to go in either of the two configs change. Probably good to see what ended up breaking, though, and fix that.

Collaborator Author

I would prefer a single source of truth for configuring the deployment of a new instance. Manually changing the YAML files is also a coupling between config and code, and is similarly error-prone.

I think a bigger refactor is probably necessary either way to reduce the complexity of interacting with those configs.

Collaborator

You can pull your changes to the service/infra configs into a separate file for now, for documentation purposes, while we figure out a proper way to do this.
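
The cause of the breakage is not identified in this thread. One possible cause, offered purely as an assumption, is that `quote` stringifies non-scalar values (maps and lists) with Go's default formatting rather than as YAML or JSON. A sketch of a loop that keeps the generic pass-through but serializes such values explicitly:

```yaml
{{- /* Sketch only: assumes the failure came from map/list values under
       config.values.infra, which is not confirmed in this thread. */}}
{{- with .Values.config.values.infra }}
{{- range $key, $value := . }}
{{- if or (kindIs "map" $value) (kindIs "slice" $value) }}
{{- /* JSON is valid YAML flow syntax, so nested values survive intact. */}}
{{ $key }}: {{ $value | toJson }}
{{- else }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}
{{- end }}
```

This keeps the chart decoupled from the specific config keys, at the cost of not documenting which keys are required.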

"infra.scale.com/team": "${team}",
"infra.scale.com/contact": "{{ .Values.contactEmail }}",
"infra.scale.com/customer": "AllCustomers",
"infra.scale.com/financialOwner": "{{ .Values.contactEmail}}",
Collaborator Author

Need to template "infra.scale.com" to something configurable, or allow these to be set as additional tags in values.yaml.
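
A minimal sketch of the first option, assuming a hypothetical `tagPrefix` values key (not part of the chart as shown here) that falls back to the current hardcoded prefix:

```yaml
{{- /* tagPrefix is a hypothetical key introduced for illustration. */}}
{{- $prefix := .Values.tagPrefix | default "infra.scale.com" }}
"{{ $prefix }}/team": "${team}",
"{{ $prefix }}/contact": "{{ .Values.contactEmail }}",
"{{ $prefix }}/customer": "AllCustomers",
"{{ $prefix }}/financialOwner": "{{ .Values.contactEmail }}",
```

Defaulting to the existing prefix would leave current deployments unchanged while letting other installs override it.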

# {{- if empty $node_selector }}
# nodeSelector:
# {{- end }}
# k8s.amazonaws.com/accelerator: ${GPU_TYPE}
Collaborator Author

Will re-add this. It just serves as a note that we need to document that this tag must be set on nodegroups.

tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
- key: "gpu_a100_multi"
operator: "Exists"
effect: "NoSchedule"
Collaborator Author

hack, need to delete

@@ -522,6 +525,7 @@ data:
loadBalancer:
simple: LEAST_REQUEST
{{- end }}
{{- if and (.Capabilities.APIVersions.Has "autoscaling.k8s.io/v1") (.Values.autoscaling.vertical.enabled) }}
Collaborator Author

this is important if you don't have autoscaling

node-lifecycle: normal
nodeSelector:
node-lifecycle: normal
# balloonNodeSelector:
Collaborator Author

Note: need to build out a good default values.yaml with comments specifying the purpose of each configuration option.
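
As a rough picture of what such a file might look like, here is a sketch that uses only keys referenced elsewhere in this PR. Every value is a placeholder, and the comments are inferred from how the keys are used in the diffs above, so they may not match the chart's actual semantics.

```yaml
# Sketch of a commented values.yaml; all values below are placeholders.

# Image tag for the vLLM image.
vllmTag: "latest"

# Contact address written into the resource tags.
contactEmail: "team@example.com"

redis:
  # Name of the Kubernetes secret holding the Redis auth token
  # (read from the key "auth_token").
  authSecret: "redis-auth"

virtualservice:
  # Whether to create Istio virtual services for Model Engine.
  enabled: false
  # Gateways the virtual service binds to.
  gateways: []

autoscaling:
  vertical:
    # Only takes effect if the cluster serves the autoscaling.k8s.io/v1 API.
    enabled: false
```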

@@ -1,25 +0,0 @@
{{- if and (.Values.serviceTemplate) (.Values.serviceTemplate.createServiceAccount) (.Values.serviceTemplate.serviceAccountAnnotations) (.Values.serviceTemplate.serviceAccountName) (.Values.config.values.launch.endpoint_namespace)}}
Member

is there a reason we're removing this? seems like we could just add some more conditions to the if to achieve the same effect

Collaborator Author

@ewkoch ewkoch Nov 1, 2024

Yeah, I'm fixing this. I didn't realize "createServiceAccount" was just for the inference account; I was getting a naming conflict with it, so I removed it as a quick fix.

Just updating with explicit naming, since we create another model-engine service account by default.

@ewkoch ewkoch marked this pull request as ready for review November 1, 2024 18:58
@ewkoch ewkoch changed the title from "[WIP] Add Templating for Configmaps" to "Add Templating for Configmaps, Default Values file" Nov 1, 2024
@@ -1,24 +1,20 @@
-{{- if .Values.virtualservice.enabled -}}
+{{- if .values.virtualService.enabled -}}
Collaborator

was Values => values intentional?

port: 80

# Creates istio virtual services for the Model Engine using the global domain name and gateway specified below
virtualService:
Collaborator

I think this is a backwards-incompatible change; can we leave this as virtualservice?
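
If the rename were kept anyway, one way to avoid breaking existing values files would be a guard that honors both spellings. This is a hypothetical sketch, not something proposed in the PR; it only covers the `enabled` check, and the rest of the template would need the same treatment.

```yaml
{{- /* Hypothetical guard: honors the existing key and the renamed key. */}}
{{- $legacy := .Values.virtualservice | default (dict) }}
{{- $renamed := .Values.virtualService | default (dict) }}
{{- if or $legacy.enabled $renamed.enabled }}
# ... VirtualService manifest goes here ...
{{- end }}
```

Keeping the original `virtualservice` key, as requested above, avoids the need for this kind of shim entirely.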

enabled: false
redis:
auth:
# If specified, will override the name of the deployed services
Collaborator

Thanks for the comprehensive documentation here. It's very much appreciated.

3 participants