Add llm_performance_* to metrics allowlist #632

Merged 1 commit on Jan 6, 2025

Conversation

IsaiahStapleton
Contributor

@IsaiahStapleton IsaiahStapleton commented Jan 6, 2025

This PR allows Prometheus to collect metrics from the llm-load-test-exporter application, which gathers performance metrics for AI models running in a cluster: https://github.com/IsaiahStapleton/llm-load-test-exporter.
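
For illustration only, here is a minimal sketch of how a prefix such as `llm_performance_*` could be added to an ACM Observability custom allowlist. It is not the diff from this PR. It assumes the allowlist is managed through the `observability-metrics-custom-allowlist` ConfigMap and that a `matches` entry with a regex on `__name__` is used to cover the whole prefix. The actual file layout in this repository may differ.

```yaml
# Hypothetical sketch, not the change merged in this PR.
apiVersion: v1
kind: ConfigMap
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    # Assumption: a single matcher covers every metric whose name
    # starts with llm_performance_ instead of listing each one under "names".
    matches:
      - __name__=~"llm_performance_.+"
```

Listing each metric explicitly under `names` would be the alternative if only a few series are needed; the regex form trades precision for not having to update the allowlist when the exporter adds a metric.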

Configures metrics related to model performance to be gathered by
prometheus.

Signed-off-by: Isaiah Stapleton <[email protected]>
@IsaiahStapleton IsaiahStapleton merged commit ffe88a5 into OCP-on-NERC:main Jan 6, 2025
2 checks passed
@computate
Member

@IsaiahStapleton which cluster do you expect to see these llm_performance_* metrics coming from in ACM?

@IsaiahStapleton
Contributor Author

@computate From the Albany cluster. It should be connected to observability now: https://console-openshift-console.apps.nerc-ocp-infra.rc.fas.harvard.edu/multicloud/infrastructure/clusters/managed

@computate
Member

computate commented Jan 7, 2025

@IsaiahStapleton It looks like issues with the cluster operators on the Albany cluster are preventing ACM Observability from being configured there.

CO [monitoring](https://console-openshift-console.apps.albany.nerc.mghpcc.org/k8s/cluster/config.openshift.io~v1~ClusterOperator/monitoring) Cannot update 4.17.5 configuration in the "openshift-monitoring/cluster-monitoring-config" ConfigMap is invalid and should be fixed: error unmarshaling JSON: while decoding JSON: json: unknown field "grafana"

CO [operator-lifecycle-manager](https://console-openshift-console.apps.albany.nerc.mghpcc.org/k8s/cluster/config.openshift.io~v1~ClusterOperator/operator-lifecycle-manager) Cannot update 4.17.5 ClusterServiceVersions blocking cluster upgrade: openshift-logging/cluster-logging.v5.8.15 is incompatible with OpenShift minor versions greater than 4.17,openshift-storage/odf-operator.v4.16.4-rhodf is incompatible with OpenShift minor versions greater than 4.17
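
As a rough sketch of what the first error points at: the monitoring operator is rejecting a top-level `grafana` key in the `config.yaml` of the `openshift-monitoring/cluster-monitoring-config` ConfigMap. The actual contents of that ConfigMap on the Albany cluster are not shown in this thread, so everything below other than the ConfigMap name, namespace, and the `grafana` key is a placeholder.

```yaml
# Hypothetical sketch; the real ConfigMap contents are not shown in this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # The error reports "grafana" as an unknown field, so a stanza like
    # the commented-out one below would need to be removed:
    # grafana: {}
    # Placeholder for whatever valid settings the cluster already has:
    prometheusK8s:
      retention: 15d
```

Once the ConfigMap parses cleanly, the monitoring ClusterOperator should be able to reconcile again. The operator-lifecycle-manager message is a separate issue about installed operator versions and is not addressed by this sketch.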
