
Find out what usage data Test, Albany, Barcelona, & Cairo Clusters gather and find out how to funnel this information to Invoicing #880

Open
6 tasks
joachimweyl opened this issue Jan 9, 2025 · 8 comments
Labels
enhancement New feature or request observability openshift This issue pertains to NERC OpenShift question Further information is requested

Comments

@joachimweyl
Contributor

joachimweyl commented Jan 9, 2025

Motivation

If possible, we would like to track usage on these four clusters, and on any future OpenShift clusters we spin up, as we do for Production.

Completion Criteria

Confirm what data can be gathered from OpenShift clusters other than Production. Find out what needs to be done to gather the same level of data as Production. Find out how much work it will take to include these clusters when we pull invoicing data.

Description

  • What data is currently gathered?
  • Track down what would need to be done to gather the same level as Production
    • Create a checklist of what is required to ensure new OpenShift clusters are tracked
    • Create a new issue to automate as much of this process as possible, so that all new OpenShift clusters start being tracked right away
  • Track down what would need to be updated in the invoicing script to pull this data
    • Make sure the code changes allow new OpenShift clusters to be added automatically or with minimal effort

Completion dates

Desired - 2025-02-05
Required - TBD

@joachimweyl joachimweyl added enhancement New feature or request observability openshift This issue pertains to NERC OpenShift question Further information is requested labels Jan 9, 2025
@joachimweyl
Contributor Author

@schwesig @computate @dystewart or @tssala23, do you know the answers to any of the questions above?

@tssala23

tssala23 commented Jan 9, 2025

@joachimweyl How exactly is it done for prod? Is it just recording usage per namespace?
For Albany, Barcelona, and Cairo, I thought they were just being charged per individual bare-metal machine used to build each cluster. Wouldn't getting usage for those clusters be redundant, since they only ever really have one person/group using them?

@joachimweyl
Contributor Author

@tssala23 Currently, yes: they are charged per hour of lease, but we would like to change that for the test servers, since some of them have A100 GPUs. We would rather not pay for hours of GPU usage when we are not actually using the GPUs.
It would also be good to know how much usage they get in general, so we know whether we really need three FC830s as workers, etc.
More data is always better :)

@tssala23

tssala23 commented Jan 9, 2025

Okay, well, if the data is collected through the obs cluster, it should be easy to have them all connected. Test and Albany should already show up.

@schwesig
Member

schwesig commented Jan 9, 2025

Little note:
Albany has been having some metric-collection issues, at least as of yesterday when I was checking with @IsaiahStapleton: its metrics were not showing up (we tested the llm_ metrics).
There seems to be a cluster operator issue.
OCP-on-NERC/nerc-ocp-config#632 (comment)

@schwesig
Member

schwesig commented Jan 9, 2025

but in general:
every cluster connected to/managed by ACM/infra can be observed, and data can be collected, as Naved is doing with his queries/scripts for prod.
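As a minimal sketch of how the invoicing side could pull per-namespace usage for several ACM-managed clusters at once: it assumes the hub's observability query endpoint speaks the standard Prometheus HTTP API, that series carry a `cluster` label, and that `container_cpu_usage_seconds_total` is actually forwarded to the hub (it may need to be added to the observability metrics allowlist). The cluster names, `BILLED_CLUSTERS`, `build_cpu_query`, and `usage_by_namespace` are all hypothetical, not the existing prod queries/scripts. Driving everything from one cluster list is what would make adding a new OpenShift cluster a one-line change.

```python
import json
from collections import defaultdict

# Hypothetical ACM managed-cluster names. Onboarding a new OpenShift
# cluster for invoicing would then just mean appending its name here.
BILLED_CLUSTERS = ["prod", "test", "albany", "barcelona", "cairo"]


def build_cpu_query(clusters, window="1h"):
    """Build a PromQL query for average CPU cores used per cluster/namespace.

    Assumes series carry a `cluster` label and that the (assumed) metric
    container_cpu_usage_seconds_total is available on the hub.
    """
    # Plain DNS-label cluster names need no escaping in a PromQL regex matcher.
    selector = "|".join(clusters)
    return (
        "sum by (cluster, namespace) ("
        f'rate(container_cpu_usage_seconds_total{{cluster=~"{selector}"}}[{window}])'
        ")"
    )


def usage_by_namespace(response_text):
    """Fold a Prometheus HTTP API instant-query (vector) response into
    {(cluster, namespace): value} totals."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    totals = defaultdict(float)
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        # Each vector sample's value is [unix_timestamp, "value as string"].
        _, value = sample["value"]
        key = (labels.get("cluster", "unknown"), labels.get("namespace", "unknown"))
        totals[key] += float(value)
    return dict(totals)


if __name__ == "__main__":
    # Fabricated sample response, used only to exercise the parser offline.
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"cluster": "albany", "namespace": "llm"},
             "value": [1736400000, "2.5"]},
        ]},
    })
    print(build_cpu_query(["albany", "cairo"]))
    print(usage_by_namespace(sample))
```

Sending the query over HTTP and integrating usage over a whole invoicing period (for example with a range query) are left out; the point is only that one parameterized query plus a cluster list scales to new clusters without code changes.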

@tssala23

tssala23 commented Jan 9, 2025

@schwesig There was an issue with a couple of the cluster operators (COs). I managed to solve some, but ODF is now stuck upgrading, which means storage isn't currently working. The error looks like a bug, so I have opened a support ticket. A crude workaround would be to delete the Ceph cluster resource and recreate it, but I'll wait to see what support says.

@tssala23

tssala23 commented Jan 9, 2025

FYI, the only reason I didn't add all the clusters when they were created is the issue with deleted clusters still showing up in ACM. Since these clusters were somewhat temporary, I did not want to clutter that list. However, the deleted-clusters issue may be due to us running an older version of ACM.

3 participants