Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add gpu vram #780

Merged
merged 15 commits into from
Oct 17, 2024
Merged

Feat/add gpu vram #780

merged 15 commits into from
Oct 17, 2024

Conversation

garloff
Copy link
Member

@garloff garloff commented Oct 13, 2024

Preliminary:

  • Spec adjustment done
  • Implementation notes done (including GPU table)
  • Code adjustments (parser, flavor generator, pretty name, ...) NOT yet done

Please check whether that incremental change is the small step that we agreed upon in the last team meeting.
It keeps the current scheme intact -- it's certification preserving, thus no new version required.

We may reconsider the flavor naming more fundamentally at some point in the future and maybe change direction from the systematic naming, but this is a major deviation and needs more work and preparation.

Related to #366

To be done: Adjust flavor name parser, pretty printer, generator.

Signed-off-by: Kurt Garloff <[email protected]>
Signed-off-by: Kurt Garloff <[email protected]>
@garloff garloff added work in progress Pull requests that are work in progress, do not merge them standards Issues / ADR / pull requests relevant for standardization & certification labels Oct 13, 2024
@garloff garloff self-assigned this Oct 13, 2024
@cah-patrickthiem
Copy link

Looks good overall.
But I have got some things to mention anyway:

  1. Minor thing, Nvidia is mostly written either with all capital letters NVIDIA or only the "N" is capital: Nvidia (e.g. https://de.wikipedia.org/wiki/Nvidia). I basically never see nVidia (funny thing is, Github marked this writing as false, Nvidia and NVIDIA not). So my suggestion would be to change the writing to "Nvidia" for all mentions.

  2. keep in mind that there are still many Nvidia generations officially supported by Nvidia: see https://endoflife.date/nvidia-gpu . I would suggest to include this link in the document and to also include all GPUs from Nvidia which are at least listed with "Active Support", meaning we go down to Maxwell generation. We at C&H do have at least Turing (Nvidia T4) GPUs which are used by customers. Turing is even listed as "In Production" For AMD and Intel this does not apply since they have not that many generations out there for so long as Nvidia.

@mbuechse
Copy link
Contributor

Interesting. I still know the old spelling nVidia from the 90s as well. It can be seen on old chips, for instance: https://de.m.wikipedia.org/wiki/Nvidia#/media/Datei%3A6600GT_GPU.jpg

Also mention older generations ...

Signed-off-by: Kurt Garloff <[email protected]>
@garloff
Copy link
Member Author

garloff commented Oct 14, 2024

Spelling fixed.
Feel free to add further nVidia^WNvidia generations with tables.
The scheme allows for them, no mysteries ...

@cah-patrickthiem
Copy link

Spelling fixed. Feel free to add further nVidia^WNvidia generations with tables. The scheme allows for them, no mysteries ...

Lets merge this PR here first and I will create another PR to fill up the table with more GPUs.

Copy link

@cah-patrickthiem cah-patrickthiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for me, LGTM.

@cah-patrickthiem
Copy link

For detailed information regarding why this PR was made: #546

Signed-off-by: Kurt Garloff <[email protected]>
One true fix (broken link)
And tweak the double-space test for tolerating two spaces after a | in a
table.

Signed-off-by: Kurt Garloff <[email protected]>
@garloff
Copy link
Member Author

garloff commented Oct 15, 2024

Well, I still need to adjust the parser, generator, pretty-printer in the code to accept/generate the new syntax before merging.
I'm glad to have support before investing the work.

Signed-off-by: Matthias Büchse <[email protected]>
Signed-off-by: Matthias Büchse <[email protected]>
Copy link

@cah-patrickthiem cah-patrickthiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After Matthias commits, I again went through all documents and found some more things which should be changed.

@mbuechse
Copy link
Contributor

@cah-patrickthiem Please perform the changes yourself. Kurt is on vacation, and he won't mind.

Just for a bit better readability.

Signed-off-by: Kurt Garloff <[email protected]>
h on the SMs/CUs/EUs: High frequency
h on the VRAM: High bandwidth

Again, this is really only to differentiate if a vendor has several
otherwise similar models that have a material difference in frequency
or bandwidth, such as e.g. a GDDR6 vs an HBM2e veriant ...
or a low-power, low-frequency variant.

Signed-off-by: Kurt Garloff <[email protected]>
@garloff
Copy link
Member Author

garloff commented Oct 17, 2024

After Matthias commits, I again went through all documents and found some more things which should be changed.

You're speaking in mysteries, @cah-patrickthiem here.

  • If you have minor improvements, just mention and commit them, like @mbuechse suggests.
  • If you have major concerns, please bring them up and let's discuss.

I did change the wording in the meaning of h to say high frequency (rather than perf) for the CUs and high-bandwidth (rather than perf) for VRAM, so things should be clearer in the console output and on https://flavors.scs.community/
I hope that is appreciated.

From my perspective this is ready to be merged, but happy to see additional refinement if you have any ...

@garloff garloff removed the work in progress Pull requests that are work in progress, do not merge them label Oct 17, 2024
@cah-patrickthiem
Copy link

After Matthias commits, I again went through all documents and found some more things which should be changed.

You're speaking in mysteries, @cah-patrickthiem here.

  • If you have minor improvements, just mention and commit them, like @mbuechse suggests.
  • If you have major concerns, please bring them up and let's discuss.

I did change the wording in the meaning of h to say high frequency (rather than perf) for the CUs and high-bandwidth (rather than perf) for VRAM, so things should be clearer in the console output and on https://flavors.scs.community/ I hope that is appreciated.

From my perspective this is ready to be merged, but happy to see additional refinement if you have any ...

Sorry, usually I/we handle it like so, that the reviewer is not supposed to push anything to the PR to be reviewed, therefore I did not do this but planned to do it after Matthias told me its ok. You were just quicker.

Nevertheless, I also think we can merge it, yes.

@garloff garloff merged commit 1e24ed4 into main Oct 17, 2024
7 checks passed
@garloff garloff deleted the feat/add-gpu-vram branch October 17, 2024 22:31
@garloff
Copy link
Member Author

garloff commented Oct 18, 2024

Thanks, @cah-patrickthiem and @mbuechse to get this over the finish line!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standards Issues / ADR / pull requests relevant for standardization & certification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants