Feat/add gpu vram #780

garloff · 2024-10-13T15:11:08Z

Preliminary:

Spec adjustment done
Implementation notes done (including GPU table)
Code adjustments (parser, flavor generator, pretty name, ...) NOT yet done

Please check whether that incremental change is the small step that we agreed upon in the last team meeting.
It keeps the current scheme intact -- it's certification preserving, thus no new version required.

We may reconsider the flavor naming more fundamentally at some point in the future and maybe change direction from the systematic naming, but this is a major deviation and needs more work and preparation.

Related to #366


cah-patrickthiem · 2024-10-14T07:12:41Z

Looks good overall.
But I have got some things to mention anyway:

Minor thing: Nvidia is usually written either in all capitals (NVIDIA) or with only the "N" capitalized (Nvidia, e.g. https://de.wikipedia.org/wiki/Nvidia). I basically never see nVidia (funnily enough, GitHub flags that spelling as incorrect, but not Nvidia or NVIDIA). So my suggestion is to change the spelling to "Nvidia" in all mentions.
Keep in mind that many Nvidia generations are still officially supported by Nvidia, see https://endoflife.date/nvidia-gpu . I suggest including this link in the document and also including all Nvidia GPUs that are at least listed with "Active Support", which means going back to the Maxwell generation. We at C&H have at least Turing (Nvidia T4) GPUs that are used by customers. Turing is even listed as "In Production". For AMD and Intel this does not apply, since they have not had as many generations on the market for as long as Nvidia.

mbuechse · 2024-10-14T08:07:41Z

Interesting. I still know the old spelling nVidia from the 90s as well. It can be seen on old chips, for instance: https://de.m.wikipedia.org/wiki/Nvidia#/media/Datei%3A6600GT_GPU.jpg


garloff · 2024-10-14T10:59:42Z

Spelling fixed.
Feel free to add further nVidia^WNvidia generations with tables.
The scheme allows for them, no mysteries ...

cah-patrickthiem · 2024-10-15T05:48:33Z

> Spelling fixed. Feel free to add further nVidia^WNvidia generations with tables. The scheme allows for them, no mysteries ...

Let's merge this PR first, and I will create another PR to fill the table with more GPUs.

cah-patrickthiem

Looks good to me, LGTM.

cah-patrickthiem · 2024-10-15T07:24:39Z

For detailed information regarding why this PR was made: #546


garloff · 2024-10-15T11:22:48Z

Well, I still need to adjust the parser, generator, pretty-printer in the code to accept/generate the new syntax before merging.
I'm glad to have support before investing the work.


Standards/scs-0100-v3-flavor-naming.md

Standards/scs-0100-w1-flavor-naming-implementation-testing.md

Tests/iaas/flavor-naming/flavor_names.py

cah-patrickthiem

After Matthias's commits, I went through all documents again and found a few more things that should be changed.

mbuechse · 2024-10-16T15:12:04Z

@cah-patrickthiem Please perform the changes yourself. Kurt is on vacation, and he won't mind.

Just for a bit better readability. Signed-off-by: Kurt Garloff <[email protected]>

h on the SMs/CUs/EUs: High frequency
h on the VRAM: High bandwidth
Again, this is really only to differentiate if a vendor has several otherwise similar models that have a material difference in frequency or bandwidth, such as a GDDR6 vs. an HBM2e variant ... or a low-power, low-frequency variant.
Signed-off-by: Kurt Garloff <[email protected]>
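To illustrate the two meanings of `h` described in this commit message, here is a minimal Python sketch. It is not the code from `Tests/iaas/flavor-naming/flavor_names.py`. The pattern `-<units>[h][-<vram>[h]]`, the GiB unit, and the names `GPU_TAIL`, `GpuTail`, `parse_gpu_tail` and `describe` are all assumptions made for this example; only the field names `vram` and `vramperf` echo the commit title in this PR. The authoritative syntax is the one in `Standards/scs-0100-v3-flavor-naming.md`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: hypothetical pattern for the numeric tail of a GPU extension,
# "-<units>[h][-<vram>[h]]". The real syntax is defined in the standard.
GPU_TAIL = re.compile(
    r"-(?P<units>\d+)(?P<unitperf>h?)(?:-(?P<vram>\d+)(?P<vramperf>h?))?"
)


@dataclass
class GpuTail:
    units: int             # number of SMs/CUs/EUs
    high_frequency: bool   # "h" after the unit count
    vram: Optional[int]    # VRAM size (GiB assumed); None if not given
    high_bandwidth: bool   # "h" after the VRAM size


def parse_gpu_tail(text: str) -> GpuTail:
    """Parse the hypothetical tail, e.g. "-14h-24h"."""
    m = GPU_TAIL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a recognized GPU tail: {text!r}")
    vram = m.group("vram")
    return GpuTail(
        units=int(m.group("units")),
        high_frequency=m.group("unitperf") == "h",
        vram=int(vram) if vram is not None else None,
        # group is None when the VRAM part is absent, so this is False then
        high_bandwidth=m.group("vramperf") == "h",
    )


def describe(tail: GpuTail) -> str:
    """Render a human-readable description using the wording from this PR."""
    parts = [
        f"{tail.units} SMs/CUs/EUs"
        + (" (high frequency)" if tail.high_frequency else "")
    ]
    if tail.vram is not None:
        parts.append(
            f"{tail.vram} GiB VRAM"
            + (" (high bandwidth)" if tail.high_bandwidth else "")
        )
    return ", ".join(parts)
```

With these assumptions, `describe(parse_gpu_tail("-14h-24h"))` reads "14 SMs/CUs/EUs (high frequency), 24 GiB VRAM (high bandwidth)", while a tail without a VRAM part, such as `-14`, yields only the unit count.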

garloff · 2024-10-17T11:50:30Z

> After Matthias's commits, I went through all documents again and found a few more things that should be changed.

You're speaking in mysteries, @cah-patrickthiem here.

If you have minor improvements, just mention and commit them, like @mbuechse suggests.
If you have major concerns, please bring them up and let's discuss.

I did change the wording in the meaning of h to say high frequency (rather than perf) for the CUs and high-bandwidth (rather than perf) for VRAM, so things should be clearer in the console output and on https://flavors.scs.community/
I hope that is appreciated.

From my perspective this is ready to be merged, but happy to see additional refinement if you have any ...

cah-patrickthiem · 2024-10-17T13:24:03Z

> > After Matthias's commits, I went through all documents again and found a few more things that should be changed.
>
> You're speaking in mysteries, @cah-patrickthiem here.
>
> If you have minor improvements, just mention and commit them, like @mbuechse suggests.
> If you have major concerns, please bring them up and let's discuss.
>
> I did change the wording in the meaning of h to say high frequency (rather than perf) for the CUs and high-bandwidth (rather than perf) for VRAM, so things should be clearer in the console output and on https://flavors.scs.community/
> I hope that is appreciated.
>
> From my perspective this is ready to be merged, but happy to see additional refinement if you have any ...

Sorry, we usually handle it so that the reviewer does not push anything to the PR under review. That is why I did not do it, but planned to once Matthias told me it's OK. You were just quicker.

Nevertheless, I also think we can merge it, yes.

garloff · 2024-10-18T02:30:01Z

Thanks, @cah-patrickthiem and @mbuechse, for getting this over the finish line!

garloff added 2 commits October 13, 2024 16:40

Add GPU table and VRAM into specification.

822cb86

To be done: Adjust flavor name parser, pretty printer, generator. Signed-off-by: Kurt Garloff <[email protected]>

Version 3.2, adjust examples.

2115c59

Signed-off-by: Kurt Garloff <[email protected]>

garloff added the labels "work in progress" (Pull requests that are work in progress, do not merge them) and "standards" (Issues / ADR / pull requests relevant for standardization & certification) Oct 13, 2024

garloff requested review from mbuechse and cah-patrickthiem October 13, 2024 15:11

garloff self-assigned this Oct 13, 2024

garloff added 3 commits October 13, 2024 17:27

Fix empty line and extra space.

8b5a0aa

Signed-off-by: Kurt Garloff <[email protected]>

Minor addition of information for AMD and intel.

1fe9863

Signed-off-by: Kurt Garloff <[email protected]>

Typo.

930e31d

Signed-off-by: Kurt Garloff <[email protected]>

garloff mentioned this pull request Oct 13, 2024

[Standardization] GPU naming convention needs further refinements #366

Closed

6 tasks

Note about 1/7 uncertainties. Nvidia spelling.

8266e07

Also mention older generations ... Signed-off-by: Kurt Garloff <[email protected]>

cah-patrickthiem approved these changes Oct 15, 2024


garloff added 3 commits October 15, 2024 06:52

Appease markdownlint.

3f86082

Signed-off-by: Kurt Garloff <[email protected]>

More appeasement for mardownlint.

c3ecab0

One true fix (broken link) And tweak the double-space test for tolerating two spaces after a | in a table. Signed-off-by: Kurt Garloff <[email protected]>

One more fix against double spaces.

4f539b3

Signed-off-by: Kurt Garloff <[email protected]>

mbuechse added 3 commits October 16, 2024 14:10

add vram and vramperf to GPU (retrofitting v1 and v2)

fbf9fa2

Signed-off-by: Matthias Büchse <[email protected]>

bugfix: use correct variable

f962edf

Signed-off-by: Matthias Büchse <[email protected]>

appease flake8

73a02bc

Signed-off-by: Matthias Büchse <[email protected]>