No documentation on which GPUs this driver supports #18

Closed
RobertCochran opened this issue May 11, 2022 · 31 comments
Labels
enhancement New feature or request NV-Triaged An NVBug has been created for dev to investigate

Comments

@RobertCochran

There is no bleedingly obvious list of what GPU models this driver does and does not support. It's of course safe to assume that this driver supports all of the latest GPUs and at least a generation back, but beyond that it's difficult to ascertain whether you're in the green here. I know that the proprietary driver fairly regularly drops support for older cards. What options are available for folks that have said older cards?

Please provide a clear list of GPUs that are supported by this driver, thanks.

@Aang23

Aang23 commented May 11, 2022

It's version 515, so it will support what the equivalent proprietary driver supports. Architecture-wise, that includes:
Maxwell, Pascal, Turing, Ampere, Volta.

Personally though, I wouldn't mind if the code for Kepler could be released as well.

@Binary-Eater

My personal understanding is that it essentially supports any card that is able to use the GSP firmware. That information seems to be available in the documentation provided with each NVIDIA driver release.

https://download.nvidia.com/XFree86/Linux-x86_64/515.43.04/README/gsp.html

@sempervictus

Based on the source code here and in the last binary driver set, it looks to "just be the next version" of the same driver, so the same chipsets.

@9p4

9p4 commented May 11, 2022

This is limited to only GSP-enabled cards, so 20xx and 30xx cards (Turing and above).

From https://www.phoronix.com/scan.php?page=article&item=nvidia-open-kernel&num=1

Well, it wouldn't entirely since let's not forget this new kernel driver only works with Turing GPUs and newer. Meanwhile the older sweet spot currently for the Nouveau kernel driver is GTX 600/700 Kepler series (and GTX 750 Maxwell1) but the newer GPUs is where it lacks proper power management firmware / re-clocking and thus the performance is in bad shape. Nouveau on recent generations of NVIDIA GPUs is basically only good enough for driving a display as due to the ability to re-clock to the peak performance state its 3D graphics performance is frankly junk. So the NVIDIA Open Kernel Driver is certainly superior for GeForce RTX 20/30 series while GTX 900 / GTX 10 graphics cards will likely be left in an awkward state outside of the proprietary driver stack.

@Aang23

Aang23 commented May 11, 2022

This is limited to only GSP-enabled cards, so 20xx and 30xx cards (Turing and above).

From https://www.phoronix.com/scan.php?page=article&item=nvidia-open-kernel&num=1

Well, it wouldn't entirely since let's not forget this new kernel driver only works with Turing GPUs and newer. Meanwhile the older sweet spot currently for the Nouveau kernel driver is GTX 600/700 Kepler series (and GTX 750 Maxwell1) but the newer GPUs is where it lacks proper power management firmware / re-clocking and thus the performance is in bad shape. Nouveau on recent generations of NVIDIA GPUs is basically only good enough for driving a display as due to the ability to re-clock to the peak performance state its 3D graphics performance is frankly junk. So the NVIDIA Open Kernel Driver is certainly superior for GeForce RTX 20/30 series while GTX 900 / GTX 10 graphics cards will likely be left in an awkward state outside of the proprietary driver stack.

Interesting. I do see some files for Maxwell / Pascal in there though. Was just about to test it on my 1070.

@Binary-Eater

This is limited to only GSP-enabled cards, so 20xx and 30xx cards (Turing and above).
From https://www.phoronix.com/scan.php?page=article&item=nvidia-open-kernel&num=1

Well, it wouldn't entirely since let's not forget this new kernel driver only works with Turing GPUs and newer. Meanwhile the older sweet spot currently for the Nouveau kernel driver is GTX 600/700 Kepler series (and GTX 750 Maxwell1) but the newer GPUs is where it lacks proper power management firmware / re-clocking and thus the performance is in bad shape. Nouveau on recent generations of NVIDIA GPUs is basically only good enough for driving a display as due to the ability to re-clock to the peak performance state its 3D graphics performance is frankly junk. So the NVIDIA Open Kernel Driver is certainly superior for GeForce RTX 20/30 series while GTX 900 / GTX 10 graphics cards will likely be left in an awkward state outside of the proprietary driver stack.

Interesting. I do see some files for Maxwell / Pascal in there though. Was just about to test it on my 1070.

This is from the NVIDIA documentation.

Turing and later GPUs are capable of using the GSP firmware by setting the kernel module parameter NVreg_EnableGpuFirmware=1. However, note feature and GPU support limitations below.

@PAR2020
Contributor

PAR2020 commented May 11, 2022

Hopefully this helps clarify a response, RobertCochran, from the announcement links: https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/
Which GPUs are supported by Open GPU Kernel Modules?
Open kernel modules support all Ampere and Turing GPUs. Datacenter GPUs are supported for production, and support for GeForce and Workstation GPUs is alpha quality. Please refer to the Datacenter, NVIDIA RTX, and GeForce product tables for more details (Turing and above have compute capability of 7.5 or greater).

@RobertCochran
Author

Seems clear enough. I'd still like if this were mentioned in the README so that the information stays up to date with the driver and removes the guesswork.

@xnox

xnox commented May 11, 2022

Shipping supported-gpus/supported-gpus.json would be nice, cause then distributions who package this module can correctly declare / detect matching GPUs.

Currently supported-gpus.json is shipped in the .run script, but it would be nice to have it here too.

aritger added the enhancement label May 11, 2022
@PAR2020
Contributor

PAR2020 commented May 11, 2022

Both good suggestions, thanks.

@aritger
Collaborator

aritger commented May 11, 2022

Thanks for the discussion. Yes, the open kernel modules support Turing and later GPUs. For datacenter GPUs, that list is currently: https://download.nvidia.com/XFree86/Linux-x86_64/515.43.04/README/gsp.html

For GeForce and Workstation, product names can make it difficult to infer what GPU architecture is in a product.

Extending supported-gpus/supported-gpus.json with an identifier, and including it in this github repository, is a good idea.

Tagging as an enhancement request.

@ajstrongdev

If I was a beginner I would not personally know where to look to see whether my GTX 760 is supported or not (last supported Linux driver was 470) - I recommend a simple markdown file with a list of supported and unsupported GPUs

@iam0day

iam0day commented May 11, 2022

I agree, I'd like to see an updated list with all officially supported graphics cards.

@ghtesting2020

ghtesting2020 commented May 12, 2022

This is limited to only GSP-enabled cards, so 20xx and 30xx cards (Turing and above).

It looks like the 16xx are also supported unless @aritger wants to correct me.

https://en.wikipedia.org/wiki/GeForce_16_series

"The GeForce 16 series is based on the same Turing architecture used in the GeForce 20 series, omitting the Tensor (AI) and RT (ray tracing) cores exclusive to the 20 series"

@mtijanic
Collaborator

Yes, the 16xx GeForce series are based on Turing and have a GSP; as such they can use the open source driver.

However, GeForce 16xx are not datacenter GPUs and support for them is experimental in this release, so you will have to use NVreg_OpenRmEnableUnsupportedGpus module param for now.
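For readers unfamiliar with module parameters, here is a minimal sketch of how such a setting is commonly written as a modprobe.d-style `options` line. The helper name `modprobe_options` is hypothetical, and the module name `nvidia` and the idea of placing the line in a file under /etc/modprobe.d/ are assumptions; only the parameter name comes from the comment above.

```python
def modprobe_options(module: str, **params: int) -> str:
    """Render one modprobe.d-style 'options' line for the given parameters."""
    rendered = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {rendered}"


# Parameter name taken from the comment above; module name "nvidia" is assumed.
line = modprobe_options("nvidia", NVreg_OpenRmEnableUnsupportedGpus=1)
print(line)
```

The same helper would render the NVreg_EnableGpuFirmware=1 setting quoted earlier in the thread.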

I have personally found this wikipedia page to be an excellent mapping between product names and chip/architecture ones:
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_16_series

Anything with a codename of TUxxx or GAxxx should have a GSP microcontroller.
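The codename rule above can be sketched as a small check. This only encodes the TUxxx/GAxxx rule of thumb as stated for this release; the function name is hypothetical and the check says nothing about later architectures.

```python
import re

# Codename prefixes named in the comment above: TU (Turing) and GA (Ampere).
_GSP_CODENAME = re.compile(r"^(TU|GA)\d{3}", re.IGNORECASE)


def codename_suggests_gsp(codename: str) -> bool:
    """Return True if the chip codename looks like TUxxx or GAxxx."""
    return bool(_GSP_CODENAME.match(codename.strip()))


for name in ("TU116", "GA102", "GP106"):
    print(name, codename_suggests_gsp(name))
```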

@jthoward64

Does anyone think there is a chance of expanding the list to include 10 series or older cards? Whether through a fork or in this repository?

@mtijanic
Collaborator

@tajetaje Please see here, cards prior to 16xx and 20xx simply do not have the necessary HW to use this driver.

@jthoward64

@tajetaje Please see here, cards prior to 16xx and 20xx simply do not have the necessary HW to use this driver.

Ah, missed that; well hopefully some of the back-ported changes help out then

@MartinX3

MartinX3 commented May 12, 2022

@tajetaje Please see here, cards prior to 16xx and 20xx simply do not have the necessary HW to use this driver.

Ah, missed that; well hopefully some of the back-ported changes help out then

They won't backport the driver to older GPUs without the GSP.
Only cards where the important code can remain closed source, thanks to firmware blob support, are supported.
https://download.nvidia.com/XFree86/Linux-x86_64/515.43.04/README/gsp.html

Nvidia just released this "open source" skeleton full of blob usage to work around the kernel restriction.
https://www.phoronix.com/scan.php?page=news_item&px=Linux-Kernel-Blocking-NV-NetGPU

So

  1. Nvidia tries to work around the GPL restriction
  2. Nvidia tries to advertise this PR action to us as the holy "nvidia supports open source and linux" grail.

@smxi

smxi commented May 13, 2022

This is a useful discussion. Note that I'm working on adding support for some of these questions into inxi at the moment, but still am collecting data. So far there is no online resource that combines nvidia product IDs with microarchitecture names, so I have to combine those lists manually and use some guesswork to match the legacy microarchitecture names to product ids. Some of this is running in pinxi, the development branch of inxi, already, and more will be added as I get more information and data, and figure out how to present this data in the most useful form.

I should have something basic working today or in the coming days, but it won't be perfect due to lack of accurate data online. I've done this type of data assembly many times; any intern could have that page of matching tables up in about 2 days, maybe 3 on the outside (a few hours if there is an internal data source for this), but I would only do that if I were paid because it's so boring, so I'm looking for more general matches.

It sounds like all cards with product ids listed in the 515 page will be supported. I'll have to double check their microarchitectures to confirm that, though that's a real pain due to lack of one single reference resource for this information.

I also used for this support feature:
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
but it's lacking in product IDs, which makes machine matching not possible, it's good to confirm lists of product names/ids against that, but it's missing the critical product ids. The nvidia driver support pages have the names and the product ids, per legacy/current driver series, but do not have the architecture names. This is a common situation (same thing happens with cpu microarchitecture names, vendors don't release this information so it has to be assembled from other sources).

@smxi

smxi commented May 15, 2022

This is my raw working code with Product IDs. Needless to say, it would be a lot easier to get this data if per microarchitecture product IDs > microarch tables or whatever were available.

These are based on the data in the 515 driver readme. Because I have to use manually generated regex to get each microarch using the data on the nvidia wikipedia pages, I probably missed some product IDs. I honestly scratch my head and wonder why multibillion dollar tech companies can't publish such information in a clear format, it would cost them probably a few hundred dollars max to do it. But nvidia publishes better than most, so there is that.

I'm debating adding latest supported driver, but given inxi tends to get stuck in frozen pool distros, the data will get out of date fairly quickly for everything except official legacy releases. And the product ID matches will not include newer card IDs, but it's an ok start, it will cover most users for now.

The legacy driver data is generated in a different way for now.

This data and code is running live now in pinxi, and creates output like this:

pinxi --nvidia
Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] vendor: Gigabyte 
  driver: nvidia v: 510.68.02 alternate: nvidiafb,nouveau,nvidia_drm 
  non-free: current (as of 2022-05) arch: Pascal process: TSMC 16nm 
  pcie: gen: 2 speed: 5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s
  bus-ID: 09:00.0 chip-ID: 10de:1c03 class-ID: 0300
  ....

Here's the product/pci IDs, so far. I probably missed a few in the regex parsing.

# This is all the IDs matched from the 515 driver html table:

'Maxwell' => {
'ids' => '1340|1341|1344|1346|1347|1348|1349|134b|134d|134e|134f|137a|137b|' .
'1380|1381|1382|1390|1391|1392|1393|1398|1399|139a|139b|139c|139d|13b0|13b1|' .
'13b2|13b3|13b4|13b6|13b9|13ba|13bb|13bc|13c0|13c2|13d7|13d8|13d9|13da|13f0|' .
'13f1|13f2|13f3|13f8|13f9|13fa|13fb|1401|1402|1406|1407|1427|1430|1431|1436|' .
'1617|1618|1619|161a|1667|174d|174e|179c|17c8|17f0|17f1|17fd|1c8c|1c8d|1c90|' .
'1c91|1d10|1d12|1e91|1ed1|1ed3|1f14|1f54',
},

'Pascal' => {
'ids' => '15f0|15f7|15f8|15f9|17c2|1b00|1b02|1b06|1b30|1b38|1b80|1b81|1b82|' .
'1b83|1b84|1b87|1ba0|1ba1|1ba2|1bb0|1bb1|1bb4|1bb5|1bb6|1bb7|1bb8|1bb9|1bbb|' .
'1bc7|1be0|1be1|1c02|1c03|1c04|1c06|1c07|1c09|1c20|1c21|1c22|1c23|1c30|1c31|' .
'1c60|1c61|1c62|1c81|1c82|1c83|1c8c|1c8d|1c8f|1c90|1c91|1c92|1c94|1c96|1cb1|' .
'1cb2|1cb3|1cb6|1cba|1cbb|1cbc|1cbd|1cfa|1cfb|1d01|1d02|1d11|1d13|1d16|1d33|' .
'1d34|1d52',
},

'Volta' => {
'ids' => '1d81|1db1|1db3|1db4|1db5|1db6|1db7|1db8|1dba|1df0|1df2|1df6|1fb0|' .
'20b0|20b3|20b6',
},

'Turing' => {
'ids' => '1e02|1e04|1e07|1e09|1e30|1e36|1e78|1e81|1e82|1e84|1e87|1e89|1e90|' .
'1e91|1e93|1eb0|1eb1|1eb5|1eb6|1ec2|1ec7|1ed0|1ed1|1ed3|1ef5|1f02|1f03|1f06|' .
'1f07|1f08|1f0a|1f0b|1f10|1f11|1f12|1f14|1f15|1f36|1f42|1f47|1f50|1f51|1f54|' .
'1f55|1f76|1f82|1f91|1f95|1f96|1f97|1f98|1f99|1f9c|1f9d|1f9f|1fa0|1fb0|1fb1|' .
'1fb2|1fb6|1fb7|1fb8|1fb9|1fba|1fbb|1fbc|1fdd|1ff0|1ff2|1ff9|2182|2184|2187|' .
'2188|2189|2191|2192|21c4|21d1|25a6|25a7|25a9|25aa',
},

'Ampere' => {
'ids' => '20b0|20b2|20b5|20b7|20f1|2203|2204|2206|2208|220a|220d|2216|2230|' .
'2231|2232|2233|2235|2236|2237|2238|2414|2420|2438|2460|2482|2484|2486|2487|' .
'2488|2489|248a|249c|249d|24a0|24b0|24b1|24b6|24b7|24b8|24b9|24ba|24bb|24dc|' .
'24dd|24e0|24fa|2503|2504|2507|2508|2520|2523|2531|2560|2563|2571|25a0|25a2|' .
'25a5|25b6|25b8|25b9|25ba|25bb|25e0|25e2|25e5|25f9|25fa',
},

[updated 2022-05-15 with parser tool]

If I missed a pci/product ID in any microarch let me know the product name string so I can add it to the regex pattern, then update my list of product IDs.
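To illustrate how ID lists like these can be matched against a card's chip-ID, here is a minimal sketch. It is written in Python rather than the Perl that inxi uses, the dictionary holds only a truncated subset of the IDs posted above, and the function name is hypothetical. It returns every matching architecture rather than the first, so that an ID appearing in more than one list becomes visible.

```python
import re

# Truncated subsets of the per-architecture ID lists above, for illustration only.
ARCH_IDS = {
    "Maxwell": "1340|1341|1c8c|1c8d",
    "Pascal": "1c02|1c03|1c8c|1c8d",
    "Turing": "1e02|1e04|1f02",
    "Ampere": "2204|2206|2208",
}
ARCH_PATTERNS = {arch: re.compile(rf"^(?:{ids})$") for arch, ids in ARCH_IDS.items()}


def archs_for_chip_id(chip_id: str) -> list[str]:
    """Return every architecture whose ID list contains the product ID.

    Accepts a bare product ID ("1c03") or a vendor:product pair ("10de:1c03").
    """
    product = chip_id.lower().split(":")[-1]
    return [arch for arch, pat in ARCH_PATTERNS.items() if pat.match(product)]


print(archs_for_chip_id("10de:1c03"))
print(archs_for_chip_id("1c8c"))
```

In the lists as posted, 1c8c appears under both Maxwell and Pascal, so the second call reports two architectures. That kind of double match is one way to spot the regex over-matching mentioned above.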

@mtijanic
Collaborator

We detect architectures at runtime by reading the PMC_BOOT_0 and comparing the architecture bits with these values (there's also individual implementation constants below).
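As a rough sketch of what comparing architecture bits can look like, the example below decodes a 32-bit register value. The bit positions (architecture in bits 28:24, implementation in bits 23:20), the constants 0x16 and 0x17, and the sample register values are all assumptions made for illustration; the header linked above is the authoritative source.

```python
# Assumed layout of the 32-bit PMC_BOOT_0 value: architecture in bits 28:24,
# implementation in bits 23:20. Verify against the header linked above.
ARCH_SHIFT, ARCH_MASK = 24, 0x1F
IMPL_SHIFT, IMPL_MASK = 20, 0xF

# Assumed architecture constants; illustrative, not an authoritative table.
ARCH_NAMES = {0x16: "Turing", 0x17: "Ampere"}


def decode_boot0(boot0: int) -> tuple[int, int, str]:
    """Split a boot register value into (architecture, implementation, name)."""
    arch = (boot0 >> ARCH_SHIFT) & ARCH_MASK
    impl = (boot0 >> IMPL_SHIFT) & IMPL_MASK
    return arch, impl, ARCH_NAMES.get(arch, "unknown")


# Hypothetical register value, not read from real hardware.
print(decode_boot0(0x162000A1))
```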

We currently don't have a handy offline reference table to translate device IDs to architectures, but it certainly sounds like a good thing to publish. We do have a list of all published products and their IDs, from which such a list could be constructed: g_nv_name_released.h

@smxi

smxi commented May 15, 2022

That method would be preferred, but this is Perl, so unless it's available in /sys or something like that I couldn't use it, but a real dynamic detection is obviously far superior to a product/pci id detection, which will always miss stuff, and be more difficult to update.

I'll poke around in some nvidia user datasets and see if those architecture bits are exposed anywhere that is readable.
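For context on what is readable without the driver, Linux exposes each PCI device's vendor and device IDs as small text files under sysfs. The sketch below collects the product IDs of NVIDIA devices that way. It is Python rather than Perl, the function name is hypothetical, and it yields only the product IDs that ID-based matching relies on, not the architecture bits discussed above.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"


def nvidia_pci_devices(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, str]:
    """Map PCI bus address -> product ID for every NVIDIA device under sysfs_root."""
    found: dict[str, str] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            device = (dev / "device").read_text().strip().lower()
        except OSError:
            continue  # entry without readable ID files
        if vendor == NVIDIA_VENDOR_ID:
            found[dev.name] = device.removeprefix("0x")
    return found


print(nvidia_pci_devices())
```

Taking the root directory as a parameter keeps the function testable against a fake directory tree.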

Given how important knowing your card microarchitecture is becoming, simply adding one more column to your otherwise exceptionally well done product name/id files created with each new driver would really be all it takes. Those are relatively easy to parse, and that's what I'm using, but it's very difficult combining the wikipedia data directly in with that data because many product names and series are actually different architectures depending on various situations. The wikipedia pages don't have product ids, and are also vastly varying in quality and completeness depending on microarchitecture. They were clearly generated by different people with different ideas of what constitutes consistent data, so not ideal.

I'm going to convert my shell filters to perl today to make it easier to build up these lists, until ideally they get published in a more reliable format for 71 > 470 legacy and 515 > product id lists.

Note that is quite literally just adding one more table column to the published html tables, which should not take more than an hour or two to do, it only took me as an outsider having to merge data and no access to internal resources a few hours.

Note that one very strange thing in the wikipedia data is that a page for say kepler will show the product code for fermi or something, a few mixed in, at which point it's hard to know if they made a mistake, or some codes are not actually referring to the microarch names even though they are clearly meant to like TU, GV, GA, and so on. I haven't found any docs for Tegra anywhere so far unless I'm missing it, I see by the header file that is a different category altogether, but one thing at a time.

For now, I'll whip up a perl script to make this more fool proof and repeatable to generate the per arch. id lists, which will make it easy if in the future those arch. are added to the existing html tables, or json data sources, or whatever.

I believe these changes will already cover most nvidia users in Linux however, it's just labor intensive to maintain it over time, not ideal.

@smxi

smxi commented May 15, 2022

I've updated the IDs above if they are useful to anyone, those will be going live fairly soon, just need to work out a few last details. Made a Perl tool to do the parsing, that works much better, and is easier to adjust and tweak.

This output is designed to copy/paste directly into inxi data structures, but should be easy for anyone to use / modify, at least until something better shows up.

I'm adding in Fahrenheit > Ampere microarchitectures, though I believe the Tesla IDs are going to be off, maybe also the Curie ones, but those are not well documented and are confusing to figure out. I might use the new tool at least to lock down Fermi/Kepler IDs as well, I'll see.

mtijanic added the NV-Triaged label May 17, 2022
aritger self-assigned this May 18, 2022
@aritger
Collaborator

aritger commented May 18, 2022

A subsequent release will include a list of compatible GPUs in the README.md. Self assigning.

@smxi

smxi commented May 19, 2022

Make sure any such list is at some level machine parseable, otherwise it just moves the problem a bit. Tab separated columns, for instance, not space. That's the only reason your driver release supported products info html pages are usable as data sources.

@aritger
Collaborator

aritger commented Jun 1, 2022

@smxi, for a machine-parseable format, please see the 'kernelopen' token in supported-gpus/supported-gpus.json of NVIDIA-Linux-x86_64-515.48.07.run and later.

I'm going to mark this Issue fixed:

  • In the 515.48.07 release of open-gpu-kernel-modules, the README.md contains a table for human readers.
  • In the 515.48.07 release of NVIDIA-Linux-x86_64-515.48.07.run, supported-gpus.json contains the 'kernelopen' token for computer readers.
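A minimal sketch of how a program might pick out the 'kernelopen' entries follows. The JSON layout used here (a top-level "chips" list whose entries carry "devid", "name" and a "features" list) is an assumption, and the sample data is made up for illustration; check the real supported-gpus.json before relying on these field names.

```python
import json

# Made-up sample in an assumed layout; the real supported-gpus.json may differ.
SAMPLE = """
{"chips": [
  {"devid": "0x1C03", "name": "example Pascal board", "features": []},
  {"devid": "0x1E04", "name": "example Turing board", "features": ["kernelopen"]}
]}
"""


def kernelopen_devids(doc: str) -> set[str]:
    """Return the device IDs whose feature list contains 'kernelopen'."""
    chips = json.loads(doc).get("chips", [])
    return {c["devid"].lower() for c in chips if "kernelopen" in c.get("features", [])}


print(kernelopen_devids(SAMPLE))
```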

aritger closed this as completed Jun 1, 2022
@smxi

smxi commented Jun 1, 2022

Thanks for that one, your current HTML tables are actually fine to parse, I'm done with all the backend stuff to make updates fairly trivial in the future. But I'll check that one out, though it requires downloading and extracting the driver package, then grabbing the file, which is actually a lot more work than the method I'm using, but I'll take a look at it, thanks. All the tools are now released as free software, in the inxi-perl branch of inxi, in the tools directory.

Note that inxi is out now and is showing nvidia gpu architectures, and next inxi , which ships in a day or two, unless a bug shows up, will even it out, and also show Intel and AMD gpu architecture data.

Most of the work involved with this stuff is actually more in the research area than the data sources, your HTML tables have always been useful to me in terms of updating support pci ids.

Note that downloading a 344 MiB data file to grab one tiny text file is not very efficient.... and almost 1 TiB extracted, lol...

I'm hoping progress is fast for the free driver codebase, will be interesting to watch, you've always been very fast in the past, guess it will depend on factors, some of which are probably not under your control, like licensed code etc.

@smxi

smxi commented Jun 1, 2022

Just on a technical level, speaking as someone who did gnu/linux nvidia installers for far too long, since around 2008, this data is actually needed before the driver run package is downloaded, not after, since it does no good to know your card is legacy after you've downloaded the full latest driver. My stuff always automated all of this of course so it was transparent to the user. But this json source file should help people who are still interested in making installers since it can be fed into the actual installer that determines which run package to download and install the driver with.

I would not have made the json data file this way by the way, it should be similar to your HTML data tables:

"legacybranch":"340.xx" {supported 340.xx devices}, that is, it should be multidimensional, not flat, otherwise it's hard to get the data out of it. Your HTML tables did it well in that regard. Yes, you can loop through it, and rebuild it into a better data structure before using it, but why? I guess it makes sense if you're assuming someone is looping through it to find a pci ID, though that's not very efficient, it's far better to extract the PCI ids per legacy type, and simply match the user's pci id against those until you find a match, then you know which driver to use, for installers. inxi does more than that though, so it's more granular, and is broken down into microarchitectures per legacy, or current, levels, plus the legacy driver info too.
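The regrouping described above, from a flat list into per-branch ID sets that an installer can match against, could be sketched like this. The field names ("chips", "devid", "legacybranch"), the "current" label for entries without a legacy branch, and all sample entries are assumptions made for illustration, not the file's documented schema.

```python
import json
from collections import defaultdict
from typing import Optional

# Made-up entries in an assumed flat layout; IDs and branches are placeholders.
SAMPLE = """
{"chips": [
  {"devid": "0x0FC6", "name": "example A", "legacybranch": "470.xx"},
  {"devid": "0x0DE0", "name": "example B", "legacybranch": "340.xx"},
  {"devid": "0x1E04", "name": "example C"}
]}
"""


def ids_by_branch(doc: str) -> dict[str, set[str]]:
    """Regroup a flat chip list into {branch: {product ids}}."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for chip in json.loads(doc).get("chips", []):
        branch = chip.get("legacybranch", "current")
        grouped[branch].add(chip["devid"].lower().removeprefix("0x"))
    return dict(grouped)


def branch_for(product_id: str, grouped: dict[str, set[str]]) -> Optional[str]:
    """Return the driver branch whose ID set contains product_id, if any."""
    wanted = product_id.lower().removeprefix("0x")
    for branch, ids in grouped.items():
        if wanted in ids:
            return branch
    return None


grouped = ids_by_branch(SAMPLE)
print(branch_for("0x0FC6", grouped))
```

Building the grouped structure once and then doing set lookups per branch matches the installer use case described above: the user's PCI ID is checked against each branch until one matches.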

This would be more useful if it were also added as a downloadable resource somewhere, like the way you can click to see the supported products HTML table page if you know how to find it, that same appendix could have a link to this json file as well. I'm sure there are some people who would benefit from this. I doubt I'd download the latest driver every time myself just to get this file, that's kind of overkill.

The kernelopen 'feature' I assume means it's a turing or ampere microarchitecture? I may add a json parser tool to the nvidia logic, but not unless it actually has data the table doesn't have, but this should help some people I think. A 'microarch' field name: value would have been nice since that can be a pain to dig up the data on, but I've done that work mostly so from my side it's largely completed, albeit probably has an error or two in the data, but it's good enough. kernelopen is the one bit of data that the json file contains that the HTML tables don't, unless they've been updated, so it might be worth adding this, I'll see.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
    claim that you wrote the original software. If you use this software
    in a product, an acknowledgment in the product documentation would be
    appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
    misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

Kudos for thinking this one through, though having to download that much binary data to get one single file is not ideal, I wouldn't do it myself as a rule, unless the data added value somehow.

The copyright date is out of date, by the way, should be 2022, since that's when this file was released.

@aritger
Collaborator

aritger commented Jun 1, 2022

Yes, the "kernelopen" feature indicates that the GPU has GSP and therefore can be driven by the open kernel modules. I.e., is >= Turing.

Note that supported-gpus.json has been present in the .run file for several years; it is only the "kernelopen" feature that is new in 515.48.07.

@smxi

smxi commented Jun 1, 2022

Thanks for clarifying. So there's no actual new data in there beyond what I have working already, that's good to know. I don't generally download and look inside the run packages for the drivers, so I would never have found this one, but your HTML tables I think are actually easier to work with as data anyway.

These json files could be a lot more data rich, but in general, nvidia does the best job of releasing this type of practical and useful data, so it's more wishing it were more rich than anything else, the research to find the data that most of the devs inside nvidia probably know off the top of their heads, or have ready access to, is really tedious and time consuming, but again, nvidia does a better job of it than intel or amd, so your overall efforts are appreciated anyway.

Hoping you can make good progress on the desktop gpu gsp open kernel modules, will be curious to see how long it will take to pull them to beta state at least. But given the gigantic size of the current driver it's probably not going to be something we'll see a full stack free driver for anytime soon, if ever, but still one step at a time.
