switchtec0: failed to register ntb device: -12 on kernel v5.9+ #95
Comments
Hmm, hard to say. Do you have any data points between 4.16 and 5.9? Can you do a bisect? -12 indicates ENOMEM which can happen in a couple different places including being unable to map the BAR. The failed to assign BAR message is concerning, but may not actually be a problem as it is an iterative process. Can you post the output of |
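For readers unfamiliar with kernel return codes: the driver reports a negated errno value, so -12 corresponds to errno 12. A minimal sketch of decoding it from user space, assuming a Linux host (the helper name `describe_kernel_error` is made up for illustration):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negated errno such as -12 to its symbolic name and message."""
    number = abs(code)
    # errno.errorcode maps numeric values to names like 'ENOMEM'.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


# The value from the "failed to register ntb device: -12" message.
print(describe_kernel_error(-12))
```

This only names the error; it does not tell you which allocation or mapping inside the driver failed.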
Hi @lsgunth ,
and here the output you requested: $ sudo lspci -v -s 0000:01:00.1
01:00.1 Bridge: PMC-Sierra Inc. Device 4000
Subsystem: PMC-Sierra Inc. Device 4000
Flags: bus master, fast devsel, latency 0
Memory at df400000 (64-bit, prefetchable) [size=4M]
Memory at <ignored> (64-bit, prefetchable)
Capabilities: [40] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [50] MSI-X: Enable+ Count=4 Masked-
Capabilities: [5c] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Multicast
Capabilities: [178] Device Serial Number 50-0e-00-4a-00-00-00-01
Capabilities: [7f8] Vendor Specific Information: ID=ffff Rev=1 Len=808 <?>
Kernel driver in use: switchtec
As I mentioned before, if I just reboot the same machine and select kernel v4.16, without changing any switch configuration, it works smoothly. Do you have any other kernel version in mind that I could use to help with the bisect? I would like to avoid having to check every version between 4.16 and 5.4 without any specific clue. |
One more comment on this issue that I hope is useful. Tracking down the problem, I found that the driver fails to register because this call fails: sndev->peer_shared = pci_iomap(sndev->stdev->pdev, self_bar, LUT_SIZE); Tests done so far:
|
The problem is almost certainly due to the 2nd BAR being unassigned. (You can see it listed as <ignored> in your lspci dump.) I'm a bit surprised that there's a difference here between the kernel versions, but I guess it's not impossible. There is some code to fix up these BIOS bugs, but it's non-trivial and a bit buggy. It also looks like you only have 32-bit PCI addressing, which means address space may be limited for a large BAR. |
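A quick way to spot this condition is to scan the `lspci -v` text for memory regions whose address is printed as `<ignored>`. The following is only a sketch: the sample text is copied from the dump earlier in this thread, and the function names are hypothetical.

```python
import re

# Lines copied from the lspci dump posted earlier in this thread.
LSPCI_SAMPLE = """\
01:00.1 Bridge: PMC-Sierra Inc. Device 4000
    Flags: bus master, fast devsel, latency 0
    Memory at df400000 (64-bit, prefetchable) [size=4M]
    Memory at <ignored> (64-bit, prefetchable)
"""

MEMORY_LINE = re.compile(r"^\s*Memory at (\S+) \(([^)]*)\)")
SIZE_FIELD = re.compile(r"\[size=(\w+)\]")


def find_memory_regions(lspci_text):
    """Return one dict per 'Memory at ...' line in lspci -v output."""
    regions = []
    for line in lspci_text.splitlines():
        match = MEMORY_LINE.match(line)
        if not match:
            continue
        address, attrs = match.groups()
        size = SIZE_FIELD.search(line)
        regions.append({
            "address": address,
            "attrs": attrs,
            "size": size.group(1) if size else None,
            "assigned": address != "<ignored>",
        })
    return regions


def unassigned_regions(lspci_text):
    """Return only the regions that were left without an address."""
    return [r for r in find_memory_regions(lspci_text) if not r["assigned"]]


for region in unassigned_regions(LSPCI_SAMPLE):
    print("unassigned BAR:", region["attrs"])
```

To check a live system, the text could be captured with `subprocess.run(["lspci", "-v", "-s", "0000:01:00.1"], capture_output=True, text=True).stdout`, using the same device address as in the command above.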
You were absolutely right. BAR2 size was 64M; decreasing it to 16M solved the issue on kernel versions >= 5.4. Thank you very much for your support. |
The address of BAR0 was 0xdf400000, which is below the 32-bit limit. When 64-bit decoding is turned on, the 64-bit PCI BARs tend to be placed in a large region well above the 32-bit address space. It's very odd that such a small 64MB BAR was not assignable. Usually there's more space than that, but this is very likely a BIOS bug. Turning on 64-bit decoding is probably the easiest solution. |
Unfortunately, there's no such option in the current BIOS. Maybe we should consider changing motherboards.
On the other hand, if both machines run on v4.16
|
I really find it hard to believe there is no 64-bit decoding option. Sometimes they are named weird things. What motherboard are you using? It looks like the BIOS is really buggy, as the BAR was assigned to an unaligned address. That really won't work. You might have to shrink it even further to get it aligned. |
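On alignment: a PCI memory BAR has a power-of-two size and must sit at an address that is a multiple of that size. The sketch below shows the check. The misaligned address itself is not quoted in this thread, so the example reuses the BAR0 address 0xdf400000 purely as an illustration, and the function names are hypothetical.

```python
MIB = 1 << 20


def is_naturally_aligned(address: int, size: int) -> bool:
    """True if address is a multiple of size (size must be a power of two)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a positive power of two")
    return address % size == 0


def align_down(address: int, size: int) -> int:
    """Nearest naturally aligned address at or below the given one."""
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a positive power of two")
    return address & ~(size - 1)


# BAR0 from the lspci dump: 4 MiB at 0xdf400000 is aligned.
print(is_naturally_aligned(0xDF400000, 4 * MIB))

# Illustration only: the same address would NOT suit a 64 MiB or 16 MiB BAR.
for size_mib in (64, 16):
    size = size_mib * MIB
    print(size_mib, is_naturally_aligned(0xDF400000, size),
          hex(align_down(0xDF400000, size)))
```

This is why shrinking a BAR can help: a smaller size has more valid placements inside a cramped 32-bit window.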
Here you can find my current motherboard's manual: https://download1.gigabyte.com/Files/Manual/mb_manual_ga-z97(h97)-d3h_v1.1_e.pdf The oddest thing that prevents me from sleeping at night is that when both PCs run Linux kernel v4.16 everything runs smoothly and |
Yup, I don't see an option in that manual. Possibly a combination of it being so old and just a consumer motherboard. It's very odd to see the supposedly same machine assign such wildly different PCI addresses just based on a different kernel version. I know there have been a few minor changes to the kernel code that fixes up addresses assigned by broken BIOSes. One of them could have broken your use case; I do know that code is quite fragile. All I could suggest is to bisect between the kernel versions based on the PCI addresses assigned to the cards. |
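The idea behind bisecting is a binary search between a known-good and a known-bad kernel, so each boot test halves the remaining range. A rough sketch, assuming the behaviour never flips back to good once it breaks; the release list covers the mainline versions between 4.16 and 5.4, while the function name and the demo predicate are hypothetical stand-ins for actually booting each kernel and checking the assigned PCI addresses:

```python
def first_bad_version(versions, is_bad):
    """Binary search for the first bad entry.

    versions must be ordered oldest to newest, with versions[0] known
    good and versions[-1] known bad.
    """
    low, high = 0, len(versions) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(versions[mid]):
            high = mid
        else:
            low = mid
    return versions[high]


RELEASES = ["4.16", "4.17", "4.18", "4.19", "4.20",
            "5.0", "5.1", "5.2", "5.3", "5.4"]

# Demo predicate with an ARBITRARY threshold, not a finding from this thread.
tested = []


def demo_is_bad(version):
    tested.append(version)
    return RELEASES.index(version) >= 6


print(first_bad_version(RELEASES, demo_is_bad), "after", len(tested), "tests")
```

In practice `git bisect` performs the same search over individual commits rather than release tags, so narrowing to two adjacent releases first keeps the number of kernel builds manageable.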
Have you solved the issue yet?
Kernel: It failed here because the driver cannot find a BAR for crosslink. |
I git cloned, built, and installed kernel v5.10-rc6, but here the ntb_hw_switchtec driver fails to register:
Plus a bunch of
Full dmsg: v5.10-rc6
I experienced the same behaviour on Kernel v5.9 shipped with Ubuntu 20.04.
On the other hand, the same exact setup works flawlessly on Kernel v4.16.
Are there any known issues with kernel versions >= 5.9?
HW information
Distro