rpi-6.12.y: LAN7800 doesn't enumerate on Rpi 3B+ in specific scenario #6411

Open
lategoodbye opened this issue Oct 11, 2024 · 11 comments
Comments

@lategoodbye
Contributor

Describe the bug

Hi, first of all I would like to apologize for cross-posting upstream issues, but I am currently stuck with this DWC2 issue and would like to make some progress with the suspend to idle support.

rpi-6.12.y is currently affected by this problem, and I can reproduce it: if you start a Raspberry Pi 3 B Plus with no USB peripherals attached and only the debug UART connected, the LAN7800 chip does not enumerate. I was hoping that one of you could give some hints for analyzing the root cause. Of course it would be easy to revert the offending commit, but I consider the no_clock_gating setting to be valid.

Steps to reproduce the behaviour

  1. build arm64 kernel for Raspberry Pi 3B+ with bcm2711_defconfig and install it on SD card
  2. enable debug UART and DWC2 host in config.txt
  3. disconnect all USB peripherals from Raspberry Pi 3B+
  4. power on Raspberry Pi 3B+
  5. run a tool like lsusb to verify that LAN7800 doesn't get enumerated
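Step 5 can be scripted. The following is a minimal sketch, not part of the original report, that scans `lsusb` output for the LAN7800. The ID `0424:7800` is an assumption (it is not stated in this issue), and the helper names are made up for illustration. The sample text is the lsusb output from the Logs section of this report.

```python
import re

# Assumption: the LAN7800 enumerates as 0424:7800 (vendor 0424 = Microchip/SMSC).
# This ID is not given in the issue text; adjust it if your device differs.
LAN7800_ID = "0424:7800"


def usb_ids(lsusb_output: str) -> list:
    """Return every vendor:product ID found in lsusb output, in order."""
    return re.findall(r"\bID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\b", lsusb_output)


def lan7800_enumerated(lsusb_output: str) -> bool:
    """True if the assumed LAN7800 ID appears in the lsusb output."""
    return LAN7800_ID in (usb_id.lower() for usb_id in usb_ids(lsusb_output))


# lsusb output copied from the Logs section of this report (bad case).
SAMPLE = """\
Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""

print(usb_ids(SAMPLE))
print(lan7800_enumerated(SAMPLE))  # False: only the two hubs and the root hub show up
```

On a real system the string would come from something like `subprocess.run(["lsusb"], capture_output=True, text=True).stdout`.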

Device(s)

Raspberry Pi 3 Mod. B+

System

vcgencmd version
Sep 13 2024 16:00:01
Copyright (c) 2012 Broadcom
version ddfba3e3c234500025b545512b4b214f28e453e9 (clean) (release) (start)

uname -a
Linux raspberrypi 6.12.0-rc2-v8+ #8 SMP PREEMPT Fri Oct 11 10:41:15 CEST 2024 aarch64 GNU/Linux

Logs

lsusb
Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Additional context

No response

@6by9
Contributor

6by9 commented Oct 17, 2024

@P33M You know USB and DWC2 best. Any thoughts?

@P33M
Contributor

P33M commented Oct 18, 2024

A side-effect of disabling clock gating is that the host port is no longer forcibly suspended/unsuspended as part of the enter/exit clock gating routines. This sounds suspiciously like the LAN7800 can't handle being suspended.

I wonder if setting the RESET_RESUME quirk for the Microchip hubs will kick it out of being uncommunicative?

I see that later in the mailing list thread HUB_QUIRK_DISABLE_AUTOSUSPEND is used. That's not ideal, as it will match hubs on Raspberry Pi 1, 2 and 3 devices, increasing power consumption if Ethernet is not in use.
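For anyone wanting to try the RESET_RESUME idea without patching the kernel, a hedged sketch: the USB core accepts a `usbcore.quirks=` parameter with entries of the form VendorID:ProductID:Flags, where (as far as I recall from the kernel parameter documentation) the flag letter `b` toggles USB_QUIRK_RESET_RESUME. The helper name below is hypothetical, and applying the quirk to the `0424:2514` hub from the lsusb output is an assumption about which device is meant. HUB_QUIRK_DISABLE_AUTOSUSPEND is a hub-driver quirk and is not covered by this parameter.

```python
def usbcore_quirk(vendor_id: int, product_id: int, flags: str) -> str:
    """Format one usbcore.quirks entry as VendorID:ProductID:Flags (4-digit hex IDs)."""
    return f"usbcore.quirks={vendor_id:04x}:{product_id:04x}:{flags}"


# Assumption: 'b' = USB_QUIRK_RESET_RESUME; 0424:2514 is the Microchip hub seen in lsusb.
print(usbcore_quirk(0x0424, 0x2514, "b"))  # usbcore.quirks=0424:2514:b
```

The resulting string would go on the kernel command line (cmdline.txt on a Raspberry Pi).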

@lategoodbye
Contributor Author

lategoodbye commented Oct 20, 2024

Thanks for the hint; the quirk avoided the issue.

But I would like to come back to the original, underlying problem: the complete USB bus goes into autosuspend (port still powered, but no USB IRQs) and can only be woken up via sysfs, not by connecting a USB device. This problem can still be reproduced with a Raspberry Pi 3 A+ and a USB hub. This was the reason for choosing HUB_QUIRK_DISABLE_AUTOSUSPEND, which I'm aware is not a good solution.
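The comment does not say which sysfs file was used for the wake-up. One common mechanism is the per-device runtime PM attribute `power/control`, where writing `on` keeps the device resumed and `auto` re-enables autosuspend. The sketch below assumes that is the kind of access meant; the function name and the `usb1` device name are illustrative only, and the demo runs against a temporary directory rather than the real /sys.

```python
import tempfile
from pathlib import Path


def set_runtime_pm(device: str, mode: str, sysfs_root: str = "/sys") -> Path:
    """Write 'on' or 'auto' to <sysfs_root>/bus/usb/devices/<device>/power/control."""
    if mode not in ("on", "auto"):
        raise ValueError(f"unsupported mode: {mode!r}")
    control = Path(sysfs_root) / "bus/usb/devices" / device / "power/control"
    control.write_text(mode)
    return control


# Demo against a fake sysfs tree so the sketch runs without root or hardware.
with tempfile.TemporaryDirectory() as root:
    power_dir = Path(root) / "bus/usb/devices/usb1/power"
    power_dir.mkdir(parents=True)
    (power_dir / "control").write_text("auto")
    written = set_runtime_pm("usb1", "on", sysfs_root=root).read_text()

print(written)  # on
```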

Is this caused by the lack of a real USB PHY driver in Linux?
Or is it the lack of runtime power management in the DWC2 driver?

Sorry, I'm a little bit lost in the complexity.

@P33M
Contributor

P33M commented Oct 21, 2024

Ah, if connect events on the hub don't cause a remote wake, dwc2 doesn't appear to handle resume properly.

With the root port in suspend, what's the state of the debugfs regdump with the clock gating commit removed/applied? e.g. /sys/kernel/debug/usb/1000480000.usb# cat regdump on a Pi 5

@lategoodbye
Contributor Author

I dumped the DWC2 registers on a Raspberry Pi 3A+ with a USB 2.0 hub connected after boot. I hope it's okay that I made a diff between the good case (clock gating) and the bad case (no clock gating):
https://gist.github.com/lategoodbye/f22f97b379de8777176cf90113fb10e2
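For readers who want to repeat the comparison, here is a small sketch that diffs two regdump captures. It assumes the debugfs regdump prints one `NAME = 0x%08x` pair per line. The sample dumps are placeholders: only the PCGCTL values (0x11 with clock gating, 0 without) are taken from this thread, and the HCFG value is invented so that there is an unchanged register to ignore.

```python
import re

REG_LINE = re.compile(r"\s*(\w+(?:\(\d+\))?)\s*=\s*(0x[0-9a-fA-F]+)\s*$")


def parse_regdump(text: str) -> dict:
    """Map register name to integer value for every 'NAME = 0x...' line."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_regdumps(good: dict, bad: dict) -> dict:
    """Return {name: (good_value, bad_value)} for registers that differ."""
    names = sorted(good.keys() | bad.keys())
    return {n: (good.get(n), bad.get(n)) for n in names if good.get(n) != bad.get(n)}


# Placeholder dumps: the HCFG value is hypothetical, PCGCTL follows the thread.
GOOD = "HCFG = 0x00000200\nPCGCTL = 0x00000011\n"
BAD = "HCFG = 0x00000200\nPCGCTL = 0x00000000\n"

for name, (g, b) in diff_regdumps(parse_regdump(GOOD), parse_regdump(BAD)).items():
    print(f"{name}: good={g:#010x} bad={b:#010x}")
```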

@lategoodbye
Contributor Author

@P33M Is there anything useful in this dump?

@P33M
Contributor

P33M commented Oct 29, 2024

I scribbled some notes down then forgot about them:

HCFG:
No 32khz suspend clock
FS/LS PHY clock is 30/60MHz

HPRT0
no_cg:
enable, connected, powered
cg:
enable, connected, powered, suspended, pls=D+ high (correct, bus returns to FS termination)

PCGCTL:
no_cg:
nothing set
cg:
0x11 = PHY suspended, stop_pclk set
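As a quick illustration of how the 0x11 reading breaks down, the sketch below decodes PCGCTL into named bits. STOPPCLK at bit 0 matches my recollection of the dwc2 register header; placing "PHY suspended" at bit 4 follows from the arithmetic of the note above (0x11 = bit 0 + bit 4). The label strings are informal and should be checked against drivers/usb/dwc2/hw.h before relying on them.

```python
# Bit positions are assumptions derived from the note above (0x11 = bit 0 + bit 4).
PCGCTL_BITS = {
    0: "STOPPCLK",
    4: "PHY_SUSPENDED",
}


def decode_pcgctl(value: int) -> list:
    """Return names of known set bits; unknown set bits are reported as 'bitN'."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(PCGCTL_BITS.get(bit, f"bit{bit}"))
    return names


print(decode_pcgctl(0x11))  # cg case:    ['STOPPCLK', 'PHY_SUSPENDED']
print(decode_pcgctl(0x00))  # no_cg case: []
```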

Programming model for powerdown:
- Set port suspend bit in hprt0
- set power clamps (there aren't any on bcm2835)
- stop PHY clock in PCGCTL
- Some blurb about associated platform power management

It may be the case that remote resume won't work because I don't think bcm2835 has a slow alternate PHY clock

Programming model for powerup:
- Clear stop phy clock bit
- Clear power clamps (not applicable)
- Application sets Port Resume in HPRT0
- waits 20ms
- Clears Port Resume in HPRT0
- Port should be available again?
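The powerup steps above can be sketched as a sequence of register writes. This is a simulation under stated assumptions, not the dwc2 driver's code: the `read`/`write` callbacks, the function name and the starting HPRT0 value are hypothetical, and the bit positions (STOPPCLK bit 0, port resume bit 6, and the write-1-to-clear status bits 1, 2, 3 and 5 that are masked before writing HPRT0 back) are as I recall them from the dwc2 register header.

```python
import time

PCGCTL_STOPPCLK = 1 << 0
HPRT0_RES = 1 << 6  # Port Resume
# Assumed write-1-to-clear bits (CONNDET, ENA, ENACHG, OVRCURRCHG): never write them back as 1.
HPRT0_W1C = (1 << 1) | (1 << 2) | (1 << 3) | (1 << 5)


def power_up(read, write, sleep=time.sleep):
    """Issue the powerup writes: ungate the PHY clock, then drive resume for 20 ms."""
    write("PCGCTL", read("PCGCTL") & ~PCGCTL_STOPPCLK)  # clear stop PHY clock
    # Power clamps: not applicable on bcm2835, so there is nothing to clear.
    hprt0 = read("HPRT0") & ~HPRT0_W1C
    write("HPRT0", hprt0 | HPRT0_RES)                   # set Port Resume
    sleep(0.020)                                        # wait 20 ms
    write("HPRT0", hprt0 & ~HPRT0_RES)                  # clear Port Resume


# Demo with a dict standing in for the hardware; 0x1485 is an illustrative HPRT0
# value (connected, enabled, suspended, D+ high, powered), not a measured one.
regs = {"PCGCTL": 0x11, "HPRT0": 0x1485}
writes, sleeps = [], []


def fake_write(name, value):
    regs[name] = value
    writes.append((name, value))


power_up(regs.__getitem__, fake_write, sleep=sleeps.append)
print([(name, hex(value)) for name, value in writes])
```

A plain dict does not model hardware side effects (for example, the core clearing the suspend bit on resume), so only the order and content of the writes is meaningful here.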

So the question is, does dwc2 do the powerup sequence including forcing downstream resume?

@lategoodbye
Contributor Author

lategoodbye commented Oct 29, 2024

Thanks. From my understanding DWC2 is completely interrupt driven. There are two interrupt handlers in host mode (HCD of USB core + DWC2 driver), which share the same interrupt. In the bad case this interrupt doesn't fire anymore on USB connect/disconnect (no changes under /proc/interrupts), so I don't see how DWC2 can wake itself up.
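The "no changes under /proc/interrupts" check can be made mechanical by summing the per-CPU counters for the dwc2 interrupt before and after plugging a device. A minimal sketch follows; IRQ 51 comes from the boot log in this thread, while the two snapshots (counter values, chip name, trigger type) are hypothetical examples of the file format.

```python
def irq_total(proc_interrupts: str, irq: str) -> int:
    """Sum the per-CPU counters on the line whose first field is '<irq>:'."""
    for line in proc_interrupts.splitlines():
        fields = line.split()
        if fields and fields[0] == f"{irq}:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # per-CPU counters end at the interrupt chip name
                total += int(field)
            return total
    raise KeyError(f"IRQ {irq} not found")


# Hypothetical snapshots taken before and after connecting a USB device.
BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
 51:       4711          0          0          0  ARMCTRL-level  41 Edge      dwc2_hsotg:usb1
"""
AFTER = BEFORE  # bad case: the counters did not move

fired = irq_total(AFTER, "51") - irq_total(BEFORE, "51")
print(fired)  # 0 means the controller raised no interrupt on connect
```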

Can you please tell me which interrupt cause is relevant in this case (port interrupt)?

I will try to translate your last question into DWC2 code: is _dwc2_hcd_resume called?

Here are the bad case logs (including debug output for function calls) from the linux-usb list:

[    2.334366] dwc2 3f980000.usb: supply vusb_d not found, using dummy regulator
[    2.341892] dwc2 3f980000.usb: supply vusb_a not found, using dummy regulator
[    2.400027] dwc2 3f980000.usb: DWC OTG Controller
[    2.404868] dwc2 3f980000.usb: new USB bus registered, assigned bus number 1
[    2.412087] dwc2 3f980000.usb: irq 51, io mem 0x3f980000
[    2.711826] usb 1-1: new high-speed USB device number 2 using dwc2
[    3.195838] usb 1-1.1: new high-speed USB device number 3 using dwc2
[    3.435829] dwc2 3f980000.usb: dwc2_port_suspend
[    3.459914] dwc2 3f980000.usb: _dwc2_hcd_suspend
[    9.009743] dwc2 3f980000.usb: _dwc2_hcd_resume
[    9.030667] dwc2 3f980000.usb: dwc2_port_suspend
[    9.044137] dwc2 3f980000.usb: _dwc2_hcd_suspend
[    9.044222] dwc2 3f980000.usb: _dwc2_hcd_resume     # this suspend & resume cycle is just triggered by USB_ONBOARD_DEV and is not related
[    9.354370] usb 1-1.1: new high-speed USB device number 4 using dwc2
[    9.584095] dwc2 3f980000.usb: dwc2_port_suspend
[    9.599997] dwc2 3f980000.usb: _dwc2_hcd_suspend   # this is the last log from DWC2 before it is stuck

So in order to answer your question: I would say no
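The same conclusion can be read off the log mechanically: if the last HCD power event is a suspend with no later resume, the controller stayed suspended. The sketch below uses a subset of the log lines from this comment; the helper names are illustrative.

```python
import re

EVENT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+dwc2 \S+: (_dwc2_hcd_suspend|_dwc2_hcd_resume)\b"
)


def hcd_events(log: str) -> list:
    """Return (timestamp, function) for each _dwc2_hcd_suspend/_dwc2_hcd_resume line."""
    return [(float(ts), name) for ts, name in EVENT.findall(log)]


def stuck_suspended(log: str) -> bool:
    """True if the final HCD event in the log is a suspend."""
    events = hcd_events(log)
    return bool(events) and events[-1][1] == "_dwc2_hcd_suspend"


# Lines copied from the bad case log in this comment.
LOG = """\
[    3.435829] dwc2 3f980000.usb: dwc2_port_suspend
[    3.459914] dwc2 3f980000.usb: _dwc2_hcd_suspend
[    9.009743] dwc2 3f980000.usb: _dwc2_hcd_resume
[    9.030667] dwc2 3f980000.usb: dwc2_port_suspend
[    9.044137] dwc2 3f980000.usb: _dwc2_hcd_suspend
[    9.044222] dwc2 3f980000.usb: _dwc2_hcd_resume
[    9.584095] dwc2 3f980000.usb: dwc2_port_suspend
[    9.599997] dwc2 3f980000.usb: _dwc2_hcd_suspend
"""

print(hcd_events(LOG)[-1])
print(stuck_suspended(LOG))  # True: no resume follows the suspend at 9.599997
```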

@P33M
Contributor

P33M commented Oct 30, 2024

The two logs look odd - the bus is addressed twice in the first case and just device 1-1.1 is addressed twice in the second. But even so, I would expect a resume to be signalled at the root port after 1-1.1 is suspended for the second time, and picked up by dwc2, but that doesn't happen.

My guess is the lack of slow PHY clock breaks remote wakeup. What happens if you never set the PCGCTL.STOPPCLK bit?

@lategoodbye
Contributor Author

I'm not sure I can follow your suggestion, because in the bad case (no_clockgating) PCGCTL.STOPPCLK is already never set.

Is it possible you got confused by the color highlighting (bad = green, good = red in this case)?

So I think something from dwc2_host_enter_clock_gating() should also be done in the no_clockgating case?

@lategoodbye
Contributor Author

@P33M Gentle ping ...
