Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SkyConnect with new "ember" driver: Network extremely slow. OOM error in logs #22249

Closed
Ra72xx opened this issue Apr 18, 2024 · 51 comments
Closed
Labels
ember Issues related to ember driver problem Something isn't working

Comments

@Ra72xx
Copy link

Ra72xx commented Apr 18, 2024

What happened?

I try to switch my SkyConnect from standard 'ezsp' to 'ember' driver. Therefore I updated to the current 7.4.1.0 firmware as recommended in the respective thread. Everything works as before as long as I use the 'ezsp' driver, but as soon as I switch to 'ember', the network is extremely slow. E.g. switching a light on takes about 1 to 2 minutes until it reacts.
The startup log shows an "OUT OF MEMORY" error.

Apr 18 16:56:41 Obelix zigbee2mqtt[1460139]: ======== Ember Adapter Starting ========
Apr 18 16:56:41 Obelix zigbee2mqtt[1460139]: ======== EZSP starting ========
Apr 18 16:56:41 Obelix zigbee2mqtt[1460139]: ======== ASH NCP reset ========
Apr 18 16:56:41 Obelix zigbee2mqtt[1460139]: ======== ASH starting ========
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: ======== ASH connected ========
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: ======== ASH started ========
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: ======== EZSP started ========
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: [EzspConfigId] Failed to SET "APS_UNICAST_MESSAGE_COUNT" TO "32" with status=ERROR_OUT_OF_MEMORY.
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: [STACK STATUS] Network up.
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: [INIT TC] NCP network matches config.
Apr 18 16:56:42 Obelix zigbee2mqtt[1460139]: [CONCENTRATOR] Started source route discovery. 1248ms until next broadcast.
Apr 18 16:56:43 Obelix zigbee2mqtt[1460139]: Zigbee2MQTT:error 2024-04-18 16:56:43: Entity 'homeassistant/binary_sensor' is unknown

The "homeassistant/binary_sensor" error also happens with the 'ezsp' driver (don't know why).

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.36.1 commit: ffc2ff1

Adapter firmware version

7.4.1.0 build 0

Adapter

Skyconnect EZSP v13

Setup

Z2M in Podman container

Debug log

No response

@Ra72xx Ra72xx added the problem Something isn't working label Apr 18, 2024
@Koenkk Koenkk added the ember Issues related to ember driver label Apr 20, 2024
@Nerivec
Copy link
Collaborator

Nerivec commented Apr 20, 2024

The config that fails to set is not a problem, it just uses the in-firmware value instead.

Did you let it run for a few hours before retrying after the switch? It could just be the network that needs to rebuild itself for some reason.
Do you have any offline devices?
You will have to provide debug logs of the actual issue otherwise.

@Ra72xx
Copy link
Author

Ra72xx commented Apr 30, 2024

I tried it several times in the last two weeks and it was always the same: The network was extremely slow after the update, switching lights took minutes. Window sensors, however, reacted at once in Z2M. When switching back to ezsp after several hours, everything was fine at once.
However, I don't have the time for logs right now, will look into it some other time.

@Ra72xx
Copy link
Author

Ra72xx commented May 2, 2024

@Nerivec
As the new 1.37 release finally offers the possibility to get herdsman logs easily, I captured a few minutes debug log during which I switched random lights on and off, both from HA and from the Z2M ui. Switching lights was once again extremely slow (minutes later). Strange enough, window and temperature sensors seem to work OK. Current firmware is 7.4.2.

log.zip

EDIT: Just for the fun of it, I checked how long it takes from switching the light on to reaction: 3min 30 secs! I did not even know that stuff can take so long without a timeout.

@Ra72xx
Copy link
Author

Ra72xx commented May 2, 2024

I noticed that when starting with the ember driver, the Z2M container often fails with the following error at the first attempt, however works after the second (automatic) restart:

Mai 02 12:14:13 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:13] error:         zh:ember:uart:ash: Received ERROR from NCP while connecting, with code=RESET_WATCHDOG.
Mai 02 12:14:13 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:13] error:         zh:ember:uart:ash: ASH disconnected | NCP status: ASH_NCP_FATAL_ERROR
Mai 02 12:14:16 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:16] warning:         zh:ember:uart:ash: Frame(s) in progress cancelled in [1ac1020b0a527e]
Mai 02 12:14:16 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:16] error:         zh:ember:uart:ash: Received unexpected reset from NCP, with reason=RESET_SOFTWARE.
Mai 02 12:14:16 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:16] error:         zh:ember:uart:ash: ASH disconnected: ASH_ERROR_NCP_RESET | NCP status: ASH_NCP_FATAL_ERROR
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=0:"VERSION" Seq=0 Len=4].
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!!
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         z2m: Error while starting zigbee-herdsman
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         z2m: Failed to start zigbee
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         z2m: Exiting...
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]: [2024-05-02 12:14:29] error:         z2m: Error: 
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at Ezsp.ezspVersion (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:1301:19)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at EmberAdapter.emberVersion (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:1642:66)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:802:9)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2660:24)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:124:29)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at Zigbee.start (/app/lib/zigbee.ts:62:27)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at Controller.start (/app/lib/controller.ts:108:27)
Mai 02 12:14:29 Obelix zigbee2mqtt[521643]:     at start (/app/index.js:107:5)

@Nerivec
Copy link
Collaborator

Nerivec commented May 2, 2024

I think something is not going right during startup sequence. The send queue is way too high (117 in your first debug line, 211 by the end). Seems something is sending far too many commands, and the driver can't catch up fast enough.


Since you mentioned using 7.4.2, did you use one from here to avoid hardware flow control issues?
https://github.com/darkxst/silabs-firmware-builder/tree/ember-nohw/firmware_builds/skyconnect
(use with rtscts: false)


Not strictly related (or possibly entirely related 😄), but looks like you have these devices in a non-zigbee type group (group that simply sends X commands for X number of devices in it). I'd advise to use a Zigbee group, that will send only 1 command, should lighten the load quite a bit. (You can do the same for any other groups you have like that one.)

0xd85def11a10049b0
0xd85def11a1006a50
0xd85def11a100466a
0xd85def11a100469e
0xd85def11a10015b8
0xd85def11a100458f
0xd85def11a1003268
0xd85def11a1002561

@Ra72xx
Copy link
Author

Ra72xx commented May 2, 2024

I think something is not going right during startup sequence. The send queue is way too high (117 in your first debug line, 211 by the end). Seems something is sending far too many commands, and the driver can't catch up fast enough.

I don't know what could go wrong here. I use Z2M in a Podman container on x86-64 hardware, no fancy stuff, system load <10% most of the time, USB2.0 port and long cable to put the stick away from interference, etc.

As soon as I replace ember with ezsp in the config and restart, everything is fine again - at once.

Since you mentioned using 7.4.2, did you use one from here to avoid hardware flow control issues? https://github.com/darkxst/silabs-firmware-builder/tree/ember-nohw/firmware_builds/skyconnect (use with rtscts: false)

Yes, I use the file ncp-uart-hw-v7.4.2.0-skyconnect-115200.gbl from this location. I assume that this is with "hw" flow control enabled, however disabling hardware rts/cts in Z2M didn't seem not to help in my case (tried both variants).

EDIT: Clicking on the link you gave does redirect to the "normal" firmware folder, the folder path seems to be different than the one given in the text?!

Not strictly related (or possibly entirely related 😄), but looks like you have these devices in a non-zigbee type group (group that simply sends X commands for X number of devices in it). I'd advise to use a Zigbee group,

How did you find out about that group? I never intentionally created a non-Zigbee-group, however it might be some vendor specific stuff. Can you pinpoint which device is the culprit or point out to me for what log entry to watch? Maybe this could explain some of the problems.

Thanks for your support and your work on a better SkyConnect driver!

@Nerivec
Copy link
Collaborator

Nerivec commented May 2, 2024

I don't have that redirect. The gbl file you linked is the right one though. Make sure to set rtscts: false or just remove the line.
(PS: 'hw' in the name is just the automatic naming from the firmware build, these don't have it to avoid issues described here.)

Search for this line in your logs:
zh:controller:endpoint: ZCL command 0xd85def11a10049b0
You will see they are often followed with the same type line with the IEEE addresses I mentioned before. This is firing 8 commands instantly every time (seems to be every few seconds too!). It could just be an automation that triggers for each of these at the same time though, not necessarily a group (it just is often the case with HA users). In that case, you will want to create a Zigbee group, and trigger that group in your automation instead; should result in the same thing, except you will trigger 1 command instead of 8 each time.

Device 61960 (0xf208) is also triggering a lot of delivery failures. You might want to check that one more closely.

@Ra72xx
Copy link
Author

Ra72xx commented May 2, 2024

I just did one more try with the new driver and rtscts disabled:

serial:
  port: /dev/ttyACM0
  adapter: ember
  rtscts: false

Doesn't seem to help, first launch of Z2M failed nevertheless, second one was OK. Slow reaction again...

Mai 02 15:03:24 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:24] error:         zh:ember:uart:ash: Received ERROR from NCP while connectin>
Mai 02 15:03:24 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:24] error:         zh:ember:uart:ash: ASH disconnected | NCP status: ASH_NCP_>
Mai 02 15:03:26 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:26] warning:         zh:ember:uart:ash: Frame(s) in progress cancelled in [1a>
Mai 02 15:03:26 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:26] error:         zh:ember:uart:ash: Received unexpected reset from NCP, wit>
Mai 02 15:03:26 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:26] error:         zh:ember:uart:ash: ASH disconnected: ASH_ERROR_NCP_RESET |>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         zh:ember:ezsp: ERROR Transaction failure; status=undefined>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         z2m: Error while starting zigbee-herdsman
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         z2m: Failed to start zigbee
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         z2m: Check https://www.zigbee2mqtt.io/guide/installation/2>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         z2m: Exiting...
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]: [2024-05-02 15:03:39] error:         z2m: Error: 
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at Ezsp.ezspVersion (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:1301:>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at EmberAdapter.emberVersion (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/e>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/ember>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAda>
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:124:29)
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at Zigbee.start (/app/lib/zigbee.ts:62:27)
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at Controller.start (/app/lib/controller.ts:108:27)
Mai 02 15:03:39 Obelix zigbee2mqtt[591802]:     at start (/app/index.js:107:5)

@nicandris
Copy link

having the same issue. Most of my devices are working as expected (battery or not) but one remote (icasa) only was sending linkquality and lastseen.

I manually bound the endpoints, now i get a lot more (including action) BUT i cant get action_group, so basically i have the action but i dont know which button i pressed 😓

image
image

@MKlitgaard2
Copy link

Same here. As @Ra72xx writes "As soon as I replace ember with ezsp in the config and restart, everything is fine again" When I do that everything works.

@Ra72xx
Copy link
Author

Ra72xx commented Jun 14, 2024

I routinely try to switch to the new driver when new Z2M versions appear, and it is always the same. It also seems not to get better when giving the network a few hours to "calm down" and/or unplugging the USB dongle after the change for some time.
I understand that the old driver is deprecated, because it has lots of problems of its own, but I'm a bit concerned as I'm nevertheless still forced to use it as it at least works most of the time.

@Nerivec
Copy link
Collaborator

Nerivec commented Jun 14, 2024

Seems like other comments may not be related to this issue at all. @Ra72xx I'm not sure what's going on in your scenario, I don't have anyone else reporting these delays. Did you figure out what triggers those massive amounts of messages sent on start? I'm pretty sure fixing that, will fix the rest.

@Ra72xx
Copy link
Author

Ra72xx commented Jun 14, 2024

Seems like other comments may not be related to this issue at all. @Ra72xx I'm not sure what's going on in your scenario, I don't have anyone else reporting these delays. Did you figure out what triggers those massive amounts of messages sent on start? I'm pretty sure fixing that, will fix the rest.

As you guessed, probably an Homeassistant automation which pushed - for every room and every thermostat on every sensor change - temperature values to the respective thermostats. I've modified the automation sincde (and it's no longer active currently).
Moreover, I have nine of those devices: https://www.zigbee2mqtt.io/devices/6735_6736_6737.html . These seem to do some kind of polling in order to get the state of the relais (this directly operates the device and seemingly has to be polled to get the state if the switch has been manually operated). So I can't get rid of this Zigbee traffic.

@Nerivec
Copy link
Collaborator

Nerivec commented Jun 14, 2024

To get to over 100 in-flight messages right from the start, something is definitely wrong. Although, on its own, ember should be able to deal with that traffic (it did when I stress-tested it during dev, of course, delays become visible with so many), the problem is, more traffic of equivalent proportion seems to continue happening, which means ember can never catch up and get to an "idle" state where things can go back to "normal".
I know you had some troubles with ezsp in the past, but this behavior, I don't think it could handle without regular crashes (lack of order coupled with way too much traffic). It raises the question of, is that behavior happening in ezsp at all (and if it is, how come it does not crash). Any chance you can provide a debug log of ezsp from the start of Z2M to at least 5-10min after that, so I can see how it behaves? (also keep one next time you try ember, I'll check it again for your changes)

@lawrencedudley
Copy link

Ember at home, skyconnect with raspberry pi works beautifully. Ember at work, skyconnect with an Intel NUC, completely unusable and constant NCP restarts and timeouts.

Not sure if the different platforms are significant here but it’s weird that they behave so differently.

Both networks with over 100 nodes.

@Nerivec
Copy link
Collaborator

Nerivec commented Jun 22, 2024

@lawrencedudley Interesting. One of my setups is on Intel NUC (Dongle-E though), no issue at all. NUCs usually are far more stable than PIs too.
Always the same error that triggers the restarts? Maybe an interference issue from the different location? Did you check the environment?
I integrated scanner functions (channels/existing networks) in https://github.com/Nerivec/ember-zli (I haven't had reports of usage with SkyConnect yet, you can create an issue there if you encounter any).

@lawrencedudley
Copy link

I’ll give it another go on the next release, I don’t think external factors are significant here as reverting to the EZSP driver on the same Skyconnect firmware makes everything happy again.

On Ember, doing an OTA update was a way of repeatedly causing restarts on the driver.

Happy to jump on a quick screen share if it helps you with tracking down what’s causing it or I can share logs.

@Ra72xx
Copy link
Author

Ra72xx commented Jun 22, 2024

Looking into the logs, when using the old ezsp driver, I get the message "warning: zh:ezsp: Deprecated driver 'ezsp' currently in use...." at least every hour or so. Is this sign of a restart, too?
I'll look into the new ember driver once again soon, however I'm a bit reluctant to break the at least somehow running setup as long as I don't have enough spare time to setup everything again from scratch if everything goes wrong ;-)

@Nerivec
Copy link
Collaborator

Nerivec commented Jun 22, 2024

@lawrencedudley Weirder still, OTA should be far more stable on ember. I actually had some recent feedback on a network of over 300 devices, where every device was updated over the course of a couple of weeks (half of it after the converters OTA refactoring in dev); that person had no issue.
I'd have to say something is interfering in your case (could be device-specific). ezsp may bypass them due to the lack of implementation (which can cause other issues down the road). It would be great if you could share a debug log that shows what error specifically is thrown.

@Ra72xx No, Koenkk added a warning with last release that logs every hour.
You can make an NVM backup with ember-zli or bellows before trying anything. That should allow you to restore the adapter if anything goes "too wrong".

@lawrencedudley
Copy link

I’ll grab a debug log when I get home and stick it in here!

Like I said, at home, works great. OTA seems much happier running multiple OTA updates at the same time (3/4) where EZSP would have a bad time of it with more than one running concurrently.

@lawrencedudley
Copy link

@Nerivec what's the easiest way for me to retain logs from Z2M in Home Assistant? It doesn't seem to respect the max_entries setting from system log so I'm only getting 50 entries and that's like 3 seconds worth.

@Nerivec
Copy link
Collaborator

Nerivec commented Jun 22, 2024

You can grab the file directly, it's inside the z2m config folder (use the file explorer add-on, or studio code server).

@lawrencedudley
Copy link

Cool, got it. Not sure how I missed that!

I've attached a few logs.

log1:

There's a lot of `Delivery of MULTICAST failed for "65533" which feels like it might have some kind of retry logic on it in the ember driver, which may then be causing the network to be so busy it falls over?

log2:

zh:ember:uart:ash: NOT READY - Signaling NCP feels like a bad thing as well.

log3:

This also a bit worrying:

[2024-06-20 23:54:39] error: 	zh:ember:uart:ash: Received ERROR from NCP, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-06-20 23:54:39] error: 	zh:ember:uart:ash: ASH disconnected | NCP status: ASH_NCP_FATAL_ERROR
[2024-06-20 23:54:39] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-20 23:54:39] error: 	zh:ember: !!! NCP FATAL ERROR reason=HOST_FATAL_ERROR. ATTEMPTING RESET... !!!

log.log
log1.log
log2.log

Switching it back to EZSP works like a dream though.

@lawrencedudley
Copy link

Try the new release of Z2M - it's been way more stable for me and this issue has pretty much disappeared unless you OTA more than one device at once.

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 2, 2024

You were testing with the latest dev (not release) I believe @lawrencedudley That has a lot more refactoring that came along with the v8 support. It will go under test in dev for a month before it's in release though, as it does change a lot of stuff 😉

@Ra72xx
Copy link
Author

Ra72xx commented Jul 2, 2024

Yes, the latest stable release does not help me with my problem, unfortunately.

@Ra72xx
Copy link
Author

Ra72xx commented Jul 2, 2024

Here my logfile with the currenct stable version and the "ember" driver.
ember.log
As a sidenote, I was no longer able to access the web ui when the ember driver and debug log were enabled.

@lawrencedudley
Copy link

It seems to sometimes take a surprisingly long time for the web UI to come back alive.

Have you tried edge @Ra72xx?

@Ra72xx
Copy link
Author

Ra72xx commented Jul 2, 2024

I'm just experimenting with the latest-dev docker image and there is no real difference, everything is delayed.
Web ui seems not to be accessible. And the first restart of Zigbee2MQTT always ends up with errors like "zh:ember: !!! ADAPTER FATAL ERROR reason=HOST_FATAL_ERROR.", only the second restart works.
Back to ezsp and at once it works again.

@lawrencedudley
Copy link

Hate to say it but I'm now seeing similar!

@lawrencedudley
Copy link

@Ra72xx are you using a Skyconnect USB stick by any chance? Just trying to correlate these issues.

@lawrencedudley
Copy link

@Ra72xx this might be a dumb one but I've just disabled availability and that's made it boot straight up. I think ember may be trying to ping every device on the network before it declares the UI ready?

@lawrencedudley
Copy link

Sadly I'm going to have to switch back to EZSP as well. Lots of angry "the lights aren't working" going on in my house.

@Ra72xx
Copy link
Author

Ra72xx commented Jul 3, 2024

@Ra72xx this might be a dumb one but I've just disabled availability and that's made it boot straight up. I think ember may be trying to ping every device on the network before it declares the UI ready?

I'll have to look into it. Disabling "availability" definitely changes something with "ezsp" driver, as this enabled me displaying the networt map for the first time since months, so it might help with "ember", too.
However, while I understand that depracating the "ezsp" driver makes sense from a support point of view, I'd also suggest not to push this to much. It seems that lots of people have still problems like this with "ember" and are irritated by the hourly deprecation note for the at least somehow working "ezsp" driver. It looks as if "ember" still has to get some more stability with "problematic" network situations (which probably are not so uncommon in the heterogenuous Zigbee world).

@lawrencedudley
Copy link

I would agree regarding being very careful regarding the EZSP driver as it's likely to leave some users high and dry otherwise. @Nerivec has been doing a great job on the ember driver, it's just painful work as everyone's setup is obviously slightly different and remote debugging is tricky.

With the latest update I've gone from ember working great at home and not working in the office to it working fine in the office but not at home.

The real risk with ember is that for stuff like lighting, stability is everything. As soon as a light switch or a motion sensor doesn't work or is delayed etc. it causes real pain for users.

Tl;dr just because the bugs in ember are really hard to replicate, I wouldn't presume they don't need fixing!

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 3, 2024

@Ra72xx You still have that issue with spamming in the request queue. Seems to be caused by these that don't support reporting, so Z2M reads the state every 5 seconds instead.

[2024-07-02 16:52:02] debug: 	zh:ember:queue: Status queue=114 priorityQueue=0.

Do you have a log from ezsp?

@lawrencedudley Latest release contains no change in ember that would explain your light problem. Koenkk/zigbee-herdsman#1085 Koenkk/zigbee-herdsman#1089 Koenkk/zigbee-herdsman#1080 these are the only changes. I'd say it's likely more about the updating/reboot itself. A partial network rebuild could have been triggered for some reason and linked some stuff to "not as good" routers.

@lawrencedudley
Copy link

It seems to be really unstable on release and on edge

@Ra72xx
Copy link
Author

Ra72xx commented Jul 3, 2024

@Ra72xx You still have that issue with spamming in the request queue. Seems to be caused by these that don't support reporting, so Z2M reads the state every 5 seconds instead.

[2024-07-02 16:52:02] debug: 	zh:ember:queue: Status queue=114 priorityQueue=0.

Do you have a log from ezsp?

Thanks for your work on the ember driver.. Log from a restart with ezsp: ezsp.zip

Yes, I also think that those Busch Jäger devices https://www.zigbee2mqtt.io/devices/6735_6736_6737.html put some stress on the network. However, both Z2M/ezsp and before that Deconz/Phoscon were able to handle that somehow.
A parameter to fine tune this polling intervall would be great, I could live with longer intervalls (and for some of this devices polling is not even necessary as in some combinations there is no relais for direct switching connected).

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 3, 2024

ember has a known issue with failed pings (it doesn't like them). Since those Busch Jager devices are doing the equivalent of a ping when reading the on/off state, and one always seems to fail (0xd85def11a10049b0, both in ezsp and ember), I think it's the one causing the biggest troubles.
I'm looking for a solution for the overall issue, but one that won't result in what I'm seeing in your ezsp logs (everything out of sequence, all the time). It's next on my TODO list 😉

PS: Might be worth asking for a revamp for that Busch Jager converter in here as a feature request. I'm sure someone can make a few changes, spruce it up a little. Make sure to describe behaviors and what parameters would be helpful, as I doubt a lot of people have encountered those devices.

Off-topic:
You do have one interesting device in your setup: 0x90ab96fffe6a2b7f. This guy appears to be a concentrator (device that aggregates routes for a whole portion of the network or the whole network). First time I'm seeing one (ezsp doesn't implement the handler for that, so, never could "see" one in the past). I've been unable to test this multi-concentrator behavior yet (the adapter also is a concentrator). Very few commercially available devices appear to implement this feature.
I don't expect this would cause trouble (EmberZNet handles that internally in the network, it never needs Z2M for that part), but definitely something to remember, and mention if you ever encounter an issue.

@Ra72xx
Copy link
Author

Ra72xx commented Jul 3, 2024

Thanks for looking into the issue.

You do have one interesting device in your setup: 0x90ab96fffe6a2b7f
image

That is a standard Ikea device like this, https://www.zigbee2mqtt.io/devices/ICPSHC24-30EU-IL-1_ICPSHC24-10EU-IL-2.html#ikea-icpshc24-30eu-il-1%252Ficpshc24-10eu-il-2. I doubt that I'm the only one running this kind of stuff ;-).

@Ra72xx
Copy link
Author

Ra72xx commented Jul 4, 2024

ember has a known issue with failed pings (it doesn't like them). Since those Busch Jager devices are doing the equivalent of a ping when reading the on/off state, and one always seems to fail (0xd85def11a10049b0, both in ezsp and ember), I think it's the one causing the biggest troubles.

When writing my bug report concerning the Busch Jaeger converter, I also noticed why this special ping is failing: There is no state to be reported! This panel is not directly connected to a relay/dimmer, but only to a power source (which is one of the three possible setups provided by the manufacturer - mount the panel on relay, dimmer, or only mains adaptor). So this is more a bug in the respective converter.

@jvuotila
Copy link

jvuotila commented Jul 4, 2024

Unfortunately I also have to switch back from ember to ezsp. Zigbee2mqtt was crashing twice on hour with ember. Totally useless.

@lawrencedudley
Copy link

I'd be happy providing some hardware for @Nerivec to test with - I do think the work that's being done is the right work but I think given how wild the ecosystem is in general in terms of devices this isn't going to go away without a varied selection of devices on the same network.

Anyone else up for buying some hardware to spruce up @Nerivec's testing lab? I think it's mostly relays at the moment which are one of the simpler devices out there in reality.

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 9, 2024

@Ra72xx Can you give the latest dev/edge another try? I implemented concurrency in zigbee-herdsman 0.52.0, it should hopefully lessen the blow from the spam you are experiencing.
PS: You can use the regular setting adapter_concurrent to configure it (currently, default is 16 if not provided, max is 32, min 1).

@Ra72xx
Copy link
Author

Ra72xx commented Jul 10, 2024

My network completely completely broke down with the "ezsp" driver over night for no obvious reason, so I had nothing to lose and switched to latest-dev and tried "ember" once again.
And yes, indeed the network seems to work now! The devices were picked up at once and are controllable, in most cases without delay.
However, the problematic Busch-Jaeger devices have a strange noticeable delay when operated via Zigbee (about 2-3 secondes delay). Maybe a side effect of the configurable polling?
I'll further look into it. Thanks for your work on this driver!

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 10, 2024

Great news! Thanks for the feedback.
For those Busch-Jaeger, do you mean a delay in reporting the state in Z2M, or a delay in the device actually turning on/off?

@Ra72xx
Copy link
Author

Ra72xx commented Jul 10, 2024

Delay when turning on/off with Zigbee. I understand that a delay of state feedback when operating manually is to be epected after the patch to the converter, but not this way round.
However, it might be imagination, but it seems to get better with time.

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 10, 2024

When you have a moment, can you grab a debug log of "normal operation", trigger that device, let some time pass, then trigger it again (just so I have 2 samples)?

@Ra72xx
Copy link
Author

Ra72xx commented Jul 11, 2024

It is just that the lights (relay or dimmer) controlled directly by Busch-Jaeger switches sometimes (not always, but very often) now have a slight delay when operated via Zigbee (Home-Assistant or Z2M web ui). All other devices in the network seem to react almost instantly to Zigbee commands. Operating the Busch Jaeger switch manually shows no delay (both the uppermost row, naturally, as this row is wired directly to the relay/dimmer, and the other, Zigbee-only rows).

We're talking about a fews seconds here, and only about a few devices, so unlike the general minute-long lag this bug report was about.

Maybe it is rather a side effect of patch https://github.com/Koenkk/zigbee-herdsman-converters/commit/6f5707bff5f79bc8c6c7dc1b78c9b4d8a4d0f607 and not ember-specific.

The log below shows a single keypress which took about 3 seconds. It does not really show anything special IMHO.

Info 2024-07-11 04:55:09z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:09.447Z","linkquality":212,"state":"ON","state_relay":"OFF"}'
Info 2024-07-11 04:55:09z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"OFF"}'
Info 2024-07-11 04:55:09z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:09.447Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:09z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'
Info 2024-07-11 04:55:10z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:09.988Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:10z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'
Info 2024-07-11 04:55:14z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:14.730Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:14z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'
Info 2024-07-11 04:55:15z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:15.163Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:15z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'
Info 2024-07-11 04:55:15z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:15.480Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:15z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'
Info 2024-07-11 04:55:16z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts', payload '{"brightness":254,"brightness_relay":254,"device":{"applicationVersion":1,"dateCode":"20161209","friendlyName":"WZ: Dimmer-Bedienelement-Kombi Tür rechts","hardwareVersion":0,"ieeeAddr":"0xd85def11a10015b8","manufacturerID":4398,"manufacturerName":"Busch-Jaeger","model":"6735/6736/6737","networkAddress":140,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.0","stackVersion":1,"type":"Router","zclVersion":1},"last_seen":"2024-07-11T02:55:16.104Z","linkquality":212,"state":"ON","state_relay":"ON"}'
Info 2024-07-11 04:55:16z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/WZ: Dimmer-Bedienelement-Kombi Tür rechts/relay', payload '{"brightness":254,"state":"ON"}'

@Nerivec
Copy link
Collaborator

Nerivec commented Jul 11, 2024

Moving the conversation about that Busch-Jaeger in the previous PR.
I'll let you close this one when you can confirm it's been definitely solved. 😉

@Ra72xx
Copy link
Author

Ra72xx commented Jul 12, 2024

I'll close this bug report, as the problem seems to be fixed. I'm still victim of random Z2M crashes when restarting, but there is another bug report for that one.

@Ra72xx Ra72xx closed this as completed Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ember Issues related to ember driver problem Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants