This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

monitoring: add support for builtin netdata #51

Closed
wants to merge 1 commit into from

Conversation

xginn8
Contributor

@xginn8 xginn8 commented May 20, 2020

Change-type: patch
Signed-off-by: Matthew McGinn [email protected]

@xginn8 xginn8 requested a review from chrisys May 20, 2020 18:01
upstream:
  - repo: 'balena-netdata'
    url: 'https://github.com/balena-io-playground/balena-netdata'
Contributor

@ptrm ptrm May 21, 2020

I take it the repo is private for now? I tried looking it up and got a not-found error 😮

Contributor Author

Yep, it's a work-in-progress that we're hoping to get ready shortly!

Contributor Author

@ptrm I've open-sourced that repo, if you'd like to give this netdata approach a spin I'd love to hear what you think!

Contributor Author

@ptrm I wanted to follow up here and see if you had a chance to test this PR locally?

Contributor

Apologies, busy time. I'm deploying it now on my 4gb rpi4. I will push it to the others if it succeeds, though I assume a longer-term run would be needed to evaluate it better, right?

Contributor

@ptrm ptrm Jun 4, 2020

It looks good in itself and fast enough on my 4gb rpi4 :)

When it comes to other devices, the 1gb rpi3 runs fine (edit: and smoothly, too :) ). I get reboot loops on my two 2gb rpi4s. I have to investigate, though, because they were overclocked to 2.1 GHz to speed up the single task they are limited to, so it's not a clean environment ;) When booted with a monitor plugged in, no warning overlays showed up, though, so it might be something else.
EDIT: the 2gb rpi4s somehow had a 4-core limit set; after limiting them back to 1 they seem to be running fine.

image
The memory and CPU footprint worries me a bit when it comes to 1GB devices, though. I think it might be good to disable netdata there, and it might be worth considering on 2GB devices too, if a way to run 2+ tasks ever appears.

But having said that, I don't have much idea whether it is possible to e.g. enable headless mode (so a lan peer could gather data), or change the kind and amount of metrics gathered to lower netdata's footprint.
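
For illustration, the idea of skipping netdata on low-memory devices could be sketched roughly as below. This is only a sketch in Python, not anything taken from the PR or from netdata itself: the function names and the 1.5 GiB threshold are made up for the example, and the only thing it relies on is the standard `MemTotal:` line in `/proc/meminfo`.

```python
from typing import Optional

# Hypothetical cut-off: devices below ~1.5 GiB of RAM skip monitoring.
# The value is an assumption chosen for this example only.
MIN_MEM_KIB = 1536 * 1024


def mem_total_kib(meminfo_text: str) -> Optional[int]:
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            parts = line.split()
            # Expected shape: "MemTotal:  948304 kB"
            if len(parts) >= 2 and parts[1].isdigit():
                return int(parts[1])
    return None


def should_enable_monitoring(meminfo_text: str, min_kib: int = MIN_MEM_KIB) -> bool:
    """Enable monitoring only when total memory is known and large enough."""
    total = mem_total_kib(meminfo_text)
    return total is not None and total >= min_kib


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print("enable monitoring:", should_enable_monitoring(f.read()))
    except OSError:
        print("/proc/meminfo not available on this system")
```

Keeping the parsing separate from the file read makes the decision easy to check against sample text from different boards before wiring it into a start script.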

Contributor

@ptrm ptrm Jun 4, 2020

Also, I don't know how much this PR was meant as an aid in investigating #47, but does netdata support capturing oom_kill events? One of my suspicions is that the kernel might target the wrong process (e.g. some vital balenaOS one) when out of memory, and as a consequence we get a reboot. I will try to fiddle with it in other ways, too, this month.

Contributor Author

@ptrm these are great notes, thank you very much for taking the time to test so thoroughly! I have opened a ticket to disable some features if we're in a low-mem/CPU situation: balena-io-examples/balena-netdata#10.

With respect to OOM events, we should collect all data up until the kernel pauses netdata as part of the OOM traversal. Unfortunately we'll lose any data for the time that it takes the kernel to traverse the page table. It would be worth reproducing #47 with netdata enabled to get a better idea of what's going on!
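
As a rough way to check for OOM kills independently of netdata while reproducing #47, something like the sketch below could be run on the device. It assumes a reasonably recent Linux kernel that exposes an `oom_kill` counter in `/proc/vmstat`; the helper names are made up for this example, and this is not a description of how netdata collects the metric.

```python
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict of counters."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kills_between(previous: Dict[str, int], current: Dict[str, int]) -> int:
    """Number of OOM kills between two snapshots (0 if the counter is absent)."""
    return current.get("oom_kill", 0) - previous.get("oom_kill", 0)


if __name__ == "__main__":
    import time

    def snapshot() -> Dict[str, int]:
        with open("/proc/vmstat") as f:
            return parse_vmstat(f.read())

    try:
        before = snapshot()
        time.sleep(10)  # sampling interval, arbitrary for this example
        after = snapshot()
        print("OOM kills in the last 10s:", oom_kills_between(before, after))
    except OSError:
        print("/proc/vmstat not available on this system")
```

The counter only says that a kill happened, not which process was chosen; the kernel log would still be needed to see the victim.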

Change-type: patch
Signed-off-by: Matthew McGinn <[email protected]>
@xginn8 xginn8 closed this Jul 23, 2020
@xginn8 xginn8 deleted the netdata branch July 23, 2020 14:28
2 participants