Detect CPU throttling events #221

Merged
1 commit merged into master from feature/detect-temp-throttling
Jun 2, 2020

Conversation

saintaardvark
Contributor

This adds two checks that take different approaches to detecting CPU throttling:

  • check_throttling_dmesg examines dmesg output for throttling events. The original request (#183, "Add detection of over-temperature events.") was compared with this page;

  • check_throttling_vcgencmd examines the output of the Raspberry Pi utility vcgencmd to determine if throttling is currently happening, or has occurred in the past.

Testing, happy case:

# ./checks.sh | jq .
[snip]
    {
      "name": "check_throttling_dmesg",
      "success": true,
      "status": "No cpu throttling events detected"
    },
    {
      "name": "check_throttling_vcgencmd",
      "success": true,
      "status": "No Raspberry Pi throttling events detected"
    },

Sad case, dmesg:

# echo '<4>Temperature above threshold, cpu clock throttled' > /dev/kmsg 
# echo '<4>Temperature above threshold, cpu clock throttled' > /dev/kmsg 
# ./checks.sh | jq 
[snip]
    {
      "name": "check_throttling_dmesg",
      "success": false,
      "status": "2 cpu throttling events detected, check CPU temperature"
    },
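
For readers who want to reproduce the dmesg case, here is a minimal sketch of how such a count could be produced. It is not the code from this PR: the function names `count_throttling_events` and `report_throttling_dmesg` are hypothetical, and the match pattern is simply the test message written to /dev/kmsg above. The status strings are copied from the output shown in this description.

```shell
#!/bin/bash
# Hypothetical helper: count throttling messages in kernel log text on stdin.
# The pattern is the test message used in this PR; the real check may match differently.
count_throttling_events() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the exit code.
    grep -c 'Temperature above threshold, cpu clock throttled' || true
}

# Hypothetical wrapper: turn the count into a status line and an exit code.
report_throttling_dmesg() {
    local count
    count=$(count_throttling_events)
    if [ "$count" -gt 0 ]; then
        echo "$count cpu throttling events detected, check CPU temperature"
        return 1
    fi
    echo "No cpu throttling events detected"
}

# Usage against the live kernel log (needs permission to read it):
#   dmesg | report_throttling_dmesg
```

Reading from stdin keeps the sketch testable with canned log text instead of a real kernel log.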

Sad case, throttling:

# yes > /dev/null &
# ./checks.sh | jq 
[snip]
    {
      "name": "check_throttling_vcgencmd",
      "success": false,
      "status": "Raspberry Pi throttling events detected:  ARM freq capping has occurred"
    },

(Fun thread on how to create a CPU spike with shell commands alone.)
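
For context on the vcgencmd status above: `vcgencmd get_throttled` prints a value such as `throttled=0x20000`, where the low bits describe what is happening now and bits 16 to 19 describe what has happened since boot. The sketch below decodes that value. The function name `decode_throttled` is hypothetical and the message wording is illustrative (only "ARM freq capping has occurred" is taken from the output above); it is not the implementation in this PR.

```shell
#!/bin/bash
# Hypothetical helper: decode a "throttled=0x..." value as printed by
# `vcgencmd get_throttled`. Message wording is an assumption for illustration.
decode_throttled() {
    # Strip the "throttled=" prefix; bash arithmetic understands the 0x form.
    local value=$(( ${1#throttled=} ))
    local -a messages=()
    (( value & 0x1 ))     && messages+=("Under-voltage detected")
    (( value & 0x2 ))     && messages+=("ARM freq capped")
    (( value & 0x4 ))     && messages+=("Currently throttled")
    (( value & 0x8 ))     && messages+=("Soft temperature limit active")
    (( value & 0x10000 )) && messages+=("Under-voltage has occurred")
    (( value & 0x20000 )) && messages+=("ARM freq capping has occurred")
    (( value & 0x40000 )) && messages+=("Throttling has occurred")
    (( value & 0x80000 )) && messages+=("Soft temperature limit has occurred")

    # Join the collected messages with ", "; print an empty line if none are set.
    local out="" m
    for m in "${messages[@]}"; do
        out="${out:+$out, }$m"
    done
    echo "$out"
}

# Usage on a Raspberry Pi:
#   decode_throttled "$(vcgencmd get_throttled)"
```

Splitting the bits into "now" (0 to 3) and "has occurred" (16 to 19) is also what a simplified two-state report would be built on.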

Connects-to: #183, #209
Change-type: minor
Signed-off-by: Hugh Brown [email protected]

@saintaardvark
Contributor Author

I'm not certain that the output of the Raspberry Pi throttling check is ideal; it might be better to simplify it to "Throttling is happening now / Throttling has happened in the past". @xginn8, what do you think?

@xginn8
Contributor

xginn8 commented May 29, 2020

A general comment: I've recently moved away from lots of checks doing specific things in favor of these "aggregated checks" like check_networking and check_localdisk that perform many tests and report a "federated" status. I think these make sense as tests integrated into the temperature check that already exists.

@saintaardvark saintaardvark force-pushed the feature/detect-temp-throttling branch 3 times, most recently from bb505b5 to 3395f65 Compare June 1, 2020 21:14
@saintaardvark
Contributor Author

saintaardvark commented Jun 1, 2020

Updated overview

This updates the temperature check with two new tests that take different approaches to detecting CPU throttling:

  • test_throttling_dmesg examines dmesg output for throttling events. The original request (#183, "Add detection of over-temperature events.") was compared with this page;

  • test_throttling_vcgencmd examines the output of the Raspberry Pi utility vcgencmd to determine if throttling is currently happening, or has occurred in the past.

Happy case:

{
  "diagnose_version": "4.17.15",
  "checks": [
    [snip]
    {
      "name": "check_temperature",
      "success": true,
      "status": "No temperature issues detected"
    },

Sad case:

{
  "diagnose_version": "4.17.15",
  "checks": [
    [snip]
     {
      "name": "check_temperature",
      "success": false,
      "status": "Some temperature issues detected: \ntest_temperature_now Temperature above 80C detected (/sys/class/thermal/thermal_zone0)\ntest_throttling_vcgencmd Raspberry Pi throttling events detected:  ARM freq capping has occurred"
    },
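
The aggregated status above (one check reporting the output of every failing test) can be sketched roughly as follows. This is an illustration under assumptions, not the code in checks.sh: `run_tests` and the two stand-in tests are hypothetical names, and each test is assumed to print a message and return non-zero on failure.

```shell
#!/bin/bash
# Hypothetical aggregator: run each named test function, collect the output of
# the ones that fail, and print a single combined status.
run_tests() {
    local good_msg=$1 bad_msg=$2
    shift 2
    local failures="" t output
    for t in "$@"; do
        # A test signals failure with a non-zero exit code and explains it on stdout.
        if ! output=$("$t"); then
            failures+=$'\n'"$t $output"
        fi
    done
    if [ -z "$failures" ]; then
        echo "$good_msg"
        return 0
    fi
    echo "$bad_msg: $failures"
    return 1
}

# Stand-in tests for illustration only.
test_always_ok()  { return 0; }
test_always_hot() { echo "Temperature above 80C detected"; return 1; }

# Usage:
#   run_tests "No temperature issues detected" "Some temperature issues detected" \
#       test_always_ok test_always_hot
```

Prefixing each failure with the test name keeps the combined status traceable back to the individual test, as in the sad-case output above.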

@saintaardvark saintaardvark force-pushed the feature/detect-temp-throttling branch from 3395f65 to b28764a Compare June 1, 2020 23:26
diagnostics.md Outdated
the CPU directly. Additionally, adding other cooling mechanisms like fans or improving the location of the device can
help address heat issues.

### check_throttling_vcgencmd
Contributor

Suggested change
- ### check_throttling_vcgencmd
+ ### test_throttling_vcgencmd

diagnostics.md Outdated
the CPU directly. Additionally, adding other cooling mechanisms like fans or improving the location of the device can
help address heat issues.

### check_throttling
Contributor

Suggested change
- ### check_throttling

this section can be deleted.

Contributor Author

Dang, thank you, good catch.

This adds two checks that take different approaches to detecting CPU
throttling:

- test_throttling_dmesg examines dmesg output for throttling events

- test_throttling_vcgencmd examines the output of the Raspberry Pi utility `vcgencmd` to determine if throttling is currently happening, or has occurred in the past.

These are both called by check_throttling, which glues the output of
both together.

Connects-to: #183, #209
Change-type: minor
Signed-off-by: Hugh Brown <[email protected]>
@saintaardvark saintaardvark force-pushed the feature/detect-temp-throttling branch from b28764a to 72d23f3 Compare June 2, 2020 15:33
@ghost ghost merged commit 66a9381 into master Jun 2, 2020
@ghost ghost deleted the feature/detect-temp-throttling branch June 2, 2020 15:54
This pull request was closed.