Detect CPU throttling events
This adds two checks that take different approaches to detecting CPU
throttling:

- check_throttling_dmesg examines dmesg output for throttling events

- check_throttling_vcgencmd examines the output of the Raspberry Pi utility `vcgencmd` to determine if throttling is currently happening, or has occurred in the past.

Connects-to: #183, #209
Change-type: minor
Signed-off-by: Hugh Brown <[email protected]>
Hugh Brown committed May 28, 2020
1 parent 7760e57 commit d4bc573
Showing 2 changed files with 65 additions and 0 deletions.
18 changes: 18 additions & 0 deletions diagnostics.md
@@ -129,6 +129,24 @@ In order to triage, either reduce the load on the device or replace/reseat/upgra
the CPU directly. Additionally, adding other cooling mechanisms like fans or improving the location of the device can
help address heat issues.

### check_throttling_dmesg
#### Summary
This check examines `dmesg` output to determine whether CPU throttling events have been logged.
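
The count this check reports can be reproduced by hand. The following is a minimal sketch, not the check itself: `count_throttling_events` is a hypothetical helper name, and the only thing taken from the check is the message text it searches for. It reads kernel log text on stdin so it can be tried against saved logs as well as live `dmesg` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/checks.sh): count throttling
# messages in kernel log text supplied on stdin.
count_throttling_events() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so "|| true" keeps the helper usable in
    # scripts that run under `set -e`.
    grep -c 'Temperature above threshold, cpu clock throttled' || true
}

# Usage on a device (reading the kernel ring buffer may require root):
#   dmesg | count_throttling_events
```

A result greater than zero corresponds to the case the check reports as bad. Note that the match is case-sensitive, so kernels that word the message differently will not be counted.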

#### Triage
In order to triage, either reduce the load on the device or replace/reseat/upgrade any heatsinks that may be attached to
the CPU directly. Additionally, adding other cooling mechanisms like fans or improving the location of the device can
help address heat issues.

### check_throttling_vcgencmd
#### Summary
This check uses `vcgencmd` to determine whether throttling is occurring now or has already occurred. It is currently limited to Raspberry Pi 4 (`raspberrypi4-64`) devices.
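
To interpret a `vcgencmd get_throttled` value by hand, each bit can be tested against the table in the `vcgencmd` documentation. The following is a rough sketch under that assumption: `decode_throttled` is a hypothetical helper name, and it decodes all eight documented bits, whereas the check added in this commit reports only bits 1-3 and 17-19.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/checks.sh): print one line per
# flag that is set in a get_throttled value such as 0x50005.
decode_throttled() {
    # With the -i attribute the assigned string is evaluated arithmetically,
    # so hex input like 0x50005 is accepted as-is.
    local -i value=$1
    # Sparse indexed array: index = bit position, value = meaning.
    local -a names=(
        [0]='Under-voltage detected'
        [1]='Arm frequency capped'
        [2]='Currently throttled'
        [3]='Soft temperature limit active'
        [16]='Under-voltage has occurred'
        [17]='Arm frequency capping has occurred'
        [18]='Throttling has occurred'
        [19]='Soft temperature limit has occurred'
    )
    local bit
    # "${!names[@]}" expands to the defined indices in ascending order.
    for bit in "${!names[@]}"; do
        if (( value & (1 << bit) )); then
            printf 'bit %d: %s\n' "$bit" "${names[bit]}"
        fi
    done
}

# Usage on a Raspberry Pi, with the same parsing the check uses:
#   decode_throttled "$(vcgencmd get_throttled | awk -F"=" '{print $2}')"
```

For example, `decode_throttled 0x50005` lists bits 0, 2, 16 and 18, and a value of `0x0` prints nothing.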

#### Triage
In order to triage, either reduce the load on the device or replace/reseat/upgrade any heatsinks that may be attached to
the CPU directly. Additionally, adding other cooling mechanisms like fans or improving the location of the device can
help address heat issues.

### check_os_rollback
#### Summary
This check confirms that the host OS has not noted any failed boots & rollbacks.
47 changes: 47 additions & 0 deletions scripts/checks.sh
@@ -281,6 +281,51 @@ function check_temperature(){
    fi
}

function check_throttling_dmesg(){
    # see https://github.com/balena-io/device-diagnostics/issues/183
    local -i TEMP_THROTTLING_COUNT
    TEMP_THROTTLING_COUNT=$(dmesg | grep -cE 'Temperature above threshold, cpu clock throttled')
    if (( TEMP_THROTTLING_COUNT > 0 )); then
        log_status "${BAD}" "${FUNCNAME[0]}" "${TEMP_THROTTLING_COUNT} cpu throttling events detected, check CPU temperature"
    else
        log_status "${GOOD}" "${FUNCNAME[0]}" "No cpu throttling events detected"
    fi

}

function check_throttling_vcgencmd(){
    # Limited to Raspberry Pi 4 until https://github.com/balena-os/balena-raspberrypi/issues/485 is resolved
    local SLUG_WHITELIST=('raspberrypi4-64')
    if is_valid_check WHITELIST "${SLUG_WHITELIST[*]}"; then
        local THROTTLE_MSG
        local -i RAW_THROTTLE_OUTPUT
        RAW_THROTTLE_OUTPUT=$(vcgencmd get_throttled | awk -F"=" '{print $2}')
        # Reference: https://www.raspberrypi.org/documentation/raspbian/applications/vcgencmd.md
        # Bit   Meaning
        # 0     Under-voltage detected
        # 1     Arm frequency capped
        # 2     Currently throttled
        # 3     Soft temperature limit active
        # 16    Under-voltage has occurred
        # 17    Arm frequency capping has occurred
        # 18    Throttling has occurred
        # 19    Soft temperature limit has occurred
        if (( RAW_THROTTLE_OUTPUT > 0 )); then
            (( RAW_THROTTLE_OUTPUT & 0x2 )) && THROTTLE_MSG="${THROTTLE_MSG} ARM freq capped"
            (( RAW_THROTTLE_OUTPUT & 0x4 )) && THROTTLE_MSG="${THROTTLE_MSG} Currently throttled"
            (( RAW_THROTTLE_OUTPUT & 0x8 )) && THROTTLE_MSG="${THROTTLE_MSG} Soft temp limit active"
            (( RAW_THROTTLE_OUTPUT & 0x20000 )) && THROTTLE_MSG="${THROTTLE_MSG} ARM freq capping has occurred"
            (( RAW_THROTTLE_OUTPUT & 0x40000 )) && THROTTLE_MSG="${THROTTLE_MSG} Throttling has occurred"
            (( RAW_THROTTLE_OUTPUT & 0x80000 )) && THROTTLE_MSG="${THROTTLE_MSG} Soft temp limit has occurred"
        fi
        if [[ -n $THROTTLE_MSG ]]; then
            log_status "${BAD}" "${FUNCNAME[0]}" "Raspberry Pi throttling events detected: $THROTTLE_MSG"
        else
            log_status "${GOOD}" "${FUNCNAME[0]}" "No Raspberry Pi throttling events detected"
        fi
    fi
}

function check_balenaOS()
{
# test resinOS 1.x based on matches like the following:
@@ -521,6 +566,8 @@ function run_checks()
"$(check_under_voltage)" \
"$(check_memory)" \
"$(check_temperature)" \
"$(check_throttling_dmesg)" \
"$(check_throttling_vcgencmd)" \
"$(check_container_engine)" \
"$(check_supervisor)" \
"$(check_networking)" \