Linux CPU Usage Bug: The Mystery of the Perpetual 100% Utilization

A puzzling bug has affected Linux systems running on certain ARM processors: CPU usage appears stuck at 100% even when the system is idle. The problem stems from how Linux tracks and reports CPU utilization.

The issue shows up in system monitoring tools such as 'top', which report the processor running at full capacity despite a minimal actual workload. Investigation traced the root cause to how the kernel accounts for CPU idle time, which it exposes through the /proc/stat interface.

On affected systems, the idle time counter fails to increment properly when using certain timer configurations. This causes monitoring tools to incorrectly calculate high CPU usage, since they determine utilization by comparing the changes in various CPU state counters between measurement intervals.
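To make the calculation concrete, here is a minimal sketch of how a monitoring tool might derive utilization from two samples of the aggregate "cpu" line in /proc/stat. The function names and sample values are illustrative assumptions, and counting iowait as idle time is one common convention rather than what every tool does.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def utilization(before, after):
    """Return CPU utilization in percent between two counter samples."""
    deltas = [new - old for old, new in zip(before, after)]
    # Sum only the first eight fields: guest time is already included in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Assumption: treat iowait as idle time, as many tools do.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total


# Hypothetical samples taken one interval apart.
first = parse_cpu_line("cpu 100 0 50 1000 0 0 0 0 0 0")
healthy = parse_cpu_line("cpu 110 0 55 1085 0 0 0 0 0 0")  # idle counter advanced by 85
stuck = parse_cpu_line("cpu 110 0 55 1000 0 0 0 0 0 0")    # idle counter did not move

print(utilization(first, healthy))
print(utilization(first, stuck))
```

In the second sample pair the idle counter never advances, so every elapsed tick is attributed to busy states and the result is 100%, which matches the symptom described above.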

The technical culprit was traced to the code handling timer register reads on specific ARM platforms like the PXA168. The implementation didn't properly account for hardware timing requirements when capturing counter values, leading to stale or invalid readings.

The fix involved modifying how the timer registers are read to ensure stable values. Instead of a simple delay loop, multiple register reads are now performed to properly capture the current counter state. This change allows the system to accurately track idle time.
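The idea can be illustrated with a toy simulation. This is not the kernel patch: `FakeCaptureRegister`, its `settle_reads` parameter, and both read helpers are hypothetical, and the model simply assumes the captured value only becomes visible after a couple of register reads have elapsed.

```python
class FakeCaptureRegister:
    """Hypothetical model of a timer capture register that needs time to settle."""

    def __init__(self, settle_reads=2):
        self.counter = 0        # free-running "hardware" counter
        self._latched = 0       # value currently visible through the register
        self._pending = None    # captured value that has not settled yet
        self._reads_left = 0
        self.settle_reads = settle_reads

    def tick(self, n):
        """Advance the simulated hardware counter by n ticks."""
        self.counter += n

    def request_capture(self):
        """Ask the hardware to latch the current counter value."""
        self._pending = self.counter
        self._reads_left = self.settle_reads

    def read(self):
        """Return the visible value; it stays stale until enough reads have passed."""
        if self._pending is not None:
            if self._reads_left == 0:
                self._latched = self._pending
                self._pending = None
            else:
                self._reads_left -= 1
        return self._latched


def read_naive(reg):
    """Request a capture and read once, before the value has settled."""
    reg.request_capture()
    return reg.read()


def read_with_settling(reg, dummy_reads=2):
    """Request a capture, discard a few reads, then return a settled value."""
    reg.request_capture()
    for _ in range(dummy_reads):
        reg.read()
    return reg.read()


reg = FakeCaptureRegister()
reg.tick(1000)
print(read_naive(reg))          # stale: still the old latched value
reg.tick(500)
print(read_with_settling(reg))  # reflects the current counter
```

A reader that keeps getting stale values sees time standing still, which is how a timer read problem can end up looking like an idle counter that never increments.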

This bug impacted multiple ARM platforms over many kernel versions before being identified and fixed. The resolution is now available in Linux kernel version 6.2 and has been backported to various stable kernel releases.

For users experiencing this issue, upgrading to a patched kernel version will resolve the incorrect CPU utilization reporting. The fix allows system monitoring tools to properly detect and display idle CPU time rather than showing artificial 100% usage.

This case highlights how seemingly simple issues can have complex underlying causes that require investigation across multiple software layers, from userspace tools down to hardware-specific kernel code.