Reading 4 times doesn't feel like a necessarily portable solution if there will be more versions at different speeds and with different I/O architectures; it's also unclear how this will behave under more load, and whether the original change was made to fix some other performance problem OP is not aware of. But I'm not sure what else can be done. Unfortunately, many vendors like Marvell seriously under-document crucial features like this. If anything, it would be good to put some of this info in the code comment itself. Not very elegant, but how else are we practically meant to keep track of this? Is the mailing list part of the documentation?
Doesn't look like there's a lot of discussion on the mailing list, but I don't know if I'm reading the thread view correctly.
In the late 1990s I worked at a company that had a couple of mainframes in their fleet. One day I looked at a resource usage screen (Omegamon, perhaps? Is it that old?) and noticed the CPU was pegged at 100%. I asked the operator if that was normal. His answer was "Of course. We paid for that CPU, might as well use it". The funny thing is that mainframes are designed for that: most, if not all, non-application work is offloaded to other processors in the system so that the CPU can run applications as fast as it can.
This is a wonderful write-up and a very enjoyable read. Although my knowledge of systems programming on ARM is limited, I know that it isn't easy to read hardware-based time counters; at the very least, it's not as simple as the x86 rdtsc [1]. This is probably why the author writes:
> This code is more complicated than what I expected to see. I was thinking it would just be a simple register read. Instead, it has to write a 1 to the register, and then delay for a while, and then read back the same register. There was also a very noticeable FIXME in the comment for the function, which definitely raised a red flag in my mind.
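For anyone who hasn't seen this pattern before, here is a rough sketch (in kernel style) of the capture-then-read sequence the quote describes. The function name, register name, and delay count are all made up for illustration; this is not the actual Marvell driver code:

/* Hypothetical sketch of the "write 1, delay, read back" pattern
 * described above; names and delay count are illustrative only. */
static u32 timer_read_captured(void __iomem *capture_reg)
{
        int i;

        /* Writing 1 asks the timer block to latch the free-running
         * counter into the capture register... */
        writel(1, capture_reg);

        /* ...the counter runs in a much slower clock domain, so spin
         * long enough for the capture to actually complete... */
        for (i = 0; i < 100; i++)
                cpu_relax();

        /* ...then read the latched value back. Contrast with x86's
         * rdtsc, which is a single instruction. */
        return readl(capture_reg);
}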
Regardless, this was a very nice read and I'm glad they got to the bottom of the issue and got the problem fixed.

[1]: https://www.felixcloutier.com/x86/rdtsc
Curiously, instead of "set capture reg, wait for clock edge, read", the "read the reg twice until the same result is obtained" approach was ignored. This is strange, as it is usually much faster: reading a 3.25 MHz counter twice at 200 MHz+ is very likely to see the same value both times. For a 32 kHz counter, it is basically guaranteed.
u32 val;

/* Keep reading until two consecutive reads agree, i.e. no
 * counter tick landed between them. */
do {
        val = readl(...);
} while (val != readl(...));
return val;
This compiles to a nice 6-instruction little function on ARM/Thumb too, with no delays.
My recurring issue (on a variety of laptops, both Linux and Windows): the fans start going full blast, everything slows down, and then, as soon as I open a task manager, CPU usage drops from 100% to something negligible.
Aside from the technical beauty of this post, what is the practical impact of this?
Fan speeds should ideally be driven by temperature sensors, and CPU idling is still working, albeit via interrupt waits as pointed out here. The only impact seems to be surprise that the CPU appears to be working harder than it really is when looking at this number.
It's far better to look at the system load average (which was 0.0 here - already a strong hint this system is working below capacity). It has a formal definition (the average number of tasks running or waiting for CPU - plus, on Linux, tasks in uninterruptible sleep - averaged over 1, 5, and 15 minutes) and succinctly captures the concept of "this machine is under load".
Many years ago, a coworker deployed a bad auditd config. CPU usage was below 10%, but system load was 20x the number of cores. We moved all our alerts to system load and used that instead.
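If you want to check load from code rather than parsing uptime output, a minimal sketch (assuming a POSIX-ish system with getloadavg(3); the per-core alert threshold is a made-up rule, not a standard) looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        double loads[3];

        /* getloadavg() fills in the 1-, 5-, and 15-minute load
         * averages and returns the number of samples retrieved. */
        if (getloadavg(loads, 3) != 3) {
                fprintf(stderr, "getloadavg failed\n");
                return 1;
        }

        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

        printf("load: %.2f %.2f %.2f (%ld cores)\n",
               loads[0], loads[1], loads[2], ncpu);

        /* Hypothetical alert rule: 1-minute load above the core
         * count means the run queue is backing up. */
        if (loads[0] > (double)ncpu)
                printf("machine is under load\n");
        return 0;
}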
This headline reminded me of Mumptris, an implementation of Tetris in the old mainframe-oriented language MUMPS, which by design, uses 100% CPU to reduce latency: https://news.ycombinator.com/item?id=4085593
I noticed that one time. Looked at the process list, and what was running was a program that enabled streaming. But since I wasn't streaming anything, I wondered what it was doing reading the disk drive.
So I uninstalled it.
Not having any programs that are not good citizens.
Great read! Eerily similar to some bugs I've had, but the root cause has been a compiler bug. Debugging a kernel that doesn't boot is... interesting. QEMU+GDB to the rescue.
When I finally got round to seeing what he was doing, I was disappointed to find he was attempting to kill the 'System Idle' process.
Why would reading it multiple times fix the issue?
Is it just that reading takes time, so reading multiple times lets the needed interval between the write and the final read pass? If so, it sounds like a worse solution than just extending the waiting delay, like the author did initially.
If not, then I would like to know the reason.
(Needless to say, a great article!)
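If that hypothesis is right, the fix would look something like this sketch - the count of 4 comes from the article, but the names are illustrative, not the actual patch:

/* Sketch of "use repeated reads as the delay"; hypothetical names. */
static u32 timer_read_n(void __iomem *reg)
{
        u32 val = 0;
        int i;

        writel(1, reg);          /* request a capture */

        /* Each MMIO read takes real bus time, so the extra reads
         * double as the delay before the final value is trusted. */
        for (i = 0; i < 4; i++)
                val = readl(reg);

        return val;
}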
shame about the unnecessary use of cat :)
> I opted to use 4 as a middle ground
reminded me of xkcd: Standards
https://xkcd.com/927/