Debugging: Indispensable rules for finding even the most elusive problems (2004)

524 points by omkar-foss 13 January 2025 | 225 comments

Comments

hughdbrown 13 January 2025

In my experience, the most pernicious temptation is to take the buggy, non-working code you have now and keep modifying it with "fixes" until it works. You often cannot turn broken code into working code because there are too many possible changes to make. In my view, it is much easier to break working code than to fix broken code.

Suppose you have a complete chain of N Christmas lights and they do not work when turned on. The temptation is to go through all the lights and to substitute in a single working light until you identify the non-working light.

But suppose there are multiple non-working lights? You'll never find the error with this approach. Instead, you need to start with a minimal working setup -- possibly just a single light (if your Christmas lights work that way) -- and add more lights until you hit an error. In fact, the best case is if you have a broken string of lights and a similar but working string of lights! Then you can easily swap a test bulb out of the broken string and into the working one until you find all the bad bulbs in the broken string.

Starting with a minimal working example is the best way I have found to fix a bug. And you will find yourself resisting this because you believe that you are close and that it is too time-consuming to start from scratch. In practice, it tends to be a real time-saver, not the opposite.

GuB-42 13 January 2025

Rule 0: Don't panic

Really, that's important. You need to think clearly; deadlines and angry customers are a distraction. That's also when having a good manager who trusts you is important: his job is to shield you from all that so that you can devote all of your attention to solving the problem.

nickjj 13 January 2025

For #4 (divide and conquer), I've found `git bisect` helps a lot. If you have a known good commit and one of dozens or hundreds of commits after that is bad, this can help you identify the bad commit / code in a few steps.

Here's a walk through on using it: https://nickjanetakis.com/blog/using-git-bisect-to-help-find...

I jumped into a pretty big unknown code base in a live consulting call and we found the problem pretty quickly using this method. Without that, the scope of where things could be broken was too big given the context (unfamiliar code base, multiple people working on it, only able to chat with 1 developer on the project, etc.).
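A rough Python sketch of the binary search that `git bisect` performs, assuming a linear history where the first commit is good, the last is bad, and the behavior flips exactly once. `first_bad_commit` and the `is_good` callback are hypothetical names; with real git, the callback corresponds to you (or a script) marking each checked-out commit good or bad.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str], is_good: Callable[[str], bool]
) -> tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history switches from good to bad exactly once.
    Returns the first bad commit and the number of commits tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_good(commits[mid]):
            lo = mid  # the bug was introduced after mid
        else:
            hi = mid  # the bug was introduced at or before mid
    return commits[hi], tests


history = [f"c{i}" for i in range(200)]
# Hypothetical: the bug lands in commit c137, so earlier commits are good.
culprit, tests = first_bad_commit(history, lambda c: int(c[1:]) < 137)
print(culprit, tests)  # c137 7
```

With 200 commits this needs only 7 checks instead of up to 199. In practice, `git bisect run <cmd>` can automate the loop: it treats exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad.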

qwertox 13 January 2025

Make sure you're editing the correct file on the correct machine.

heikkilevanto 13 January 2025

Some additional rules:

- "It is your own fault." Always suspect your code changes before anything else. It can be a compiler bug or even a hardware error, but those are very rare.
- "When you find a bug, go back and hunt down its family and friends." Think about where else the same kind of thing could have happened, and check those places.
- "Optimize for the user first, the maintenance programmer second, and last if at all for the computer."

david_draco 13 January 2025

Step 10: add the bug as a test in CI to prevent regressions? Make sure CI fails before the fix and passes after it.
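A minimal sketch of that idea with Python's `unittest`, using an invented version-comparison bug (plain string comparison treats "1.10" as older than "1.9"). All function names are hypothetical. `bug_present` encodes the bug report itself, so you can check that it is true against the pre-fix code before trusting the test in CI.

```python
import unittest


def is_newer_before_fix(a: str, b: str) -> bool:
    # Hypothetical buggy version: compares version strings as plain text,
    # so "1.10" is treated as older than "1.9".
    return a > b


def is_newer(a: str, b: str) -> bool:
    # Hypothetical fixed version: compares the numeric components.
    return tuple(map(int, a.split("."))) > tuple(map(int, b.split(".")))


def bug_present(impl) -> bool:
    """The reported bug as a check: 1.10 should be newer than 1.9."""
    return not impl("1.10", "1.9")


class VersionRegressionTest(unittest.TestCase):
    def test_reported_bug_is_fixed(self):
        # Fails against is_newer_before_fix, passes against is_newer.
        self.assertFalse(bug_present(is_newer))
```

Run it with `python -m unittest` pointed at the module that holds the test. Confirming that `bug_present(is_newer_before_fix)` is true is the "fails before the fix" half of the check.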

sitkack 13 January 2025

If folks want to instill this mindset in their kids, themselves, or others, I would recommend at least:

The Martian by Andy Weir https://en.wikipedia.org/wiki/The_Martian_(Weir_novel)

https://en.wikipedia.org/wiki/Zen_and_the_Art_of_Motorcycle_...

https://en.wikipedia.org/wiki/The_Three-Body_Problem_(novel)

To Engineer Is Human - The Role of Failure in Successful Design By Henry Petroski https://pressbooks.bccampus.ca/engineeringinsociety/front-ma...

https://en.wikipedia.org/wiki/Surely_You%27re_Joking,_Mr._Fe...!

nox101 13 January 2025

> #1 Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.

Maybe I'm misunderstanding, but "Read the manual, read everything in depth" sounds like: "Oh, I have a bug in my code. First read the entire manual of the library I'm using, all 700 pages, then read 7 books on the library's details, and now that a month or two has passed, go look at the bug."

I'd be curious if there's a single programmer that follows this advice.

knlb 13 January 2025

I wrote a fairly similar take on this a few years ago (without having read the original book mentioned here) -- https://explog.in/notes/debugging.html

Julia Evans also has a very nice zine on debugging: https://wizardzines.com/zines/debugging-guide/

Zolomon 13 January 2025

I have been bitten more than once thinking that my initial assumption was correct, diving deeper and deeper - only to realize I had to ascend and look outside of the rabbit hole to find the actual issue.

> Assumption is the mother of all screwups.

gnufx 13 January 2025

Then, after successful debugging your job isn't finished. The outline of "Three Questions About Each Bug You Find" <http://www.multicians.org/thvv/threeq.html> is:

1. Is this mistake somewhere else also?

2. What next bug is hidden behind this one?

3. What should I do to prevent bugs like this?

fn-mote 13 January 2025

The article is a 2004 "review" (really more of a very brief summary) of a 2002 book about debugging.

The list is fun for us to look at because it is so familiar. The enticement to read the book is the stories it contains. Plus the hope that it will make our juniors more capable of handling complex situations that require meticulous care...

The discussion on the article looks nice but the submitted title breaks the HN rule about numbering (IMO). It's a catchy take on the post anyway. I doubt I would have looked at a more mundane title.

astrobe_ 13 January 2025

Also sometimes: the bug is not in the code, it's in the data.

A few times I looked for a bug like "something is not happening when it should" or "this is not the expected result", when the issue was with some config file, database records, or something sent by a server.

For instance, particularly nasty are non-printable characters in text files that you don't see when you open the file.
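A small Python sketch for hunting those invisible characters. It flags Unicode control (`Cc`) and format (`Cf`) characters other than tab, plus non-breaking spaces; that selection is an assumption about what counts as "nasty", so adjust it to your data. It splits only on `\n` so a stray `\r` from CRLF line endings is reported rather than silently swallowed.

```python
import unicodedata


def find_invisible_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for characters that are easy to miss
    when a file is opened in an editor."""
    hits = []
    # Split on "\n" only, so carriage returns stay visible to the scan.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            # Cc = control chars, Cf = format chars (zero-width space, BOM, ...)
            if (category in ("Cc", "Cf") and ch != "\t") or ch == "\u00a0":
                # Control characters have no Unicode name, so fall back to U+XXXX.
                hits.append((lineno, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


sample = "host=example.com\nport=8080\u200b\n\ufeffuser=admin\n"
for hit in find_invisible_chars(sample):
    print(hit)
# (2, 10, 'ZERO WIDTH SPACE')
# (3, 1, 'ZERO WIDTH NO-BREAK SPACE')
```

When reading a real file, `open(path, encoding="utf-8", newline="")` keeps carriage returns untranslated and leaves a leading BOM in the text as `\ufeff`, so both show up in the scan.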

"simulate the failure" is sometimes useful, actually. Ask yourself "how would I implement this behavior", maybe even do it.

Also: never reason from the absence of a specific log line. The logs can be wrong (bugged) too, sometimes. If you're printf-debugging a problem around a conditional, for instance, log both branches.
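A tiny Python illustration of "log both branches", using a hypothetical discount function and logger name. With a line in each branch, a missing log entry can no longer be mistaken for "took the other branch"; it can only mean the code was never reached or logging itself is broken.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical logger name


def apply_discount(total: float, code: Optional[str]) -> float:
    # Log BOTH branches, including the values the condition depends on.
    if code == "SAVE10":
        log.debug("discount applied: code=%r total=%.2f", code, total)
        return total * 0.9
    else:
        log.debug("discount skipped: code=%r total=%.2f", code, total)
        return total
```

Logging `code` with `%r` also helps with the data problems mentioned above: a trailing space or invisible character shows up in the repr.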

jwpapi 13 January 2025

I’m not sure; that doesn’t sit well with me.

Rule 1 should be: Reproduce with most minimal setup.

99% of the time you’ll already have found the bug.

The other 1%, for me, was a font that couldn’t render a certain combination of letters in a row. Something like "ft" just didn’t work, and that’s why it made mistakes in the PDF.

There’s no way I could’ve known that if I hadn’t reproduced it down to the letter.

Just split the code in half until you find the exact part that goes wrong.
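Splitting in half works on inputs as well as code. Below is a rough Python sketch of shrinking a failing input: it repeatedly deletes chunks that aren't needed to reproduce the failure, halving the chunk size when nothing more can be removed. This is a simplified take on Andreas Zeller's delta debugging (ddmin), not the full algorithm, and the `"ft" in s` check is a hypothetical stand-in for "the PDF comes out wrong".

```python
from typing import Callable


def minimize(data: str, fails: Callable[[str], bool]) -> str:
    """Shrink a failing input until removing any single character
    makes the failure go away. Assumes fails(data) is True at the start
    and that the failure is deterministic."""
    assert fails(data), "start from an input that reproduces the bug"
    chunk = len(data) // 2
    while chunk >= 1:
        shrunk = False
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if candidate and fails(candidate):
                data = candidate  # still fails without this chunk: drop it
                shrunk = True
            else:
                i += chunk  # this chunk is needed; keep it and move on
        if not shrunk:
            chunk //= 2  # nothing removable at this size: try smaller pieces
    return data


print(minimize("The draft of the left shift report", lambda s: "ft" in s))  # ft
```

Each call to `fails` is one experiment, so this pays off most when a single run is cheap to automate.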

waynecochran 13 January 2025

I also think it is worthwhile stepping through working code with a debugger. The actual control flow reveals what is really happening and will tell you how to improve the code. It is also a great way to demystify how others' code runs.

condour75 13 January 2025

One good timesaver: debug in the easiest environment in which you can reproduce the bug. For instance, if it’s an issue with a website on an iPad, first see if you can reproduce it in Chrome using the responsive tools in its developer tools. If that doesn’t work, see if it reproduces in desktop Safari, then in the iPad simulator, and only then on the real hardware. This saves a lot of frustration and time, and each step towards the actual hardware eliminates a whole category of bugs.

BWStearns 13 January 2025

> Check the plug

I just spent a whole day trying to figure out what was going on with a radio. Turns out I had tx/rx swapped. When I went to check the tx/rx alignment, I misread the documentation the same way I had the first time. So I would even add "try switching things anyway" to the list. If you have solid (but wrong) reasoning for why you did something, you won't see the error later even if it's right in front of you.

ianmcgowan 13 January 2025

I used to manage a team that supported an online banking platform and gave a copy of this book to each new team member. If nothing else, it helped create a shared vocabulary.

It's useful to get the poster and make sure everyone knows the rules.

https://debuggingrules.com/download-the-poster/

teleforce 13 January 2025

The tenth golden rule:

10) Enable frame pointers [1].

[1] The return of the frame pointers:

https://news.ycombinator.com/item?id=39731824

spawarotti 13 January 2025

Very good online course on debugging: Software Debugging on Udacity by Andreas Zeller

https://www.udacity.com/course/debugging--cs259

pcblues 13 January 2025

Over twenty-five-odd years, I have found that general debugging prowess is best achieved by doing it. I'd recommend taking the list/buying the book, using https://up-for-grabs.net to find bugs on GitHub/Bugzilla, etc., and doing the following:

1. set up the dev environment

2. fork/clone the code

3. create a new branch to make changes and tests

4. use the list to try to find the root cause

5. create a pull request if you think you have fixed the bug

And use Rule 0 from GuB-42: Don't panic

(edited for line breaks)

omkar-foss 13 January 2025

For folks who love to read books, here's an excerpt from the Debugging book's accompanying website (https://debuggingrules.com/):

"Dave was asked as the author of Debugging to create a list of 5 books he would recommend to fans, and came up with this.

https://shepherd.com/best-books/to-give-engineers-new-perspe..."

manhnt 14 January 2025

> Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs

Unfortunately, I've found many times that this is actually the most difficult step. I've lost count of how many times our QA reported an intermittent bug in their environment that we were never able to reproduce in the lab. Then it hits one or two customers in the field, but when we look at the customer's environment, it's gone, and we don't know when it might come back.
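One narrow case of "find the uncontrolled condition" and "record everything" can be sketched in Python: when the uncontrolled condition is randomness, choose the seed explicitly and record it on every run, so a failure seen once can be replayed on demand. The flaky function below is hypothetical, and this does nothing for timing, network, or environment differences, which need their own recording.

```python
import random
from typing import Optional, Tuple


def flaky_operation(rng: random.Random) -> bool:
    """Hypothetical stand-in for code with an uncontrolled condition:
    it fails only when a shuffle happens to put item 3 first."""
    items = list(range(5))
    rng.shuffle(items)
    return items[0] != 3


def run_once(seed: Optional[int] = None) -> Tuple[int, bool]:
    """Run one trial and return (seed, succeeded).

    The seed is always chosen up front and returned, so a failure that
    shows up once can be replayed exactly instead of waited for."""
    if seed is None:
        seed = random.randrange(2**32)
    return seed, flaky_operation(random.Random(seed))


failing_seeds = [seed for seed, ok in (run_once() for _ in range(100)) if not ok]
if failing_seeds:
    seed = failing_seeds[0]
    print("failed with seed", seed, "; replay:", run_once(seed))
```

The key design choice is that the code under test receives its `random.Random` instance instead of using the global generator, so the seed really does control every random decision it makes.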

apples_oranges 13 January 2025

A good bug is the most fun thing about software development

analog31 13 January 2025

One I learned on Friday: Check your solder connections under a microscope before hacking the firmware.

sandbar 13 January 2025

Taking the time to speed up my iteration cycles has always been incredibly valuable. It can be really painful because it's not directly contributing to determining/fixing the bug (which can be exacerbated if there is external pressure), but it's always been worth it. Of course, this only applies when it takes ~4+ minutes to run a single 'experiment' (test, startup, etc.). I find that when I just try to push through with long-running tests, I'll often forget the exact variable I tweaked during the course of the run. Further, these tweaks can be very nuanced and require you to keep a lot of the larger system in your head.
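For the "forgot which variable I tweaked" problem, a minimal Python sketch: append one JSON line per experiment with its parameters and outcome. The file name `experiments.jsonl` and the parameter names in the usage line are hypothetical.

```python
import json
import time
from pathlib import Path

LOG = Path("experiments.jsonl")  # hypothetical log file name


def record_experiment(params: dict, outcome: str, log_path: Path = LOG) -> None:
    """Append one line per run: what was tweaked and what happened."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,
        "outcome": outcome,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def load_experiments(log_path: Path = LOG) -> list[dict]:
    """Read the log back, oldest run first."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: record_experiment({"cache_size": 64, "retries": 3}, "still fails")
```

Calling `record_experiment(params, "started")` before kicking off a long run means the settings are written down even if you get pulled away before it finishes.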

duxup 13 January 2025

I’m so bad at #1.

I know it is the best route, I do know the system (maybe I wrote it) and yet time and again I don’t take the time to read what I should… and I make assumptions in hopes of speeding up the process/ fix, and I cost myself time…

__MatrixMan__ 13 January 2025

> Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself

I wish this were true, and maybe it was in 2004, but when you've got noise coming in from the cloud provider and noise coming in from all of your vendors I think it's actually quite likely that you'll see a failure once and never again.

I know I've fixed things for people without asking if they ever noticed they were broken, and I'm sure people are doing that to me too.

goshx 13 January 2025

> Quit thinking and look (get data first, don't just do complicated repairs based on guessing)

From my experience, this is the single most important part of the process. Once you keep in mind that nothing paranormal ever happens in systems and everything has an explanation, it is your job to find the reason for things, not guess them.

I tell my team: just put your brain aside and start following the flow of events checking the data and eventually you will find where things mismatch.
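"Follow the flow of events checking the data" can be made mechanical when the flow is a pipeline: write down what you expect after each stage, then run the stages in order and stop at the first mismatch. A Python sketch with a hypothetical three-stage pipeline where the dedupe stage is broken:

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def first_mismatch(
    stages: Sequence[Tuple[str, Callable[[Any], Any]]],
    data: Any,
    expected: Sequence[Any],
) -> Optional[str]:
    """Run data through each stage and compare against the value expected
    after that stage. Returns the first stage whose output differs,
    or None if every stage matches."""
    for (name, step), want in zip(stages, expected):
        data = step(data)
        if data != want:
            return name
    return None


stages = [
    ("parse", lambda s: [int(x) for x in s.split(",")]),
    ("dedupe", lambda xs: xs),  # hypothetical bug: forgot to dedupe
    ("total", lambda xs: sum(xs)),
]
expected = [[3, 1, 3], [3, 1], 4]
print(first_mismatch(stages, "3,1,3", expected))  # dedupe
```

Stopping at the first mismatch matters: every later stage is working from bad input, so its output tells you nothing about its own correctness.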

kazinator 13 January 2025

I've had trouble keeping the audit trail. It can distract from the flow of debugging, and there can be lots of details to it, many of which end up being irrelevant; i.e. all the blind rabbit holes that were not on the maze path to the bug. Unless you're a consultant who needs to account for the hours, or a teller of engaging debugging war stories, the red herrings and blind alleys are not that useful later.

Tepix 13 January 2025

My first rule for debugging debutants:

Don't be too embarrassed to scatter debug log messages in the code. It helps.

My second rule:

Don't forget to remove them when you're done.

_madmax_ 14 January 2025

I had the incredible luck to stumble upon this book early in my career and it helped me tremendously in so many ways. If I could name only one it would be that it helped me get over the sentiment of being helpless in front of a difficult situation. This book brought me to peace with imperfection and me being an artisan of imperfection.

samsquire 13 January 2025

One thing I have been doing is having the software create a directory called "debug" and write lots of different files with debugging information as the main program executes. I only write these files outside of hot loops, and then I visually inspect them after the program exits.

For intermediate representations, this is better than printf to stdout.
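A minimal Python sketch of that approach, assuming JSON is a good enough format for the intermediate data. `dump_debug` and the two-phase `process` pipeline are hypothetical; the point is that each dump happens between phases, never inside the hot loop.

```python
import json
from pathlib import Path

DEBUG_DIR = Path("debug")  # directory name from the comment above


def dump_debug(name: str, value) -> None:
    """Write one intermediate representation to debug/<name>.json."""
    DEBUG_DIR.mkdir(exist_ok=True)
    path = DEBUG_DIR / f"{name}.json"
    # default=repr keeps the dump from crashing on non-JSON objects.
    path.write_text(json.dumps(value, indent=2, default=repr), encoding="utf-8")


def process(text: str) -> dict:
    # Hypothetical two-phase pipeline, only to show where the dumps go.
    words = text.lower().split()
    dump_debug("01_words", words)  # after phase 1
    counts: dict = {}
    for w in words:  # hot loop: no file writes in here
        counts[w] = counts.get(w, 0) + 1
    dump_debug("02_counts", counts)  # after phase 2
    return counts
```

Numbering the file names by phase keeps them sorted in the order they were produced, which makes it easy to spot the first phase whose output looks wrong.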

shahzaibmushtaq 13 January 2025

I can't comment further on David A. Wheeler's review because his words were from 2004 (He said everything true), and I can't comment on the book either because I haven't read it yet.

Thank you for introducing me to this book.

One of my favorite rules of debugging is to read the code in plain language. If the words don't make sense somewhere, you have found the problem or part of it.

andypi_swfc 13 January 2025

I found this book so helpful I created a worksheet based on it - might be helpful for some: https://andypi.co.uk/2024/01/26/concise-guide-to-debugging-a...

ChrisMarshallNY 13 January 2025

#7 Check the plug: Question your assumptions, start at the beginning, and test the tool.

I have found that 90% of network problems are bad cables.

That's not an exaggeration. Most IT folks I know throw out ethernet cables immediately. They don't bother testing them. They just toss 'em in the trash and break a new one out of the package.

BlueUmarell 13 January 2025

Post: "9 rules of debugging"

Each comment: "..and this is my 10th rule: <insert witty rule>"

Total number of rules when reaching the end of the post: 9 + n + n * m, with n being number of users commenting, m being the number of users not posting but still mentally commenting on the other users' comments.

TheLockranore 13 January 2025

Rule 11: If you haven't solved it and reach this rule, one of your assertions is incorrect. Start over.

reverendsteveii 13 January 2025

Review was good enough to make me snag the entire book. I'm taking a break from algorithmic content for a bit and this will help. Besides, I've got an OOM bug at work and it will be fun to formalize the steps of troubleshooting it. Thanks, OP!

fasten 14 January 2025

Nice classic that sticks to timeless principles. The nine rules are practical, with war stories that make them stick. But I agree that "don't panic" should be added.

jgrahamc 13 January 2025

Wasn't Bryan Cantrill writing a book about debugging? I'd love to read that.

dalton_zk 14 January 2025

First time hearing about these 9 rules, but I learned most of them from experience over many years of resolving, or trying to resolve, bugs.

The only thing I don't agree with is that the book costs US$4,291.04 on Amazon.

urbandw311er 13 January 2025

Rule #10 - it’s probably DNS

PhunkyPhil 13 January 2025

I would almost change 4 into "Binary search".

Wheeler gets close to it by suggesting to locate which side of the bug you're on, but often I find myself doing this recursively until I locate it.

berikv 13 January 2025

Personally, I’d start with divide and conquer. If you’re working on a relevant code base chances are that you can’t learn all the API spec and documentation because it’s just too much.

mootoday 13 January 2025

Did anyone say debugging?

I've followed https://debugbetter.com/ for a few weeks and the content has been great!

jagged-chisel 14 January 2025

> Ask for fresh insights (just explaining the problem to a mannequin may help!)

You can’t trust a thing this person says if they’re not recommending a duck.

k3vinw 13 January 2025

The unspoken rule is talking to the rubber duck :)

gregthelaw 13 January 2025

I love the "if you didn't fix it, it ain't fixed". It's too easy to convince yourself something is fixed when you haven't fully root-caused it. If you don't understand exactly how the thing you're seeing manifested, papering over the cracks will only cause more pain later on.

As someone who has been working on a debugging tool (https://undo.io) for close to two decades now, I totally agree that it's just weird how little attention debugging as a whole gets. I'm somewhat encouraged to see this topic staying near the top of hacker news for as long as it has.

fedeb95 13 January 2025

rule -1: don't trust the bug issuer

begueradj 13 January 2025

This is related to the classic debugging book with the same title. I first discovered it here in HN.

__mharrison__ 13 January 2025

Go on a walk or take a shower...

nottorp 13 January 2025

I’d add “a logging module done today will save you a lot of overtime next year”.
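What "a logging module done today" might look like at its smallest, sketched with Python's standard `logging` package: one setup function for the whole program, with debug-level detail switched on by a flag. The `billing` logger and `charge` function are hypothetical.

```python
import logging
import sys


def setup_logging(verbose: bool = False) -> None:
    """Configure logging for the whole program in one place."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        stream=sys.stderr,
        force=True,  # Python 3.8+: replace any handlers configured earlier
    )


log = logging.getLogger("billing")  # hypothetical module name


def charge(account_id: str, cents: int) -> None:
    log.debug("charge requested: account=%s cents=%d", account_id, cents)
    if cents <= 0:
        log.warning("rejected non-positive charge for account=%s", account_id)
        return
    log.info("charged account=%s cents=%d", account_id, cents)
```

Using per-module loggers from `logging.getLogger(name)` means the `%(name)s` field tells you which part of the system wrote each line, which is most of what you want from the logs next year.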

ChrisArchitect 13 January 2025

(2004)

Title is: David A. Wheeler's Review of Debugging by David J. Agans

worldhistory 13 January 2025

Great book +1

coldtea 14 January 2025

>Rule 1: Understand the system: Read the manual, read everything in depth (...)

Yeah, ain't nobody got time for that. If e.g. debugging a compile issue meant we read the compiler manual, we'd get nothing done...

01308106991 14 January 2025

Hello