To be a better programmer, write little proofs in your head (the-nerve-blog.ghost.io)
461 points by mprast | 15 July 2025 | 167 comments

Comments

Oh, I have a relevant and surprisingly simple example: Binary search. Binary search and its variants, leftmost and rightmost binary search, are surprisingly hard to code correctly if you don't think about the problem in terms of loop invariants. I outlined the loop invariant approach in [1] with some example Python code that was about as clear and close to plain English as I could get.
Jon Bentley, the author of Programming Pearls, gave the task of writing an ordinary binary search to a group of professional programmers at IBM, and found that 90% of their implementations contained bugs. The most common one seems to have been accidentally running into an infinite loop. To be fair, this was back in the day when integer overflows had to be explicitly guarded against - but even then, it's a surprising number.

[1]: https://hiandrewquinn.github.io/til-site/posts/binary-search...
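For concreteness, here is a minimal sketch of a leftmost binary search in Python with the loop invariant spelled out in comments (the variable names are mine, not from the linked post):

    def leftmost(a, x):
        """Return the smallest i such that a[i] >= x, or len(a) if none."""
        lo, hi = 0, len(a)
        # Invariant: every element of a[:lo] is < x, and
        #            every element of a[hi:] is >= x.
        while lo < hi:
            mid = (lo + hi) // 2  # lo <= mid < hi, so the interval always shrinks: no infinite loop
            if a[mid] < x:
                lo = mid + 1      # preserves: a[:lo] < x
            else:
                hi = mid          # preserves: a[hi:] >= x
        return lo                 # lo == hi, so the two invariants pin down the answer

(Python's arbitrary-precision integers make the classic `(lo + hi) // 2` overflow a non-issue; in C or Java you'd write the equivalent of `lo + (hi - lo) / 2`.)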
I think I stumbled across a similar concept in the more difficult post-grad classes I ended up in a long time ago. At some point late in my undergrad I began doing math tests entirely in pen. I didn't understand why, but it almost always resulted in higher scores, and much neater scratchwork, which I had assumed was the reason. But I think what was really helping was something along the lines of what this post is getting at.
What was helping me was that before I wrote a single expression, I thought about it carefully in my head, and about where it would lead, before putting pen to paper, because I didn't want to make a bunch of messy scratch-out marks. Or, sometimes, I'd use a healthy amount of throwaway scratch paper if allowed. Once my path was fully formed in my head I'd begin writing, and it resulted in far fewer mistakes.
I don't always take this approach to writing code but often I do formulate a pretty clear picture in my head of how it is going to look and how I know it will work before I start.
I learned the foundations of theoretical computer science in university. I agree with the sentiment of this post, though it is hard to perform in practice.
To be a better programmer, write little proofs in your code.
We call that tests and types, proof that it should do what you expect.
Especially when writing tests first, then types, then the code.
Start with a test per acceptance criterion that describes well what it should do and makes clear what you send and receive.
Also, for an API you can describe it in OpenAPI or GraphQL with all the properties and types, and you can validate the data at runtime against your specification; that specification is then also a proof that the application does what you described in it.
So OpenAPI/GraphQL, tests, and types are proof that the system works as intended. Always start with those before writing code.
The specification is a solid base that doesn't change a lot if you start with it.
You can refactor how the code works, and prove with the specification that it still does the same as before. Code is less important than the specifications.
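As a rough sketch of the runtime-validation idea (the schema and field names here are hypothetical, and I'm using the jsonschema library rather than a full OpenAPI toolchain):

    from jsonschema import validate  # pip install jsonschema

    # A hypothetical fragment of an API specification for an order resource.
    ORDER_SCHEMA = {
        "type": "object",
        "required": ["id", "status", "items"],
        "properties": {
            "id": {"type": "string"},
            "status": {"enum": ["pending", "shipped", "fulfilled"]},
            "items": {"type": "array", "minItems": 1},
        },
    }

    def check_order_response(payload):
        # The specification acts as a runtime proof obligation: anything
        # that doesn't match what was promised is rejected on the spot.
        validate(instance=payload, schema=ORDER_SCHEMA)
        return payload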
> My thesis so far is something like "you should try to write little proofs in your head about your code." But there's actually a secret dual version of this post, which says "you should try to write your code in a form that's easy to write little proofs about."
Easier said than done. It is certainly feasible on greenfield projects where all the code is written by you (recently), and you have a complete mental model of the data layout and code dependencies. It's much harder to prove stuff this way when you call foo(), bar() and baz() across unit boundaries, when they modify global state and are written by different developers.
The idea that you should design programs with proof in mind goes back to T. J. Dekker's solution to the mutual exclusion problem in 1959. The story is told by Edsger Dijkstra in EWD1303 (https://www.cs.utexas.edu/~EWD/transcriptions/EWD13xx/EWD130...). Much of Dijkstra's later work can be seen as him working out the consequences of this insight.
Writing correct proofs is hard. Program verification is hard. In my opinion, if you are hand-waving it there's no benefit. Thinking about invariants and pre-/post-conditions is often unnecessary, or greatly reduced, if you write idiomatic code for the language and codebase. Check out "The Practice of Programming" by R. Pike and B. W. Kernighan. The motto is: simplicity, clarity, generality. I find it works really well in a day job.
On a slightly related note… Competitive programming is surprisingly good at teaching you the right set of techniques to make sure your code is correct. You won't progress beyond a certain stage unless you pick them up.
As an undergrad at Carnegie Mellon in the 80s, I was explicitly taught to do all of these things in one of the first-year programming courses. And it has served me very well since then.
I especially remember how learning the equivalence of recursion and induction immediately eliminated the “frustrated trial and error” approach for making recursive algorithms that work.
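A small illustration of that equivalence, as a sketch in Python: the correctness argument for a recursive function is literally a proof by structural induction.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        children: list = field(default_factory=list)

    def depth(node):
        # Base case: a leaf has depth 1.
        # Inductive step: assume depth() is correct for every child
        # (a strictly smaller tree); then 1 + the max over children
        # is correct for this node. That is the whole proof.
        if not node.children:
            return 1
        return 1 + max(depth(c) for c in node.children)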
I'd also add mutability and immutability to the list of properties. Keeping as much state as possible immutable doesn't just help with multithreading, it will also greatly reduce the headache when trying to understand the possible states of a program.
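A quick sketch of that payoff in Python (the names here are made up): with a frozen value type, "modification" produces a new value, so no other part of the program can see your state change under its feet.

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Account:
        owner: str
        balance_cents: int

    def deposit(account, amount_cents):
        assert amount_cents > 0  # pre-condition
        # The input is never mutated; callers holding `account`
        # keep a consistent snapshot no matter what we do here.
        return replace(account, balance_cents=account.balance_cents + amount_cents)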
Totally tangential, but I love to read post-mortems of people fixing bugs. What were the initial symptoms? What was your first theory? How did you test it? What was the resolution? Raymond Chen does this a lot and I've always enjoyed it.
I learn more from these concrete case studies than from general principles (though I agree those are important too).
One of my most recent bugs was a crash bug in a thread-pool that used garbage-collected objects (this is in C++) to manage network connections. Sometimes, during marking, one of the objects I was trying to mark would be already freed, and we crashed.
My first thought was that this was a concurrency problem. We're supposed to stop all threads (stop the world) during marking, but what if a thread was not stopped? Or what if we got an event on an IO completion port somehow during marking?
I was sure that it had to be a concurrency problem because (a) it was intermittent and (b) it frequently happened after a connection was closed. Maybe an object was getting deleted during marking?
The only thing that was weird was that the bug didn't happen under stress (I tried stress testing the system, but couldn't reproduce it). In fact, it seemed to happen most often at startup, when there weren't too many connections or threads running.
Eventually I proved to myself that all threads were paused properly during marking. And with sufficient logging, I proved that an object was not being deleted during marking, but the marking thread crashed anyway.
[Quick aside: I tried to get ChatGPT to help--o3 pro--and it kept on suggesting a concurrency problem. I could never really convince it that all threads were stopped. It always insisted on adding a lock around marking to protect it against other threads.]
The one thing I didn't consider was that maybe an object was not getting marked properly and was getting deleted even though it was still in use. I didn't consider it because the crash was in the marking code! Clearly we're marking objects!
But of course, that was the bug. Looking at the code I saw that an object was marked by a connection but not by the listener it belonged to. That meant that, as long as there was a connection active, everything worked fine. But if ever there were no connections active, and if we waited a sufficient amount of time, the object would get deleted by GC because the listener had forgotten to mark it.
Then a connection would use this stale object and on the next marking pass, it would crash.
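For readers who haven't written a collector, the bug pattern above fits even a toy mark phase. This hypothetical sketch (not the poster's actual code) shows how a single missing edge in an object's reachability report is enough to free a live object, exactly like the listener that forgot to mark:

    class GCObject:
        def __init__(self, name, refs=()):
            self.name, self.refs, self.marked = name, list(refs), False
        def children(self):
            # If an owner forgets to report an object here (the listener
            # bug), nothing marks it and the sweep frees live memory.
            return self.refs

    def mark(roots):
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.children())

    def sweep(heap):
        live = [o for o in heap if o.marked]
        for o in heap:
            o.marked = False  # reset for the next cycle
        return live           # everything unmarked is (conceptually) freed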
This should be obvious, right? This is how you are supposed to write code. You are supposed to have a simple proposition about what each chunk of your code does.
The best way I have found to integrate this approach is Test Driven Development.
When done well, every test is a mini-proof: you write it, see it fail, and then write the barest amount of code that you think will make it pass. Your test setup and assertions are what cover your pre- and post-conditions. Base cases are the invariant.
The key here is to be disciplined: write the simplest test you can, see the test fail before writing code, then write the smallest amount of code possible to make the test pass. Repeat.
The next level is how cohesive or tightly coupled your tests are. Being able to make changes with minimal test breakage "blast radius" increases my confidence of a design.
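A minimal sketch of one such loop in Python (unittest, with a hypothetical function under test): each test states a pre-condition in its setup and a post-condition in its assertion, and is seen to fail before the code that discharges it is written.

    import unittest

    def leftmost(a, x):
        # The simplest implementation that makes the tests below pass.
        lo, hi = 0, len(a)
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    class TestLeftmost(unittest.TestCase):
        def test_empty_list(self):
            self.assertEqual(leftmost([], 5), 0)
        def test_first_of_duplicates(self):
            self.assertEqual(leftmost([1, 3, 3, 3, 7], 3), 1)
        def test_everything_smaller(self):
            self.assertEqual(leftmost([1, 2], 9), 2)

    if __name__ == "__main__":
        unittest.main()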
This is similar to an intuition I've had about what it means to program "algorithmically". We often draw a distinction between "algorithms" and "business logic", but fundamentally they are the same thing. They are both plans of the steps necessary to accomplish a goal. The only difference, in my mind, is the style in which the program is written. To program "algorithmically" means to take steps to make undesirable program states impossible to represent.
- In the case of search or sort algorithms, where the primary concern is the speed of computation, undesirable states would be performing unnecessary or duplicate computations.
- In the case of encryption algorithms, undesirable states would be those that leak encrypted data.
- In the case of an order shipping and fulfillment system, an undesirable state would be marking an order as fulfilled when not all of the items have been delivered.
The more care that is taken to prevent undesirable states, the more the program takes on an algorithmic style. And the only way you can be sure that those undesirable states are impossible is to think in terms of proofs and invariants.
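Taking the fulfillment example from the list above, here is a sketch (with hypothetical types) of making the undesirable state unrepresentable: "fulfilled" is computed from the items rather than stored, so it can never disagree with them.

    from dataclasses import dataclass
    from enum import Enum, auto

    class ItemStatus(Enum):
        ORDERED = auto()
        SHIPPED = auto()
        DELIVERED = auto()

    @dataclass(frozen=True)
    class Order:
        items: tuple  # of ItemStatus

        @property
        def fulfilled(self):
            # Derived, never stored: there is no way to construct an
            # Order that claims fulfillment while an item is undelivered.
            return all(s is ItemStatus.DELIVERED for s in self.items)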
I really identify with this way of thinking. One domain where it is especially helpful is writing concurrent code. For example, if you have a data structure that uses a mutex, what are the invariants that you are preserving across the critical sections? Or when you're writing a lock-free algorithm, where a proof seems nearly required to have any hope of being correct.
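A tiny Python sketch of the mutex case: the invariant may be broken inside the critical section, but the proof obligation is that it is restored before the lock is released.

    import threading

    class MirroredCounter:
        """Invariant (whenever the lock is free): self.a == self.b."""

        def __init__(self):
            self._lock = threading.Lock()
            self.a = 0
            self.b = 0

        def increment(self):
            with self._lock:
                self.a += 1
                # Invariant broken here (a == b + 1), but no other
                # thread can observe it while we hold the lock.
                self.b += 1
            # Lock released: invariant restored, obligation met.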
I feel like this is reaching for the vocabulary you get in mathematics: axioms, postulates, and theorems. Not in a bad way, mind. Just the general idea of building a separate vocabulary for the different aspects of a system of knowledge.
I also think things get complicated when holding things constant becomes a necessity. Oftentimes, the best way to find a bug is to ask not "how was this possible?" but "why would another part of the system modify this data?"
Obviously, if you can make things reference-capable data, you should do so. But all too often "this data being X" has a somewhat self-evident explanation that can help. And just "copy it so I don't have to worry about it" lands you squarely in "why is my copy out of date?"
What's interesting about this view is that theorem provers ultimately boil down to sufficiently advanced type checking (the book Type Theory and Formal Proof is an excellent overview of this topic). Literally the heart of proof assistants like Lean is "if it compiles, you've proved it".
So more practical type systems can be viewed as "little" theorem provers in the sense the author is describing. "Thinking in types" so to speak, is mathematically equivalent to "writing little proofs in your head". I would also add that merely using types is not equivalent to "thinking in types" as one would be required to do writing a sufficiently advanced Haskell program.
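Even Python's modest type system can carry a little of this. A sketch (with hypothetical names): a NewType whose only sanctioned constructor checks a property, so holding a value of that type is informal evidence the property holds.

    import bisect
    from typing import NewType

    SortedList = NewType("SortedList", list)

    def as_sorted(xs):
        # The only sanctioned way to obtain a SortedList; this check is
        # the "proof" that the type then carries around for free.
        assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
        return SortedList(xs)

    def leftmost_index(a: SortedList, x) -> int:
        # Safe to rely on sortedness here without re-checking it.
        return bisect.bisect_left(a, x)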
One thing that I feel is discussed less: what are the high-level constraints and levers that can be set to enable a group of programmers to become better programmers? For example, how much impact does architecture have? Bug tracing, choice of language, tools? People handwave and say that one is better than another because of <reason 1>, <reason 2>, but to be able to explain how each of these choices impacts the code in a semi-rigorous way, all the way down to how it affects individual code components and their interactions (as a graph or a tree), would be immensely helpful. If anyone knows of resources like that, please let us know.
This might be the best article that I have read this year. It might change the way that I program. I really want to write up examples for each of the ideas presented: Monotonicity, Pre- and post-conditions, Invariants, Isolation, Induction, and Proof-affinity as a quality metric. I think it's easy to find these ideas in my code, but I want to try to practice writing proofs "in my head" as I code to make my code better. It is not apparent to me now how this will work, but if I write up some examples, I imagine it will become clearer.
Maybe a related practice (what I do): write actual proofs, or good documentation that essentially contains proofs, and combine that with Knuth's literate programming.
For what is to be proved, try to use divide and conquer to make the whole proof consist of separate steps, each of which gets a short proof, and each of those steps might become a subroutine (whatever the terminology and functionality of the programming language and environment).
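A sketch of that shape in Python: each lemma becomes a small function with an obvious one-line proof, and the top-level theorem is just their composition.

    def is_sorted(xs):
        # Lemma 1: the output is ordered.
        return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

    def same_multiset(xs, ys):
        # Lemma 2: the output is a permutation of the input.
        return sorted(xs) == sorted(ys)

    def checked_sort(xs):
        ys = sorted(xs)
        # Theorem: ys is a correct sort of xs, by Lemmas 1 and 2.
        assert is_sorted(ys) and same_multiset(xs, ys)
        return ys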
Why not include little proofs in the code when you can?
Several programming languages and libraries support Design-by-Contract (https://en.wikipedia.org/wiki/Design_by_contract) which lets you specify preconditions, postconditions, and invariants directly in your code.
Those predicates can be checked in various ways (depending on how deeply Design-by-Contract is supported) to help you know that your code is working correctly.
Ada supports Design-by-Contract as part of the language: https://learn.adacore.com/courses/intro-to-ada/chapters/cont...
SPARK extends Ada Design-by-Contract into full proofs: https://learn.adacore.com/courses/intro-to-spark/index.html
Rust has the Contracts crate: https://docs.rs/contracts/latest/contracts/
Other programming languages have various levels of support or libraries for Design-by-Contract: https://en.wikipedia.org/wiki/Design_by_contract#Language_su...
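In languages without first-class support, plain assertions give a rough approximation of the same contracts. A sketch in Python, not a substitute for tool-checked Design-by-Contract:

    def isqrt_floor(n):
        assert isinstance(n, int) and n >= 0, "pre: n is a non-negative integer"
        r = 0
        while (r + 1) * (r + 1) <= n:
            r += 1
        assert r * r <= n < (r + 1) * (r + 1), "post: r is the floor square root"
        return r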
While I have been employed as a developer for almost 7 years now, I remember taking a class, maybe discrete math, where you have problems like p -> q. I was not a fan of that class.
Also, the farthest I got in math/physics was intro to quantum mechanics, where the multiple pages needed to solve a single problem lost me.
Being a good programmer... I don't have a degree, so I've never really tried to get into FAANG. I'm also aware, after trying Leetcode, that I'm not an algorithm person.
What's funny is that at my current job, which is a huge multi-national entity, I have to push people to do code reviews or to fix small errors that make something look bad (like a button being shorter than an input next to it).
I am self-aware about my true skill level: I can make things, but I don't think I'd ever be a John Carmack. If you follow a framework's pattern, are you a good developer? I can see tangible metrics, like performance: someone better takes some slow thing and makes it faster.
Recently someone forked a repo of a hardware project I made. It's fun watching them change it, to understand what I wrote and then change it to fit their needs.
When I see my old code I do recognize how it was verbose/could be much simpler.
Totally agree — binary search is the classic example of something that feels simple until you try to implement it robustly. The off-by-one errors, incorrect mid-point updates, and forgetting to handle edge cases (like low == high) are so easy to miss.
Bentley’s anecdote is a perfect illustration of how writing "correct" code often depends more on reasoning rigor than language syntax. Loop invariants, like you mentioned, are key here — they help anchor the logic through each iteration. It’s one of those patterns that separates surface-level understanding from deep intuition.
I also like how languages like Rust (with its strong type system) or Python (with clarity) can highlight these logic gaps early, especially in interviews or algorithmic codebases.
If you’ve got the link to your example post handy, I’d love to give it a read.
This reminds me of reading Accelerated C++ back in the day and I think the part that stuck with me most from that book was the idea of "holding the invariant in your head" (I'm paraphrasing a bit).
It made me think a lot more about every line of code I wrote and definitely helped me become a better programmer.
I don't do this often, but when I do, it's almost always when writing non-trivial concurrent code. I'll often "fuzz" the scheduling of multiple tasks around the region of code I'm working on to prove to myself that it works.
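In Python, the crude version of that fuzzing is just injecting tiny random sleeps around the code under test so that rare interleavings actually occur. A sketch with made-up names:

    import random
    import threading
    import time

    def fuzz():
        # Perturb the scheduler so unlikely interleavings show up.
        time.sleep(random.random() / 1000)

    counter = 0
    lock = threading.Lock()

    def worker():
        global counter
        for _ in range(100):
            fuzz()
            with lock:        # drop the lock and the assertion below
                counter += 1  # starts failing under fuzzing
            fuzz()

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
    assert counter == 800, counter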
They taught me in school to think about preconditions, invariants, and postconditions for loops. Then I switched to only using functional affordances like .map() and .each(). I am much happier now.
Instead of just thinking about the proof in your mind or putting it in comments, you can actually embed pre-/post-conditions and compile-time proofs in actual code using Ada SPARK. You use Pre and Post aspects for subprograms and pragma Loop_Invariant for loops and similar constructs. The SPARK toolchain will run provers against your code and show whether the proofs hold.
I often run through the code in my head first, especially for things like binary search where it's easy to mess up the logic. It feels like a quick mental check, sometimes even faster than writing tests.
I'm curious though. When you're coding, do you actually pause and think through the logic first, or is it more like writing and fixing along the way?
This is something I've practiced myself quite a lot in recent years, and in my experience it's so much harder while at your desk. Doing something else while letting your brain work really helps. For me, that's usually going for a walk, taking a run in the woods, or doing some repetitive household chore.
Some correlated ideas that I thought were worth sharing:
- formal properties help one understand one's own code better. This holds even more true when the properties are high-level.
- they are useful for making sure the code behaves as expected, but not only that: during development, they notably help in finding better implementations and in reviewing incremental changes.
- the process of outlining properties and drafting their proofs is already valuable, even without formally doing the proofs. It often prompts ideas about more properties and potential bugs.
- great code makes its rules and invariants obvious. This usually correlates well with general readability.
I literally don't even write proofs in my head. Every time I write an algo I just ask Cursor for a Lean/Rocq proof of correctness. No tests. No test coverage. The sheer power of type theory.
I’m too old for that, so I write the algorithm on paper and run it with some numbers, also on paper. I don’t have the patience to do it often but whenever I did I usually got it done in one or two tries.
I wonder how many of these rules can be incorporated into a linter or be evaluated by an LLM in an agentic feedback loop. It would be nice to encourage code to be more like this.
I notice you didn't really talk much about types. When I think of proofs in code my mind goes straight to types because they literally are proofs. Richer typed languages move even more in that direction.
One caveat I would add is it is not always desirable to be forced to think through every little scenario and detail. Sometimes it's better to ship faster than write 100% sound code.
And as an industry, as much as I hate it, the preference is for languages and code that is extremely imperative and brittle. Very few companies want to be writing code in Idris or Haskell on a daily basis.
In addition to pre-conditions and post-conditions, I would like to emphasize that loop invariants and structural induction are powerful techniques in CS proofs. https://en.wikipedia.org/wiki/Loop_invariant , https://en.wikipedia.org/wiki/Structural_induction
These notes from UofT's CSC236H are on-topic: https://www.cs.toronto.edu/~david/course-notes/csc236.pdf#pa...
So many communication issues on teams occur when people are using the same words to mean different things.
https://www.youtube.com/watch?v=tYAod_61ZuQ
For me, the best "trick" is to start with pseudocode.
The Isolation part is understandable and supports the philosophy:
> When requirements change, you extend the behavior of [your program] by adding new code, not by changing old code that already works.
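A small sketch of that philosophy in Python (a hypothetical handler registry): a new requirement is met by adding a handler, while the dispatch code that already works is never edited.

    HANDLERS = {}

    def handles(kind):
        def register(fn):
            HANDLERS[kind] = fn
            return fn
        return register

    @handles("email")
    def send_email(message):
        return f"email: {message}"

    @handles("sms")  # new requirement: added code, no changed code
    def send_sms(message):
        return f"sms: {message}"

    def dispatch(kind, message):
        # This function never changes as new kinds are added.
        return HANDLERS[kind](message)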
Could anyone please explain the meaning? I don't get it.
How to Think About Algorithms by Jeff Edmonds.
Persuasive Programming by Jerud Mead and Anil Shinde.
That's all there is.
You want to get better at something? Do it all day every day.
// x == 0 && (y == 0 || y == x + 1) && a[x + y] != 0
... code here ...
// x == 0 && (y == x + 1) && ...
https://xkcd.com/378/
---
Let me elaborate: there is a huge history of dubious allegations of what constitutes "a real programmer" or stuff that makes you "a better programmer".
Combobulation and discombobulation of McGuffins is often the best analogy, though. Not to discredit other kinds of thinking, but discrediting them is what this is all about. When in doubt, showing the code is often what works.