> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.
This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (ie trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.
Put another way, LLMs make the easy stuff easier, but royally screw up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.
> However, in ‘high-debt’ environments with subtle control flow, long-range dependencies, and unexpected patterns, they struggle to generate a useful response
I'd argue that a lot of this is not "tech debt" but just signs of maturity in a codebase. Real world business requirements don't often map cleanly onto any given pattern. Over time codebases develop these "scars", little patches of weirdness. It's often tempting for the younger, less experienced engineer to declare this as tech debt or cruft or whatever, and that a full re-write is needed. Only to re-learn the lessons those scars taught in the first place.
>Instead of trying to force genAI tools to tackle thorny issues in legacy codebases, human experts should do the work of refactoring legacy code until genAI can operate on it smoothly
Instead of genAI doing the rubbish, boring, low status part of the job, you should do the bits of the job no one will reward you for, and then watch as your boss waxes lyrical about how genAI is amazing once you've done all the hard work for it?
It just feels like if you're re-directing your efforts to help the AI, because the AI isn't very good at actual complex coding tasks then... what's the benefit of AI in the first place? It's nice that it helps you with the easy bit, but the easy bit shouldn't be that much of your actual work and at the end of the day... it's easy?
This gives very similar vibes to: "I wanted machines to do all the soul crushing monotonous jobs so we would be free to go and paint and write books and fulfill our creative passions but instead we've created a machine to trivially create any art work but can't work a till"
> There is an emerging belief that AI will make tech debt less relevant.
Wow. It's hard to believe that people are earnestly supposing this. From everything we have evidence of so far, AI generated code is destined to be a prolific font of tech debt. It's irregular, inconsistent, highly sensitive to specific prompting and context inputs, and generally produces "make do" code at best. It can be extremely "cheap" vs traditional contributions, but gets to where it's going by the shortest path rather than the most forward-looking or comprehensive.
And so it does indeed work best with young projects where the prevailing tech debt load remains low enough that the project can absorb large additions of new debt and incoherence, but that's not to the advantage of young projects. It's setting those projects up to be young and debt-swamped much sooner than they would otherwise be.
If mature projects can't use generative AI as extensively, that's going to be to their advantage, not their detriment -- at least in terms of tech debt. They'll be forced to continue plodding along at their lumbering pace while competitors bloom and burst in cycles of rapid initial development followed by premature seizure/collapse.
And to be clear: AI generated code can have real value, but the framing of this article is bonkers.
It is a self-reinforcing pattern: the easier it is to generate code, the more code is generated. The more code is generated, the bigger the cost of maintenance is (and the relationship is super-linear).
So every time we generate the same boilerplate, we are really doing copy/paste and adding to maintenance costs.
We are amazed by the code generation capabilities of LLMs, forgetting that the goal is to have less code - not more.
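To make the point concrete, here is a hypothetical sketch (endpoint names invented): two near-identical generated wrappers versus one shared helper. The helper is less code to own and gives maintenance a single place to happen.

```typescript
// Hypothetical illustration: generators happily emit this shape over and over.
async function fetchUsers(): Promise<unknown> {
  const res = await fetch("/api/users");
  if (!res.ok) throw new Error(`GET /api/users failed: ${res.status}`);
  return res.json();
}

async function fetchOrders(): Promise<unknown> {
  const res = await fetch("/api/orders");
  if (!res.ok) throw new Error(`GET /api/orders failed: ${res.status}`);
  return res.json();
}

// Less code to own: one helper carries the error handling for every endpoint.
async function getJson<T>(path: string): Promise<T> {
  const res = await fetch(path);
  if (!res.ok) throw new Error(`GET ${path} failed: ${res.status}`);
  return res.json() as Promise<T>;
}

const users = () => getJson("/api/users");
const orders = () => getJson("/api/orders");
```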
I love the way our SWE jobs are evolving. AI eating the simple stuff, generating more code but with harder to detect bugs... I'm serious, it feels that we can move faster with these tools but perhaps have to operate differently.
We are a long way from automating our jobs away; instead, our expertise evolves.
I suspect doctors go through a similar evolution as surgical methods are updated.
I would love to read or participate in the discussion of how to be strategic in this new world. Specifically, how to best utilize code generating tools as a SWE. I suppose I can wait a couple of years for new school SWEs to teach me, unless anyone is aware of content on this?
This is just taking the advice to make code sane so that humans could understand and modify it, and then justifying it as "AI should be able to understand and modify it". I mean, the same developer efficiency improvements apply to both humans and AI. The only difference is that currently humans working in a space eventually learn the gotchas, while current AIs don't really have that ability to learn the nuances of a particular space over time.
On one hand I agree with this conceptually, but on the other hand I've also been able to use AI to rapidly clean up and better structure a bunch of my existing code.
The blind copy-paste has generally been a bad idea though. Still need to read the code spit out, ask for explanations, do some iterating.
> Not only does a complex codebase make it harder for the model to generate a coherent response, it also makes it harder for the developer to formulate a coherent request.
> This experience has led most developers to “watch and wait” for the tools to improve until they can handle ‘production-level’ complexity in software.
You will be waiting until the heat death of the universe.
If you are unable to articulate the exact nature of your problem, it won't ever matter how powerful the model is. Even a nuclear weapon will fail to have effect on target if you can't approximate its location.
Ideas like dumpstering all of the codebase into a gigantic context window seem insufficient, since the reason you are involved in the first place is because that heap is not doing what the customer wants it to do. It is currently a representation of where you don't want to be.
Haven't read the article, don't need to read the article: this is so, SO, so painfully obvious! If someone needs this spelled out for them they shouldn't be making technical decisions of any kind. Sad that this needs to be said.
> In essence, the goal should be to unblock your AI tools as much as possible. One reliable way to do this is to spend time breaking your system down into cohesive and coherent modules, each interacting through an explicit interface.
I find this works because it's much easier to debug a subtle GPT bug in a well validated interface than the same bug buried in a nested for loop somewhere.
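As a minimal sketch of what such a boundary can look like (the module and names are hypothetical): a small, explicit interface with validation at the edge, so a subtle mistake in generated code fails loudly at the seam rather than deep inside the call graph.

```typescript
// Hypothetical module boundary: callers see only this interface, and inputs
// are checked at the edge, so bad data surfaces here rather than three
// nested loops deeper.
export interface InvoiceTotals {
  subtotal: number;
  tax: number;
  total: number;
}

export function computeTotals(lineAmounts: number[], taxRate: number): InvoiceTotals {
  // Validate at the boundary instead of trusting whatever was generated upstream.
  if (!lineAmounts.every(Number.isFinite) || !Number.isFinite(taxRate)) {
    throw new Error("computeTotals: non-numeric input");
  }
  const subtotal = lineAmounts.reduce((sum, amount) => sum + amount, 0);
  const tax = subtotal * taxRate;
  return { subtotal, tax, total: subtotal + tax };
}
```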
"Companies with relatively young, high-quality codebases"
I thought that at the beginning the code might be a bit messy because of the need to iterate fast, and that quality comes with time. What's the experience of the crowd on this?
It's funny that his recommendations - organize code in modules etc. - aren't AI-specific at all; it's what you'd do if you had to hand over your project to an external team, or simply make it maintainable in the long term. So the best strategy for collaborating with AI turns out to be the same as for collaborating with humans.
I completely agree. That's why my stance is to wait and see, and in the meanwhile get our shit together, as in make our code maintainable by any intelligent being, human or not.
It is not just the code produced with code generation tools but also business logic using gen AI.
For example, a RAG pipeline. People are rushing things to market that are not built to last. The likes of LangChain etc. offer little in the way of software engineering polish. I wish there were a more mature enterprise framework. Spring AI is still in the making and Go is lagging behind.
While this primarily focuses on the software development side of things, I’d like to chime in that this applies to the IT side of the equation as well.
LLMs can’t understand why your firewall rules have strange forwards for ancient enterprise systems, nor can they “automate” Operations on legacy systems or custom implementations. The only way to fix those issues is to throw money and political will behind addressing technical debt in a permanent sense, which no organization seemingly wants to do.
These things aren’t silver bullets, and throwing more technology at an inherently political problem (tech debt) won’t ever solve it.
I find AI most helpful with very specific, narrow commands (add a new variable to the logger, which means typescript and a bunch of other things need to be updated) and it can go off and do that. While it's doing that I'll be thinking about the next thing to be fixed already.
Asking it for higher level planning / architecture is just asking for pain
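As a concrete (and entirely hypothetical) example of the "add a new variable to the logger" kind of task mentioned above: one field gets added to a typed log record, and the compiler then points at every call site that needs updating. Tedious, but narrow and well-bounded.

```typescript
// Hypothetical typed logger: adding requestId means touching the type, the
// helper, and each call site, but every edit is mechanical.
interface LogRecord {
  level: "info" | "warn" | "error";
  message: string;
  timestamp: string;
  requestId?: string; // the newly added field
}

function log(record: Omit<LogRecord, "timestamp">): void {
  const full: LogRecord = { ...record, timestamp: new Date().toISOString() };
  console.log(JSON.stringify(full));
}

// Call sites now thread the new field through.
log({ level: "info", message: "payment accepted", requestId: "req-123" });
```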
Today's job is finishing up and testing some rather gnarly haproxy configuration.
There's already a fairly high chance I'm going to stuff something up with it. There is no chance that I'm giving some other entity that chance as well.
Speaking personally, I've found this tech much more helpful in existing codebases than new ones.
Missing test? Great, I'll get help identifying what the code should be doing, then use AI to write a boatload of tests in service towards those goals. Then I'll use it to help refactor some of the code.
But unlike the article, this requires actively engaging with the tool rather than, as they say, a "sit and wait" (i.e., lazy) approach to developing.
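One hedged sketch of that workflow, assuming a Jest-style runner and an existing, untested function (both the module and the function are hypothetical stand-ins): pin down current behaviour with characterization tests first, then let the tool refactor against them.

```typescript
// Characterization tests: capture what the code does today before refactoring.
// priceWithDiscount and its module are hypothetical.
import { priceWithDiscount } from "./pricing";

describe("priceWithDiscount (current behaviour)", () => {
  test("applies the 10% discount above the 100 threshold", () => {
    expect(priceWithDiscount(200)).toBe(180);
  });

  test("leaves small orders untouched", () => {
    expect(priceWithDiscount(50)).toBe(50);
  });
});
```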
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them.
So you say, but {citation needed}. Stuff like this is simply not known yet.
AI can easily be applied in legacy codebases, like to help with time-consuming refactoring.
The title of this article made me think that paying down traditional tech debt, due to bugs or whatever, is straightforward. Software with tech debt and/or bugs that incorporates AI isn't a straightforward rewrite; it takes ML skills to pay down.
> Instead of trying to force genAI tools to tackle thorny issues in legacy codebases, human experts should do the work of refactoring legacy code until genAI can operate on it smoothly. When direct refactoring is still too risky, teams can adjust their development strategy with approaches like strangler fig to build greenfield modules which can benefit immediately from genAI tooling.
Or, y'know, just not bother with any of this bullshit. "We must rewrite everything so that CoPilot will sometimes give correct answers!" I mean, is this worth the effort? Why? This seems bonkers, on the face of it.
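For context on the pattern the quoted passage mentions: a strangler fig setup puts a thin facade in front of the legacy code and routes functionality to new modules piece by piece, avoiding a big-bang rewrite. A minimal sketch, with every module and function name hypothetical:

```typescript
// Hypothetical strangler fig facade: migrated operations go to the new module,
// everything else keeps hitting the legacy code path.
import * as legacyBilling from "./legacy/billing"; // hypothetical legacy module
import * as newBilling from "./modules/billing";   // hypothetical greenfield module

export function createInvoice(customerId: string, amount: number) {
  // Migrated: served by the greenfield module.
  return newBilling.createInvoice(customerId, amount);
}

export function refundInvoice(invoiceId: string) {
  // Not migrated yet: still handled by legacy code.
  return legacyBilling.refundInvoice(invoiceId);
}
```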
This type of analysis is a mirror of the early days of chess "AI". All kinds of commentary explaining the weaknesses of the engines, and extolling the impossible-to-reproduce capabilities of human players. But while they may have been correct in the moment, they didn't really appreciate the march toward utter dominance and supremacy of the machines over human players.
While there is no guarantee that the same trajectory is true for programming, we need to heed how emotionally attached we can be to denying the possibility.
Yeah, this is a total click-bait article. The claim put forth by the title is not at all supported by the article contents, which basically states "old codebases riddled with tech-debt do not benefit very much from GenAI, while newer cleaner codebases will see more benefit." That is so completely far off from "AI will make your tech debt worse."
The author starts with a straw man argument, of someone who thinks that AI is great at dealing with technical debt. He makes little attempt to steel man their argument. Then the author argues the opposite without much supporting evidence. I think the author is right that some people were quick to assume that AI is much better for brownfield projects, but I think the author was also quick to assume the opposite.
I agree with a lot of the assertions made in TFA but not so much the conclusion. AI increasing the velocity of simpler code doesn't make tech debt more expensive; it just means the debt won't benefit as much / be made cheaper.
OTOH if devs are getting the simpler stuff done faster maybe they have more time to work on debt.
I asked the AI to write me some code to get a list of all the objects in an S3 bucket. It returned some code that worked and would no doubt be approved by most developers. But on further inspection I noticed it would cause a bug if the bucket had more than 1000 objects, because S3 delivers at most 1000 objects per request and the API is paged, and the AI had no ability to understand this. So the AI's code would be buggy should the bucket contain more than 1000 objects, which is really, really easy to do with an S3 bucket.
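For what it's worth, the correct version has to follow the continuation token until the listing is exhausted. A sketch using the AWS SDK for JavaScript v3 (the bucket name would come from the caller):

```typescript
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// ListObjectsV2 returns at most 1000 keys per call, so keep requesting pages
// with the continuation token until none is returned.
async function listAllKeys(bucket: string): Promise<string[]> {
  const keys: string[] = [];
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key) keys.push(obj.Key);
    }
    token = page.NextContinuationToken;
  } while (token);
  return keys;
}
```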
... until it won't. A mature code-base also has (or should have) strong test coverage, both in unit-testing and comprehensive integration testing. With proper ci/cd pipelines, you can have a small team update and upgrade stuff at a fraction of the usual cost (see amazon going from old java to newer versions) and "pay off" some of that debt.
LLM code gen tools are really freaking good...at making the exact same react boilerplate app that everyone else has.
The moment you need to do something novel or complicated they choke up.
This is why I'm not very confident that tools like Vercel's v0 (https://v0.dev/) are useful for more than just playing around. It seems very impressive at first glance - but it's a mile wide and only an inch deep.
Machine learning is the high interest credit card of technical debt.
Because with AI you can turn any problem into a black box. You build a model, and call it "solved". But then reality hits ...
This is for tiny code snippets, hello-world size, stringing together some primitives to render relatively simple objects.
Turns out, if the codebase / framework is a bit obscure and poorly documented, even the genie can't help.
This isn't AI's doing.
It's what happens when you add any new feature to a product that has existing tech debt.
And since, for most companies, AI is just another feature, like any feature it only makes the tech debt worse.
"GARBAGE IN -- GARBAGE OUT!!"
Creating react pages is the new COBOL
The tooling for this will only improve.
How does one determine if that's even possible, much less estimate the work involved to get there?
After all, 'subtle control flow, long-range dependencies, and unexpected patterns' do not always indicate tech-debt.
I'm sure nothing will change in the future either.