These claims wouldn't matter if the topic weren't so deadly serious. Tech leaders everywhere are buying into the FOMO, convinced their competitors are getting massive gains they're missing out on. This drives them to rebrand as AI-First companies, justify layoffs with newfound productivity narratives, and lowball developer salaries under the assumption that AI has fundamentally changed the value equation.
This is my biggest problem right now. The types of problems I'm trying to solve at work require careful planning and execution, and AI has not been helpful for it in the slightest. My manager told me that the time to deliver my latest project was cut to 20% of the original estimate because we are "an AI-first company". The mass hysteria among SVPs and PMs is absolutely insane right now, I've never seen anything like it.
1. LLMs do not increase general developer productivity by 10x across the board for general purpose tasks selected at random.
2. LLMs dramatically increase productivity for a limited subset of tasks.
3. LLMs can be automated to do busy work and although they may take longer in terms of clock time than a human, the work is effectively done in the background.
LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup. If I need to write a small bit of glue code in a language I do not know, LLMs not only save me time, but they make it so I don't have to learn something that I'll likely never use again.
Fixing up existing large code bases? Productivity is at best a wash.
Setting up a scaffolding for a new website? LLMs are amazing at it.
Writing mocks for classes? LLMs know the details of mock libraries really well and can get it done far faster than I can, especially since writing complex mocks is something I do a couple of times a year and completely forget how to do in between the rare times I'm doing it (a sketch of what I mean is below).
Navigating a new code base? LLMs are ~70% great at this. If you've ever opened up an over-engineered WTF project, just finding where the HTTP routes are defined can be a problem. "Yo, Claude, where are the route endpoints in this project defined? Where do the dependency-injected functions for auth live?"
Right tool, right job. Stop using a hammer on screws.
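To make the mocks case concrete, here's roughly the shape of thing I mean; a minimal GoogleMock sketch with made-up names, not code from any real project:

    #include <gmock/gmock.h>
    #include <gtest/gtest.h>
    #include <string>

    // A hypothetical interface the code under test depends on.
    class PaymentGateway {
    public:
      virtual ~PaymentGateway() = default;
      virtual bool Charge(const std::string& account, double amount) = 0;
    };

    // The part I always forget how to write between uses: the mock itself.
    class MockPaymentGateway : public PaymentGateway {
    public:
      MOCK_METHOD(bool, Charge, (const std::string& account, double amount), (override));
    };

    TEST(CheckoutTest, ChargesTheAccountOnce) {
      MockPaymentGateway gateway;
      // Expect exactly one charge against this account and stub the result.
      EXPECT_CALL(gateway, Charge("acct-42", 99.95))
          .WillOnce(::testing::Return(true));

      // ... exercise the real code under test with `gateway` here ...
      EXPECT_TRUE(gateway.Charge("acct-42", 99.95));
    }

None of this is hard; the value is that it's rarely-used macro syntax I'd otherwise have to go look up all over again.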
Most of it doesn't exist beyond videos of code spraying onto a screen alongside a claim that "juniors are dead."
I think the "why" for this is that the stakes are high. The economy is trembling. Tech jobs are evaporating. There's a high anxiety around AI being a savior, and so, a demi-religion is forming among the crowd that needs AI to be able to replace developers/competency.
That said: I personally have gotten impressive results with AI, but you still need to know what you're doing. Most people don't (beyond the beginner -> intermediate range), and so, it's no surprise that they're flooding social media with exaggerated claims.
If you didn't have a superpower before AI (writing code), then having that superpower as a perceived equalizer means you will deploy all resources (material, psychological, etc.) to ensuring that everyone else maintains the position that 1) the superpower is good, 2) the superpower cannot go away, and 3) the superpower being fallible should be ignored.
Like any other hype cycle, these people will wash out, the middle ground will be discovered, and we'll patiently await the next excuse to incinerate billions of dollars.
This tracks with my own experience as well. I've found it useful in some trivial ways (e.g., small refactors, generating type definitions from a schema), but so far, for anything bigger than that, it misses things and requires rework. The future may make me eat my words, though.
On the other hand, I’ve lately seen it misused by less experienced engineers trying to implement bigger features who eagerly accept all it churns out as “good” without realizing the code it produced:
- doesn’t follow our existing style guide and patterns.
- implements some logic from scratch where there certainly is more than one suitable library, making this code we now own.
- is some behemoth of a PR trying to do all the things.
I completely agree with the thesis here. I also have not seen a massive productivity boost with the use of AI.
I think a kind of neurological fatigue will occur whereby, if software engineers are not actively practicing problem-solving, discernment, and translation into computer code, those skills will atrophy...
Yeah, AI is not the 2x or 10x technology of the future™ it was promised to be. It may be the case that any productivity boost is happening within existing private code bases. Even so, there should be a modest, noticeable uptick in software shipped to the market, and that does not appear to be there.
In my consulting practice I am seeing this phenomenon regularly, whereby new founders or stir-crazy CTOs push the use of AI and ultimately find that they're spending more time wrangling a spastic code base than they are building shared understanding and working together.

I have recently taken on advisory roles and retainers just to re-instill engineering best practices.
This makes some sense. We have CEOs saying they're not hiring developers because AI makes their existing ones 10X more productive. If that productivity enhancement was real, wouldn't they be trying to hire all the developers? If you're getting 10X the productivity for the same investment, wouldn't you pour cash into that engine like crazy?
Perhaps these graphs show that management is indeed so finely tuned that they've managed to apply the AI revolution to keep productivity exactly flat while reducing expenses.
Great angle to look at the releases of new software. I, too, thought we'd see a huge increase by now.
An alternative theory is that writing code was never the bottleneck of releasing software. The exploration of what it is you're building and getting it on a platform takes time and effort.
On the other hand, yeah, it's really easy to 'hold it wrong' with AI tools. Sometimes I have a great day and think I've figured it out. And then the next day, I realize that I'm still holding it wrong in some other way.
It is philosophically interesting that it is so hard to understand what makes building software products hard. And how to make it more productive. I can build software for 20 years and still feel like I don't really know.
The answer is that we're making it right now. AI didn't speed me up at all until agents got good enough, which was April/May of this year.
Just today I built a shovelware CLI that exports iMessage archives into a standalone website export. Would have taken me weeks. I'll probably have it out as a homebrew formula in a day or two.
I'm working on an iOS app as well that's MUCH further along than it would be if I hand-rolled it, but I'm intentionally taking my time with it.
Anyway, the post's data mostly ends in March/April which is when generative AI started being useful for coding at all (and I've had Copilot enabled since Nov 2022)
This article reminds me of two recent observations by Paul Krugman about the internet:
"So, here’s labor productivity growth over the 25 years following each date on the horizontal axis [...] See the great productivity boom that followed the rise of the internet? Neither do I. [...] Maybe the key point is that nobody is arguing that the internet has been useless; surely, it has contributed to economic growth. The argument instead is that its benefits weren’t exceptionally large compared with those of earlier, less glamorous technologies."¹
"On the second, history suggests that large economic effects from A.I. will take longer to materialize than many people currently seem to expect [...] And even while it lasted, productivity growth during the I.T. boom was no higher than it was during the generation-long boom after World War II, which was notable in the fact that it didn’t seem to be driven by any radically new technology [...] That’s not to say that artificial intelligence won’t have huge economic impacts. But history suggests that they won’t come quickly. ChatGPT and whatever follows are probably an economic story for the 2030s, not for the next few years."²
Background: I'm building a python package side project which allows you to encode/decode messages into LLM output.

Receipts: the tool I'm using creates a markdown that displays every prompt typed, and every solution generated, along with summaries of the code diffs. You can check it out here: https://github.com/sutt/innocuous/blob/master/docs/dev-summa...

Specific example: I actually used a leet-code style algorithmic implementation of memoization for branching. This would have taken a couple of days to implement by hand, but it took about 20 minutes to write the spec and 20 minutes to review and merge the generated solution. If you're curious, you can see the generated diff here: https://github.com/sutt/innocuous/commit/cdabc98
On one hand I don't understand what all the fuss is about.
LLMs are great at all kinds of things around and about: searching for (good) information, summarizing existing text, conceptual discussions where they point you in the right direction very quickly, etc. They are just not great (some might say harmful) at straight-up non-trivial code generation or design of complex systems, with the added peculiarity that on the surface the models seem almost capable of doing it but never quite get there ... which is sort of their central feature: producing text that seems correct from a statistical perspective, but without actual reasoning.

On the other hand, I do understand that the things LLMs are really great at are not actually all that spectacular to monetize ... and so as a result we have all these snake oil salesmen on every corner boasting about nonsensical vibecoding achievements, because that's where the real money would be ... if it were really true ... but it is not.
In case the author is reading this, I have the receipts on how there's a real step function in how much software I build, especially lately. I am not going to put any number on it because that makes no sense, but I certainly push a lot of code that reasonably seems to work.
The reason it doesn't show up online is that I mostly write software for myself and for work, with the primary goal of making things better, not faster. More tooling, better infra, better logging, more prototyping, more experimentation, more exploration.
Here's my opensource work: https://github.com/orgs/go-go-golems/repositories . These are not just one-offs (although there's plenty of those in the vibes/ and go-go-labs/ repositories), but long-lived codebases / frameworks that are building upon each other and have gone through many many iterations.
I have to agree with the author, with a caveat. He is a seasoned developer. For somebody like him, churning out good quality code is probably easy.

Where I expect a lot of those metrics of feeling fast to come from is people who have less coding experience and, with AI, are coding way above their level.
My brother-in-law asks for a nice product website; I just feed his business plan into an LLM, do some fine-tuning on the results, and have a good-looking website in an hour. If I did it manually, just take me behind the barn, because those jobs are so boring and take ages. But I know that website design is a weakness of mine.

That is the power of LLMs. They turn out quick code and maybe offer a suggestion you did not think about, but ... they also eat time! Making your prompts so that the LLM understands, waiting for the result, ... waiting ... OK, now check the result, can you use it? Oh no, it did X, Y, Z wrong. Prompt again ... and again. And this is where your productivity goes to die.

So when you compare a pool of developer feedback, you're going to get a broad mix of "it helps a lot", "somewhat", "it's worse than my code", ... mixed in with the prompting, result delays, etc.

It gets even worse with agent/vibe coding, as you just tend to be waiting 5, 10 minutes for changes to be done. You need to review them, test them, ... oh no, the LLM screwed something up again. Oh no, it removed 50% of my code. Hey, where did my comments go? And we are back to a loss of time.

LLMs are a tool... but after a lot of working with them, my opinion is to use them when needed but not to depend on them for everything. I sometimes stare in disbelief when people say they are coding so much with LLMs and spending 200 or more bucks per month.

They can be powerful tools, but I feel that some folks become over-dependent on them. And worst is my feeling that our juniors are going to be in a world of hurt if their skills are more LLM monkey coding (or vibe coding) than actually understanding how to code (and the knowledge behind the actual programming languages and systems).
I'm not sure what to make of these takes because so many people are using such an enormous variety of LLM tooling in such a variety of ways, people are going to get a variety of results.
Let's take the following scenario for the sake of argument: a codebase with well-defined AGENTS.md, referencing good architecture, roadmap, and product documentation, and with good test coverage, much of which was written by an LLM and lightly reviewed and edited by a human. Let's say for the sake of argument that the human is not enjoying 10x productivity despite all this scaffolding.
Is it still worthwhile to use LLM tooling? You know what, I think a lot of companies would say yes. There are way too many companies whose codebases lack testing and documentation, that are too difficult to onboard new engineers into, and that carry too much risk if the original engineers are lost. The simple fact that LLMs, to be effective, force the adoption of proper testing and documentation is a huge win for corporate software.
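For reference, the kind of AGENTS.md scaffolding I mean is nothing fancy; a hypothetical minimal sketch (paths and commands invented for illustration):

    # AGENTS.md
    - Architecture overview: docs/architecture.md
    - Product context and roadmap: docs/product.md, docs/roadmap.md
    - Run the tests before committing: make test
    - Coding conventions: docs/style.md; prefer small, focused changes
    - Never edit generated files under gen/

The point is less the file itself than that writing it forces the team to have the architecture docs, roadmap, and test suite it points to.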
I used to be a full-time developer back in the day. Then I was a manager. Then I was a CTO. I stopped doing the day-to-day development and even stopped micro-managing the detailed design.
When I tried to code again, I found I didn't really have the patience for it -- having to learn new frameworks, APIs, languages, and tricky little details. I used to find it engrossing; it had become annoying.
But with tools like Claude Code and my knowledge about how software should be designed and how things should work, I am able to develop big systems again.
I'm not 20% more productive than I was. I'm not 10x more productive than I was either. I'm infinity times more productive because I wouldn't be doing it at all otherwise, realistically: I'd either hire someone to do it, or not do it, if it wasn't important enough to go through the trouble to hire someone.
Sure, if you are a great developer and spend all day coding and love it, these tools may just be a hindrance. But if you otherwise wouldn't do it at all they are the opposite of that.
The problem with current GenAI is the same as outsourcing to the lowest bidder in India or wherever. For any non-trivial project you'll get something that may appear to work, but for anything production-ready you'll most likely spend lots of time testing, verifying, cleaning up the code, and making changes to things the AI didn't catch. Then there's requirement gathering, discussing with stakeholders, gathering more feedback, and so on, plus debugging when things fail in production...

I believe it's a productivity boost, but only to a small part of my job. The boost would be larger if I only had to build proofs of concept or hobby projects that don't need to be reliable in prod and don't require feedback and requirements from many other people.
This reminds me of something... I'm a jazz musician when not being a coder, and have studied and taught from/to a lot of players. One thing advanced improvisors notice is that the student is very frequently not a good judge – in the moment – of what is making them better. Doing long term analytics tests (as the author did) works, but knowing how well something is working while you're doing it? not so much. Very, very frequently that which feels productive isn't, and that which feels painful and slow is.
Just spit balling here, but it sure feels similar.
There is actually a lot of AI shovelware on Steam. Sort by newest releases and you'll see stuff like a developer releasing 10 puzzle games in one day.
I have the same experience as OP, I use AI every day including coding agents, I like it, it's useful. But it's not transformative to my core work.
I think this comes down to the type of work you're doing. I think the issue is that most software engineering isn't in fields amenable to shovelware.
Most of us work either in areas where the coding is intensely brownfield (AI is great, but not doubling anyone's productivity) or in areas where the productivity bottlenecks are nowhere near the code.
Shovelware may not be a good way to track additional productivity.
That said, I'm skeptical that AI is as helpful for commercial software. It's been great at automating my workflow because I suck at shell scripting and AI is great at it. But for most of the code I write, I honestly halfway don't know what I'm going to write until I write it. The prompt itself is where my thinking goes, so the time savings would be fairly small, but I also think I'm fairly skilled (except at scripting).
I think the explanation is simple: there is a direct correlation between being too lazy and demotivated to write your own code, and being too lazy and demotivated to actually finish a project and publish your work online.
The same people who are willing to go through all the steps to release an application online are also willing to go through the extra effort of writing their own code. The code is actually the easy part compared to the rest of it... always has been.
As an analogy, can you imagine being a startup that hired a developer, and months later finding out the bulk of the new Web app they "coded" for you was actually copy-and-pasted open source code, loosely obfuscated, which they were passing off as something they developed and to which the company had IP rights?
You'd immediately convene the cofounders and a lawyer, about how to make this have never happened.
First you realize that you need to hand the lawyer the evidence (against the employee), and otherwise remove all traces of that code and activity from the company.
Simultaneously, you need to get real developers started rushing to rewrite everything without obvious IP taint.
Then one of you will delicately ask whether firing and legal action against the employee is sufficient, or whether the employee needs to sleep with the fishes to keep them quiet.
The lawyer will say this kind of situation isn't within the scope of their practice, but here's the number of a person they refer to only as 'the specialist'.
Soon, not only are you losing the startup, and the LLC is being pierced to go after your personal assets, but you're also personally going to prison. Because you were also too cheap to pay the professional fee for 'the specialist', and you asked ChatGPT to make the employee have a freak industrial shredder accident.
All this because you tried to cheap out, and spend $20 or $200 on getting some kind of code to appear in your repo, while pretending you didn't know where it came from.
I agree quite strongly with this article. I've used AI for some things, but when it comes to productivity I don't use it in big codebases I contribute to or code which I want to put into production. I've mainly only used it to build little concept demos/prototypes, and even then I build on top of a framework I wrote by hand like last year or so. And I only use AI to get familiar enough with the general patterns of a library I'm not familiar with (mainly because I'd like to avoid diving into tests to learn how the library works). But even then, I always have the docs open, and the API docs, and I very carefully review and thoroughly test on my own system and against what I'm really trying to do before I even consider it something I'd give to others. Even so, I wouldn't say I've gotten a productivity increase, because (1) I don't measure or really care about productivity with these kinds of things, and (2) I'm the one who already knows what I want to accomplish, and I just need a bit of help working towards that goal.
I generally agree with the sentiment of the article, but the OP should also be looking at product launch websites like ProductHunt, where there are tens to hundreds of vibe coded SaaS apps listed daily.
From my experience, it's much easier to get an LLM to generate code for a React/Tailwind CSS web app than a mobile app, and that's why we're seeing so many of these apps showing up in the SaaS space.
LLM-powered shovelware sits in the same box as coke-induced business ideas. Both give you the dopamine rush of being “on top of it” until the magic wears off and you’re scrubbing your apartment floor with a toothbrush at 4 AM, or stuck debugging a DB migration that Claude Code has been mangling for five hours straight.
Changing domain to writing and images and video you can see LinkedIn is awash with everyone generating everything by LLMs. The posting cadence has quickened too as people shout louder to raise their AI assisted voice over other people’s.
We’ve all seen and heard the AI images and video tsunami
So why not software (yet but soon)??
Firstly, software has a function to perform, and AI tool creations often can't make that function work. Lovable/Bolt etc. are too flaky to live up to their text-to-app promises. A shedload of horror debugging or a lottery win of luck is required to fashion an app out of that. This will improve over time, but the big question is: by enough?

And secondly, like on LinkedIn above: perhaps the standards of the users will drop? LinkedIn readers now tolerate the LLM posts; it is not a mark of shame. Will the same reduction in standards among software users open the door to good-enough shovelware?
You're missing the forest for the trees. It speeds up people who don't know how to program 100%. We could see a flourishing of ideas and programs coming out of 'regular' people. The kind of people that approach programmers with the 'I have an idea' and get ignored. Maybe the programs will be basic, but they'll be a template for something better, which then a programmer might say 'I see the value in that idea' and help develop it.
It'll increase incremental developments manyfold. A non-programmer spending a few hours on AI to make their workflow better and easier and faster. This is what everyone here keeps missing. It's not the programmers that should be using AI; it's 'regular' people.
It's really interesting to bring graphs of 'new iOS releases per month' or 'total domain name registrations' into the argument - that's a good way of keeping the argument tied to the real world.
There's a relatively monotonous task in software engineering that pretty much everyone working on a legacy C/C++ code base has had to face: fixing static analysis and compiler warnings. That seems about as boring and routine an operation as exists. As simple as can be. I've seen this task farmed out to interns paid barely anything just to get it done.

My question to HN is... can LLMs do this? Can they convert all the unsafe C-string invocations to safe ones? Can they replace system calls with POSIX calls? Can they wrap everything in a smart pointer and make sure that mutex locks are added where needed?
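For concreteness, here's the kind of mechanical rewrite I have in mind; hypothetical before/after snippets, not taken from any particular code base:

    #include <cstdio>
    #include <cstring>
    #include <memory>
    #include <mutex>
    #include <string>

    struct Widget { int value = 0; };

    std::mutex g_counter_mutex;
    int g_counter = 0;

    // Legacy style: the patterns static analysis keeps flagging.
    void legacy(const char* user_input) {
      char buf[64];
      strcpy(buf, user_input);       // unbounded copy, possible overflow
      Widget* w = new Widget();      // raw owning pointer, easy to leak
      g_counter++;                   // shared counter, no lock
      std::printf("%s %d\n", buf, w->value);
      delete w;
    }

    // The boring, routine rewrite: bounded copy, RAII ownership, guarded counter.
    void modern(const std::string& user_input) {
      char buf[64];
      std::snprintf(buf, sizeof(buf), "%s", user_input.c_str());
      auto w = std::make_unique<Widget>();
      {
        std::lock_guard<std::mutex> lock(g_counter_mutex);
        g_counter++;
      }
      std::printf("%s %d\n", buf, w->value);
    }

Each individual change is trivial; the open question is whether an LLM can apply thousands of them across a real code base without quietly changing behavior.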
While I agree with the points he's raising, let me play devil's advocate.
There’s a lot more code being written now that’s not counted in these statistics. A friend of mine vibe coded a writing tool for himself entirely using Gemini canvas.
I regularly vibe code little analyses or scripts in ChatGPT which would have required writing code earlier.
None of these are counted in these statistics.
And yes, AI isn't quite good enough to supercharge app creation end to end. Claude has only been good for a few months. That's hardly enough time for adoption!
This would be like analysing the impact of languages like Perl or Python on software 3 months after their release.
Good article, gave me some points I hadn't considered before. I know there are some AI generated games out there, but maybe the same people were using asset flips before?
I'd also be curious how the numbers look for AI generated videos/images, because social media and youtube seem absolutely flooded with the stuff. Maybe it's because the output doesn't have to "function" like code does?
Grammatical nit: The phrase is "neck and neck", like where two race horses are very close in progress
Turns out AI can’t help script kiddies write production ready applications. Also turns out that AI is good for some things and not others, and a coin toss isn’t a good method to decide which tasks to do using AI. I read that JavaScript is by far the most popular language: still not using it for the mission critical software I write. So it doesn’t bother me that 90% of HN is “AI sucks!” stories. I find it extremely effective when used appropriately. YMMV.
I haven't found ChatGPT helpful in speeding up my coding because I don't want to give up understanding the code. If I let ChatGPT do it, then there are inevitable mistakes, and it sometimes hallucinates libraries, etc. I have found it very useful in guiding me through the dev-ops of working with and configuring AWS instances for a blog server, for a git server, etc. As a small business owner, that has been a big time saver.
I get excellent productivity gains from AI. Not everywhere, and not linearly. It makes the bad stuff about the work (boilerplate, dealing with things outside my specialties) tolerable and the good stuff a bit better. It makes me want to create more. Business guys missing some visualization? Hell why not, few minutes on Aider and it's there. Let's improve our test suites. And let's migrate away from that legacy framework or runtime!
But my workflow is anything but "let her rip". It's very calculated, orderly, just like mastering any other tool. I'm always in the loop. I can't imagine someone without serious experience getting good stuff, and when things go bad, oh boy you're bringing a firehose of crap into your org.
I have a junior programmer who's a bright kid but lacking a lot of depth. Got him a Cursor subscription, tracking his code closely via PRs and calling out the BS but we're getting serious work done.
I just can't see how this new situation calls for fewer programmers. It will just bring about more software, and surely more capable software, after everyone adjusts.
How widely is AI adopted in the wider IT industry anyway? I imagine a $200-per-month subscription isn't that popular with people who refuse to pay for their IDEs and go with free alternatives instead. And a month's worth of an AI agent's free tier can be burned through in two intense evenings.
So who pays for AIs for developers? Mostly corpos. And the speed of individual developer was never a limiting factor in corpos. Average corporate development was always 10 times slower than indie. So even doubling it won't make any impression.
I don't know if I'm faster with AI at a specific task, but I know that I'm doing things I wouldn't touch because I hate the tedium. And I'm doing them while cooking and eating dinner and thinking about wider context and next things to come. So for me it feels worth it.
I think it might be something like with cars and safety. Any car safety improvements are going to be offset by the drivers driving faster and more recklessly. So maybe any speed improvements that AI might make for the project is nullified by developers doing things they would just skip without it.
This is the part that really grinds my gears. Careless People.
> The impact on human lives is incredible. People are being fired because they’re not adopting these tools fast enough. People are sitting in jobs they don’t like because they’re afraid if they go somewhere else it’ll be worse. People are spending all this time trying to get good at prompting and feeling bad because they’re failing.
The amount of shovelware is not a reliable signal. You know what's almost empty for the first time in almost a decade? My backlog. Where AI tools shine is taking an existing codebase and instructions, and going to town. It's not dreaming up whole games from scratch. All the engineers out there didn't quit their jobs to build new stuff, they picked up new tools to do their existing jobs better (or at least, to hate their jobs less).
The shovelware was always there. And it always will be. But that doesn't mean it's spurting out faster, because that's not what AI does. Hell, if anything I expect that there's less visible shovelware, because when it does get created, it's less obvious (and perhaps higher quality).
At some point, the quality of uninspired projects will be lifted up by the baseline of quality that mainstream AI allows. At what point is that "high enough that we can't tell what's garbage"? We've perhaps found ourselves at or around that point.
The data is surprising. However, I do wish this article had looked carefully into barriers to entry, as they can explain the lack of increases in the data.

For example, on Steam it costs $100 to release a game. You may extend your game with what's called DLC, which costs $0 to release. If I were to build shovelware, especially with AI-generated content, I'd be more keen to make a single game with a bunch of DLC.

For game development, integration of AI into engines is another barrier. There aren't that many engines that give AI an interface to work with. The obvious interface is games that can be built entirely with code (e.g., pygame; even Godot is a big stretch).
I haven't really found a major productivity boost using LLMs for _production_ software. Writing up the prompt and iterating can take as much time as just doing it. The auto-complete is better _IF_ it gets the syntax correct (depends a lot on how well it knows or can infer the framework).
Where I have found them very useful are for one-off scripts and stuff I need done quick and dirty, that isn't too complex and easily verifiable (so I can catch the mistakes it makes, and it does make them!), and especially in languages I don't know that well or don't like (i.e., bash, powershell, javascript)
While I agree generally with the premise that the silver bullet AI coding has been marketed as has underdelivered (even if it doesn't feel that way), I gotta point out that the experiment and its results don't do a good job of capturing that. One of the biggest parts of using these AI tools is knowing which tasks they're most suitable for (and sometimes it's using them on only certain subtasks of a task). As mentioned, some tasks they absolutely excel at. Flipping a coin to decide whether to use them is crude and unrealistic. Hard to come up with a reliable method though; I also think METR has its glaring issues.
Same thing as all the "no-code" or "low-code" frameworks we see coming up from time to time.
No need to learn a programming language, wow, anyone can be a programmer now. A few projects come out of it, people marvel at how efficient it was, and it fizzles out and programmers continue writing code.
If anything, things like visual programming did more than AI does now. For games, if you want to see the shovelware, look at Flash, RPG maker, etc... not AI. On the business side of things, Excel is king. Can you get your vibe coded app out faster than by using Flash or Excel?
No one wants it? If there is no demand, then no one is going to become a supplier. You don’t even want the apps you’re dreaming of building, you wouldn’t use them. If you would use them, you would already be using apps that are available. It’s why developers claim huge benefits but the output is the same, there isn’t much demand for your average software company to push more output, the bottleneck is customer demand. If anything customer demand is falling because of AI. There is no platform that is blowing up for people to shovel shit to. Everything is saturated, there is no room for shovelware.
> We all know that the industry has taken a step back in terms of code quality by at least a decade. Hardly anyone tests anymore.
I see pseudo-scientific claims from both sides of this debate but this is a bit too far for me personally. "We all know" sounds like Eternal September [1] kind of reasoning. I've been in the industry about as long as the article author and I think he might be looking with rose-tinted glasses on the past. Every aging generation looks down at the new cohort as if they didn't go through the same growing pains.
But in defense of this polemic, and laying out my cards as an AI maximalist and massive proponent of AI coding, I've been wondering the same. I see articles all the time about people writing this and that software using these new tools and it so often is the case they never actually share what they built. I mean, I can understand if someone is heads-down cranking out amazing software using 10 Claude Code instances and raking in that cash. But not even to see one open source project that embraces this and demonstrates it is a bit suspicious.
I mean, where is: "I rewrote Redis from scratch using Claude Code and here is the repo"?
For me personally, it has been a good productivity tool. Mostly if I'm doing a side project, I can get up to speed with pretty much any language/framework and have it running in FAR less time than if I had to go through docs and set up my dev environment for said project.

There's really a lot to get from this "tool". Because in the end it's a tool, and knowing how to use it is the most important aspect of it. It takes time, iteration, and practice to understand how to use it effectively.
For me AI is a bell curve, and I'd expect the same for a lot of people. What needs to be defined is the measure by which to grade output. It should not be "lines of code" but "lines of good quality, maintainable, scalable, upgradable code".
When you consider this, "generate me a whole repo" is trivially garbage and not meeting the measurement metric. However having AI autocomplete "getUser(..." clearly IS productive.
Now is that a 0.1% increase, 1%, or 10%? That I can't tell you.
My hunch is that the amount of shovelware (or really, any software) is mostly proportional to the number of engineers wishing to work on that.
Even if AI made them more productive, it's on a person to decide what to build and how to ship, so the number (and desire) of humans is a bottleneck. Maybe at some point AI will start buying up domains and spinning up hundreds of random indiehacker micro-SaaS, but we're not there. Yet.
This argument is predicated on what might become an outdated idea of software as an asset. If I can quickly generate software from natural language to solve a very specific problem, that software isn't worth maintaining, let alone publishing or selling. Its value to people who aren't me is low, and its defensibility against being copied by someone else with an adequate coding agent is even lower.
Faster? At making what? Pipe from /dev/urandom and you'll get a lot of stuff fast.
What if someone came out with a study saying they had a tool to make the fastest doctors and lawyers? You'd say that doesn't even make sense, what kinds of doctors doing what kinds of work?
AI coding isn't some across the board, helps everyone do anything kind of tool.
That's all based on the assumption that if you can build something in 10% of the time it'd take you to build the same thing without AI, then you'll spend this 90% of your new spare time to build something else next. What if you don't and you'll just use that time to spend with your family? The data won't show it.
Until AI can understand business requirements and how they are implemented in code (including integrating with existing systems), it will continue to be overhyped. Devs will hate it, but in 10-15 years someone will figure out that the proper paradigm is to train the AI to build based off of something similar to Cucumber TDD with comprehensive example tables.
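As a sketch of what I mean by example tables (expressed here as a plain table-driven test rather than literal Gherkin, with a hypothetical compute_tax and invented numbers): the requirements live in rows of concrete cases, and the AI's job would be to produce code that satisfies every row.

    #include <gtest/gtest.h>
    #include <string>

    // Hypothetical function under test; in this paradigm, this is what
    // the AI would be asked to generate from the table below.
    double compute_tax(double subtotal, const std::string& region) {
      if (region == "DE") return subtotal * 0.19;
      if (region == "US-CA") return subtotal * 0.0725;
      return 0.0;  // tax-exempt fallback
    }

    struct TaxCase {
      double subtotal;
      std::string region;
      double expected_tax;
    };

    // The "example table": business rules expressed as concrete cases.
    const TaxCase kCases[] = {
      {100.0, "DE",     19.00},
      {100.0, "US-CA",   7.25},
      { 50.0, "EXEMPT",  0.00},
    };

    TEST(TaxTest, MatchesExampleTable) {
      for (const auto& c : kCases) {
        EXPECT_NEAR(compute_tax(c.subtotal, c.region), c.expected_tax, 1e-9)
            << "subtotal=" << c.subtotal << " region=" << c.region;
      }
    }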
All these bearish claims about AI coding would hold weight if models were stuck permanently at the capabilities level they are now with no chance at improvement. This is very likely not the case given improvements over the past year, and even with diminishing returns models will be significantly more capable both independently and as a copilot in a year.
It's the cost. Full time serious agentic coding costs upwards of $100/day in Claude tokens (and Claude tokens are the only tokens worth even talking about). When this drops by 10x for a model at the level of quality and speed of Sonnet 4, it will change everything.
Excellent article. It'll be really interesting to look back on this in 5 years and ask the author to regenerate these charts again to see if there is any impact.
I've already experienced being handed a vibe-coded app, and so far it's been a communication and code-cleanliness problem, e.g. don't leave two versions of an app around without saying which one is active. And the docs, man, so many docs, redundant and conflicting.
From the post, if AI was supposed to make everyone 25% more productive, then a 4 month project becomes a 3 month project. It doesn't become a 1 day project.
Was the author making games and other apps in 30 hours? Because that seems like a 4 month project?
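(Rough arithmetic: at 25% higher productivity the same work takes 1/1.25 = 80% of the time, so a 4-month project lands around 3.2 months. That is nowhere near compressing into 30 hours.)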
* METR was at best a flawed study. Repo-familiarity and tool-unfamiliarity are the biggest points of critique, but far from the only ones.
* they assume that all code gets shipped as a product. Meanwhile, AI code has (at least in my field of view) led to a proliferation of useful-but-never-shipped one-off tools. Random dashboards to visualize complex queries, scripts to drive refactors, or just sheer joy like "I want to generate an SVG of my vacation trip and consume 15 data sources and give it a certain look".
* Their own self-experiment is not exactly statistically sound :)
That does leave the fact that we aren't seeing AI shovelware. I'm still convinced that's because commercially viable software is beyond the AI complexity horizon, not because AI isn't an extremely useful tool
It's much much worse in the Cybersecurity field. I wanted to share the anecdote here, too, because it's kind of fitting.
Somehow, in cyber, everyone believes that transformers will generate better answers than not to use the 10 most common passwords. It's like the whole knowledge about decision making theory, neural nets, GANs, LSTMs etc completely got wiped out and forgotten within less than 10 years.
I understand the awesomeness of LLMs for debugging and forensics (they are a really good rubber duck!), but apart from that they're pretty much useless, because after two prompts they will keep forgetting if/elseif/else conditions, and checking those boundaries becomes the mission of the unlucky person who has to merge that slopcode later.
I don't understand how we got from TDD and test case based engineering to this bullshit. It's like everyone in power was the wrong person to be in that position in the first place, and that statistically no lead engineers ever will be a C-staff or SVP or whatever corporate manager level.
While the AI bubble is bursting, I will continue to develop with TDD practices to test my code. Which, in return, has the benefit of being able to use LLMs to create nice templates as a reasonable starting point.
While I like the self-reflection in this article, I don't think his methodology adds up (pun intended). First, there are two main axes along which LLMs can make you more productive: speed and code quality. I think everyone is obsessed with the first one, but it's the less relevant of the two.

My personal hypothesis is that when using LLMs, you are only faster if you would otherwise be writing boilerplate code. For the rest, LLMs don't really make you faster, but they can make your code quality higher, which means better implementation and catching bugs earlier. I am a big fan of giving the diff of a commit to an LLM that has a file MCP, so it can search for files in the repo, and having it point out any mistakes I have made.
I too have been wondering whether the time I spend wrangling AI into getting it to do what I want, is greater than the time I'd spend if I just did it myself
Big Meh. Bad metric.
Phone apps were dead long before Ai came about.
Shovelware double so.
Most users have 40-80 apps installed and use 9 a day, 20 a month(1).
The shitty iOS subscription trend killed off the hobby of 'app collecting'.
Have I created large commercial Ai-coded projects? No.
Did I create 80+ useful tools in hours/days that I wouldn't have otherwise?
Hellz yeah!
Would I publish any of these on public github? Nope!
I don't have the time nor the inclination to maintain them.
There's just too many.
My shovelware "Apps" reside on my machine/our intranet or V0/lovable/bolt.
Roughly ~25% are in active daily use on my machine or in our company.
All tools and "apps" are saving us many hours each week.
I'm also rediscovering the joy of coding something useful, without writing a PRD for some intern.
Speaking of which. We no longer have an intern.
Hmm, I definitely have more issues with AI generated code that I wouldn’t have if I did it all manually, but the lack of typing may make up for the lost time itself.
Maybe developers are using it in a less visible way? In the past 6 months I've used AI for a lot of different things. Some highlights:
- Built a windows desktop app that scans local folders for videos and automatically transcribes the audio, summarises the content into a structured JSON format based on screenshots and subtitles, and automatically categorises each video. I used it on my PC to scan a couple of TB of videos. Has a relatively nice interface for browsing videos and searching and stores everything locally in SQLite. Did this in C# & Avalonia - which I've never used before. AI wrote about 75% of the code (about 28k LOC now).
- Built a custom throw-away migration tool to export a customers data from one CRM to import into another. Windows app with basic interface.
- Developed an AI process for updating a webform system that uses XML to update the form structure. This one felt like magic and I initially didn't think it would work, but it only took a minute to try.

Some background - years ago I built a custom webform/checklist app for a customer. They update the forms very rarely so we never built an interface for making updates, but we did write 2 stored procs to update forms - one outputs the current form as XML and another takes the same XML and runs updates across multiple tables to create a new version of the form. For changes, the customer sends me a spreadsheet with all the current form questions in one column and their changes in another. It's normally just wording changes so I go through and manually update the XML and import it, but this time they had a lot of changes - removing questions, adding new ones, combining others. They had a column with the label changes and another with a description of what they wanted (i.e. "New Question", "Update label", "Combine this with q1, q2 and q3", "remove this question"). The form has about 100 questions and the XML file is about 2500 lines long and defines each form field, section layout, conditional logic, grid display, task creation based on incorrect answers etc, so it's time consuming to make a lot of little changes like this.

With no expectation of it working, I took a screenshot of the spreadsheet and the exported XML file and prompted the LLM to modify the XML based on the instructions in the spreadsheet and some basic guidelines. It did it close to perfect, even fixing the spelling mistakes the customer had missed while writing their new questions.
- Along with using it on a daily basis across multiple projects.
I've seen the stat that says developers "...thought AI was making them 20% faster, but it was actually making them 19% slower". Maybe I'm hoodwinking myself somehow, but it's been transformative for me in multiple ways.
> Github Copilot themselves say that initially, users only accept 29% of prompted coding suggestions (which itself is a wild claim to inefficiency, why would you publicize that?), but with six months of experience, users naturally get better at prompting and that grows to a whopping 34% acceptance rate. Apparently, 6 months of experience only makes you 5% better at prompting.
Or, alternatively, exposure to our robot overlords makes you less discerning, less concerned with, ah, whether the thing is correct or not.
(This _definitely_ seems to be a thing with LLM text generation, with many people seemingly not even reading the output before they post it, and I assume it's at least somewhat a thing for software as well.)
What the author is missing is the metric that matters more than shipping product: how much happier am I when my AI auto complete saves me typing and figures out what I'm trying to articulate for me. If devs using copilot are happier--and I am, at least--then that's value right there.
For experienced engineers, I'm seeing (internally in our company at least) a huge amount of caution and hesitancy to go all-in with AI. No one wants to end up maintaining huge codebases of slop code. I think that will shift over time. There are use cases where having quick low-quality code is fine. We need a new intuition about when to insist on handcrafted code, and when to just vibecode.
For non-experienced engineers, they currently hit a lot of complexity limits with getting a finished product to actually work, unless they're building something extremely simple. That will also shift - the range of what you can vibecode is increasing every year. Last year there was basically nothing that you could vibecode successfully, this year you can vibecode TODO apps and stuff like that. I definitely think that the App Store will be flooded in the coming future. It's just early.
Personally I have a side project where I'm using Claude & Codex and I definitely feel a measurable difference, it's about a 3x to 5x productivity boost IMO.
The summary: just because we don't see it yet doesn't mean it's not coming.
Places where I got the most out of coding agents are:
- breaking through the analysis paralysis by creating the skeleton of a feature that I then rework (UI work is a good example)
- aggressive dev tooling for productivity on early stage projects, where the CI/CD pipeline is lacking and/or tools are clumsy.
(Related XKCD: https://xkcd.com/1205/)
Otherwise, I find most of my time goes to understanding the client requirements and making sure they don't want conflicting features – both of which are difficult to speed up with AI. Coding is actually the easy part, and even if it were sped up 100x, a consistent end-to-end improvement of 2x would be a big win (see Amdahl's law).
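To put rough, made-up numbers on that: Amdahl's law says the overall speedup is 1 / ((1 - p) + p/s), where p is the fraction of the effort that is coding and s is how much faster the coding gets. If coding is half the end-to-end effort (p = 0.5) and it's sped up 100x (s = 100), the overall speedup is 1 / (0.5 + 0.005) ≈ 1.98x, i.e. about 2x no matter how magical the code generation is.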
> These claims wouldn't matter if the topic weren't so deadly serious. Tech leaders everywhere are buying into the FOMO, convinced their competitors are getting massive gains they're missing out on. This drives them to rebrand as AI-First companies, justify layoffs with newfound productivity narratives, and lowball developer salaries under the assumption that AI has fundamentally changed the value equation.
How is this "deadly" serious? It's about software developers losing well-paid, comfortable jobs. It's even less serious if AI doesn't actually improve productivity, because they'll find new jobs in the future.
Pretty much the only future where AI will turn out "deadly serious" is if it shows human-level performance for most if not all desk jobs.
AI is the biggest threat to BS middle-management/BA jobs in recent history. Those people panic and try to become "AI ambassadors/advocates" to survive. They are pushing the narrative so that they have answers and "metrics" of adoption to show to upper management and survive another round of layoffs. You can see how it works at YouTube, where someone decided for no good reason to "upscale" Shorts, making them all look bad. This is done automatically, without asking for the creators' permission, and you cannot turn it off. The results are crap, but whoever made that decision can brag about widespread adoption of AI and survive another annual review.
>” Now, I’ve spent a lot of money and weeks putting the data for this article together, processing tens of terabytes of data in some cases. So I hope you appreciate how utterly uninspiring and flat these charts are across every major sector of software development.”
Honestly this reminds me of some of the promises that were made when the American animation industry switched to 3D because it was "cheaper".

A modern Disney 3D animated film consistently costs over 100-200 million dollars, while movies like Klaus were made for about 40 million. Japan still animates on PAPER.
At the end of the day, new tools have their use cases, but I think that especially in creative domains (which software definitely is), old techniques aren't invalidated by the creation of new ones.

ZBrush still crushes all other sculpting apps with some very well written low-level code and assembly. It doesn't even use the GPU, for crying out loud. If you proposed that as your solution for a graphically intensive 3D app you'd be laughed at, but software-based rasterization/simple ray tracing takes the cake here. It could handle 20 million polygons at buttery smooth framerates in 2007, and isn't burdened by the VRAM drought we're in.
Don't let people tell you new tools make the old useless.
This is a great question and the data points make a solid case.
I've been a "10xer" for 25 years. I've considered coding agents bullshit since my first encounter with Copilot. I work by having a clear mental map of every piece of my code and knowing exactly how everything works, to the smallest detail, and how it interacts with every other part.
Anyway, all that and a nickel. Yesterday I fired up Claude Code for the first time. I didn't ask it to build me a game or anything at a high level. Nor to evaluate an existing code base. No... I spent about 2 hours guiding it to create a front-end SPA framework that does what my own in-house SPA framework does on the front end, just to see how it would perform at that. I approved every request manually and interrupted every time I spotted a potential issue (which were many). I guided it on what functions to write and how they should affect the overall navigation flow, rendering flow, loading and error-handling.
In other words, I knew what I wanted to write to a T, because it's code I wrote in 2006 and have refactored and ported many times since then... about 370 commits worth to this basic artifact, give or take.
And it pretty much got there.
Would I have been able to prompt it to write a system like that if I hadn't written the system myself over and over again? Probably not. But it did discern the logical setup I was going for (which is not at all similar to what you're thinking if you're coming from React or another framework like that), and it wrote code that is almost identical in its control structures to what I wrote, without me having to do much besides tell it in plain English what should control what, how, when and in what order.
I'm still not convinced it would save me time on something totally new, that I didn't already know the final shape of.
But I do suspect that a reason all this "vibe coding" hasn't led to an explosion of shovelware is that "vibe coding" isn't being done by experienced coders. I suspect that if you're letting changes "flash across the screen" without reading them, that's most of the difference between a failed prompt and one that achieves the desired result.
Like, I saw it do things like create a factory class that took a string name and found the proper component to load. I said, "refactor that whole thing to a component definition interface with the name in it, make a static object of those and use that to determine what screen to load and all of its parameters." And it did, and it looked almost the same as what I wrote back in the day.
Idk. I would not want my job to become prompting an LLM. I like cracking my knuckles and writing code. But I think the mileage you get may be dictated by whether you are trying to use it as a general-purpose "make this for me" engine, for shovelware, in which case it will fail hard, versus whether you are using it as a stenographer translating a sentence of instructions into a block of control flow.
Where's the shovelware? Why AI coding claims don't add up
(mikelovesrobots.substack.com)759 points by dbalatero 3 September 2025 | 482 comments
Comments
This is my biggest problem right now. The types of problems I'm trying to solve at work require careful planning and execution, and AI has not been helpful for it in the slightest. My manager told me that the time to deliver my latest project was cut to 20% of the original estimate because we are "an AI-first company". The mass hysteria among SVPs and PMs is absolutely insane right now, I've never seen anything like it.
1. LLMs do not increase general developer productivity by 10x across the board for general purpose tasks selected at random.
2. LLMs dramatically increases productivity for a limited subset of tasks
3. LLMs can be automated to do busy work and although they may take longer in terms of clock time than a human, the work is effectively done in the background.
LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup. If I need to write a small bit of glue code in a language I do not know, LLMs not only save me time, but they make it so I don't have to learn something that I'll likely never use again.
Fixing up existing large code bases? Productivity is at best a wash.
Setting up a scaffolding for a new website? LLMs are amazing at it.
Writing mocks for classes? LLMs know the details of using mock libraries really well and can get it done far faster than I can, especially since writing complex mocks is something I do a couple times a year and completely forget how to do in-between the rare times I am doing it.
Navigating a new code base? LLMs are ~70% great at this. If you've ever opened up an over-engineered WTF project, just finding where HTTP routes are defined at can be a problem. "Yo, Claude, where are the route endpoints in this project defined at? Where do the dependency injected functions for auth live?"
Right tool, right job. Stop using a hammer on nails.
I think the "why" for this is that the stakes are high. The economy is trembling. Tech jobs are evaporating. There's a high anxiety around AI being a savior, and so, a demi-religion is forming among the crowd that needs AI to be able to replace developers/competency.
That said: I personally have gotten impressive results with AI, but you still need to know what you're doing. Most people don't (beyond the beginner -> intermediate range), and so, it's no surprise that they're flooding social media with exaggerated claims.
If you didn't have a superpower before AI (writing code), then having that superpower as a perceived equalizer is something that you will deploy all resources (material, psychological, etc) to ensuring that everyone else maintain the position that 1) superpower good, 2) superpower cannot go away 3) the superpower being fallible should be ignored.
Like any other hype cycle, these people will flush out, the midpoint will be discovered, and we'll patiently await the next excuse to incinerate billions of dollars.
On the other hand, I’ve lately seen it misused by less experienced engineers trying to implement bigger features who eagerly accept all it churns out as “good” without realizing the code it produced:
- doesn’t follow our existing style guide and patterns.
- implements some logic from scratch where there certainly is more than one suitable library, making this code we now own.
- is some behemoth of a PR trying to do all the things.
I think that there will be neurological fatigue occurring whereby if software engineers are not actively practicing problem-solving, discernment, and translation into computer code - those skills will atrophy...
Yee, AI is not the 2x or 10x technology of the future ™ is was promised to be. It may the case that any productivity boost is happening within existing private code bases. Even still, there should be a modest uptick in noticeably improved offer deployment in the market, which does not appear to be there.
In my consulting practice I am seeing this phenomenon regularly, wereby new founders or stir crazy CTOs push the use of AI and ultimately find that they're spending more time wrangling a spastic code base than they are building shared understanding and working together.
I have recently taken on advisory roles and retainers just to reinstill engineering best practices..
Perhaps these graphs show that management is indeed so finely tuned that they've managed to apply the AI revolution to keep productivity exactly flat while reducing expenses.
An alternative theory is that writing code was never the bottleneck of releasing software. The exploration of what it is you're building and getting it on a platform takes time and effort.
On the other hand, yeah, it's really easy to 'hold it wrong' with AI tools. Sometimes I have a great day and think I've figured it out. And then the next day, I realize that I'm still holding it wrong in some other way.
It is philosophically interesting that it is so hard to understand what makes building software products hard. And how to make it more productive. I can build software for 20 years and still feel like I don't really know.
Just today I built a shovelware CLI that exports iMessage archives into a standalone website. It would have taken me weeks by hand. I'll probably have it out as a homebrew formula in a day or two.
I'm working on an iOS app as well that's MUCH further along than it would be if I hand-rolled it, but I'm intentionally taking my time with it.
Anyway, the post's data mostly ends in March/April, which is when generative AI started being useful for coding at all (and I've had Copilot enabled since Nov 2022).
"So, here’s labor productivity growth over the 25 years following each date on the horizontal axis [...] See the great productivity boom that followed the rise of the internet? Neither do I. [...] Maybe the key point is that nobody is arguing that the internet has been useless; surely, it has contributed to economic growth. The argument instead is that its benefits weren’t exceptionally large compared with those of earlier, less glamorous technologies."¹
"On the second, history suggests that large economic effects from A.I. will take longer to materialize than many people currently seem to expect [...] And even while it lasted, productivity growth during the I.T. boom was no higher than it was during the generation-long boom after World War II, which was notable in the fact that it didn’t seem to be driven by any radically new technology [...] That’s not to say that artificial intelligence won’t have huge economic impacts. But history suggests that they won’t come quickly. ChatGPT and whatever follows are probably an economic story for the 2030s, not for the next few years."²
¹ https://www.nytimes.com/2023/04/04/opinion/internet-economy....
² https://www.nytimes.com/2023/03/31/opinion/ai-chatgpt-jobs-e...
Background: I'm building a Python package as a side project that lets you encode/decode messages into LLM output.
Receipts: the tool I'm using creates a markdown that displays every prompt typed, and every solution generated, along with summaries of the code diffs. You can check it out here: https://github.com/sutt/innocuous/blob/master/docs/dev-summa...
Specific example: I actually used a leetcode-style algorithmic implementation of memoization for branching. This would have taken a couple of days to implement by hand, but it took about 20 minutes to write the spec and 20 minutes to review and merge the generated solution. If you're curious, you can see the generated diff here: https://github.com/sutt/innocuous/commit/cdabc98
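For flavor, here's a minimal sketch of memoizing a branching recursion, purely illustrative and not the repo's actual code:

```python
# Illustrative only -- not the implementation from the repo above. Memoization
# turns an exponential branching recursion into a linear one by caching subproblems.
from functools import lru_cache

@lru_cache(maxsize=None)
def count_encodings(bits_left: int) -> int:
    """Ways to consume a payload when each branch point can carry 1 or 2 bits."""
    if bits_left < 0:
        return 0
    if bits_left == 0:
        return 1
    # Two branches per step; without the cache this recursion is O(2^n).
    return count_encodings(bits_left - 1) + count_encodings(bits_left - 2)

print(count_encodings(64))  # answers instantly thanks to the cache
```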
On the other hand, I do understand that the things LLMs are really great at are not actually all that spectacular to monetize ... and so as a result we have all these snake oil salesmen on every corner boasting about nonsensical vibecoding achievements, because that's where the real money would be ... if it were really true ... but it is not.
The reason it doesn't show up online is that I mostly write software for myself and for work, with the primary goal of making things better, not faster. More tooling, better infra, better logging, more prototyping, more experimentation, more exploration.
Here's my opensource work: https://github.com/orgs/go-go-golems/repositories . These are not just one-offs (although there's plenty of those in the vibes/ and go-go-labs/ repositories), but long-lived codebases / frameworks that are building upon each other and have gone through many many iterations.
Where I expect a lot of those feeling-fast metrics come from is people who may have less coding experience and, with AI, are coding way above their level.
My brother-in-law asks for a nice product website; I just feed his business plan into an LLM, do some fine-tuning on the results, and have a good-looking website in an hour. If I did it myself manually, just take me behind a barn, as those jobs are so boring and take ages. But I know that website design is a weakness of mine.
That is the power of LLMs. They turn out quick code and maybe offer a suggestion you did not think about, but ... they also eat time! Crafting your prompts so that the LLM understands, waiting for the result, ... waiting ... OK, now check the result, can you use it? Oh no, it did X, Y, Z wrong. Prompt again ... and again. And this is where your productivity goes to die.
So when you compare a pool of developer feedback, you're going to get a broad mix of "it helps a lot", "some", "it's worse than my code", ... mixed in with the prompting, result delays, etc.
It gets even worse with agent / vibe coding, as you just tend to be waiting 5, 10 minutes for changes to be done. You need to review them, test them, ... oh no, the LLM screwed something up again. Oh no, it removed 50% of my code. Hey, where did my comments go? And we are back to a loss of time.
LLMs are a tool... But after a lot of working with them, my opinion is to use them when needed, but do not depend on them for everything. I sometimes just stare when people say they are coding so much with LLMs and spending 200 or more bucks per month.
They can be powerful tools, but I feel that some folks become over-dependent on them. And worst of all, I feel that our juniors are going to be in a world of hurt if their skills are more LLM monkey coding (or vibe coding) than actually understanding how to code (and the knowledge behind the actual programming languages and systems).
Let's take the following scenario for the sake of argument: a codebase with well-defined AGENTS.md, referencing good architecture, roadmap, and product documentation, and with good test coverage, much of which was written by an LLM and lightly reviewed and edited by a human. Let's say for the sake of argument that the human is not enjoying 10x productivity despite all this scaffolding.
Is it still worthwhile to use LLM tooling? You know what, I think a lot of companies would say yes. There are way too many companies whose codebases lack testing and documentation, that are too difficult to onboard new engineers into, and that carry too much risk if the original engineers are lost. The simple fact that LLMs, to be effective, force the adoption of proper testing and documentation is a huge win for corporate software.
When I tried to code again, I found I didn't really have the patience for it -- having to learn new frameworks, APIs, languages, tricky little details, I used to find it engrossing: it had become annoying.
But with tools like Claude Code and my knowledge about how software should be designed and how things should work, I am able to develop big systems again.
I'm not 20% more productive than I was. I'm not 10x more productive than I was either. I'm infinity times more productive because I wouldn't be doing it at all otherwise, realistically: I'd either hire someone to do it, or not do it, if it wasn't important enough to go through the trouble to hire someone.
Sure, if you are a great developer and spend all day coding and love it, these tools may just be a hindrance. But if you otherwise wouldn't do it at all they are the opposite of that.
I believe it's a productivity boost, but only for a small part of my job. The boost would be larger if I only had to build proof-of-concepts or hobby projects that don't need to be reliable in prod and don't require feedback and requirements from many other people.
Just spitballing here, but it sure feels similar.
I have the same experience as OP, I use AI every day including coding agents, I like it, it's useful. But it's not transformative to my core work.
I think this comes down to the type of work you're doing. I think the issue is that most software engineering isn't in fields amenable to shovelware.
Most of us work either in areas where the coding is intensely brownfield (AI is great there, but not doubling anyone's productivity), or in areas where the productivity bottlenecks are nowhere near the code.
That said, I’m skeptical that AI is as helpful for commercial software. It’s been great at automating my workflow because I suck at shell scripting and AI is great at it. But with most of the code I write, I honestly halfway don’t know what I’m going to write until I write it. The prompt itself is where my thinking goes - so the time savings would be fairly small, but I also think I’m fairly skilled (except at scripting).
The same people who are willing to go through all the steps to release an application online are also willing to go through the extra effort of writing their own code. The code is actually the easy part compared to the rest of it... always has been.
As an analogy, can you imagine being a startup that hired a developer, and months later finding out the bulk of the new Web app they "coded" for you was actually copy-and-pasted open source code, loosely obfuscated, which they were passing off as something they developed and to which the company had IP rights?
You'd immediately convene the cofounders and a lawyer, about how to make this have never happened.
First you realize that you need to hand the lawyer the evidence (against the employee), and otherwise remove all traces of that code and activity from the company.
Simultaneously, you need to get real developers started rushing to rewrite everything without obvious IP taint.
Then one of you will delicately ask whether firing and legal action against the employee is sufficient, or whether the employee needs to sleep with the fishes to keep them quiet.
The lawyer will say this kind of situation isn't within the scope of their practice, but here's the number of a person they refer to only as 'the specialist'.
Soon, not only are you losing the startup, and the LLC is being pierced to go after your personal assets, but you're also personally going to prison. Because you were also too cheap to pay the professional fee for 'the specialist', and you asked ChatGPT to make the employee have a freak industrial shredder accident.
All this because you tried to cheap out, and spend $20 or $200 on getting some kind of code to appear in your repo, while pretending you didn't know where it came from.
From my experience, it's much easier to get an LLM to generate code for a React/Tailwind CSS web app than a mobile app, and that's why we're seeing so many of these apps showing up in the SaaS space.
Well... no significant effects show except for a few projects. It was really hard torturing the data to come to my manager's desired conclusion.
It's sometimes helpful when writing an email but otherwise has not touched any of my productive work.
Changing domains to writing, images, and video, you can see LinkedIn is awash with everyone generating everything with LLMs. The posting cadence has quickened too, as people shout louder to raise their AI-assisted voice over other people’s.
We’ve all seen and heard the AI image and video tsunami.
So why not software (yet, but soon)?
Firstly, software often has to function, and AI tool creations cannot make that work. Lovable/Bolt etc. are too flaky to live up to their text-to-app promises. A shedload of horror debugging or a lottery win of luck is required to fashion an app out of that. This will improve over time, but the big question is: by enough?
And secondly, as with LinkedIn above: perhaps the standards of the users will drop? LinkedIn readers now tolerate the LLM posts; it is not a mark of shame. Will the same reduction in standards among software users open the door to good-enough shovelware?
It'll increase incremental developments manyfold. A non-programmer spending a few hours on AI to make their workflow better and easier and faster. This is what everyone here keeps missing. It's not the programmers that should be using AI; it's 'regular' people.
On my computer. Once I've built something I often realize the problems with the idea and abandon the project, so I'm never shipping it.
My question to HN is... can LLMs do this? Can they convert all the unsafe C-string invocations to safe ones? Can they replace system calls with POSIX calls? Can they wrap everything in a smart pointer and make sure that mutex locks are added where needed?
There’s a lot more code being written now that’s not counted in these statistics. A friend of mine vibe coded a writing tool for himself entirely using Gemini canvas.
I regularly vibe code little analyses or scripts in ChatGPT which would have required writing code earlier.
None of these are counted in these statistics.
And yes, AI isn’t quite good enough to supercharge app creation end to end. Claude has only been good for a few months. That’s hardly enough time for adoption!
This would be like analysing the impact of languages like Perl or Python on software 3 months after their release.
I'd also be curious how the numbers look for AI generated videos/images, because social media and youtube seem absolutely flooded with the stuff. Maybe it's because the output doesn't have to "function" like code does?
Grammatical nit: The phrase is "neck and neck", like where two race horses are very close in progress
But my workflow is anything but "let her rip". It's very calculated, orderly, just like mastering any other tool. I'm always in the loop. I can't imagine someone without serious experience getting good stuff, and when things go bad, oh boy you're bringing a firehose of crap into your org.
I have a junior programmer who's a bright kid but lacking a lot of depth. Got him a Cursor subscription, tracking his code closely via PRs and calling out the BS but we're getting serious work done.
I just can't see how this new situation calls for fewer programmers. It will just bring about more software, surely more capable software after everyone adjusts.
So who pays for AI for developers? Mostly corpos. And the speed of an individual developer was never the limiting factor in corpos. Average corporate development was always 10 times slower than indie. So even doubling it won't make any impression.
I don't know if I'm faster with AI at a specific task, but I know that I'm doing things I wouldn't touch because I hate the tedium. And I'm doing them while cooking and eating dinner and thinking about wider context and next things to come. So for me it feels worth it.
I think it might be something like with cars and safety. Any car safety improvements are going to be offset by the drivers driving faster and more recklessly. So maybe any speed improvements that AI might make for the project is nullified by developers doing things they would just skip without it.
> The impact on human lives is incredible. People are being fired because they’re not adopting these tools fast enough. People are sitting in jobs they don’t like because they’re afraid if they go somewhere else it’ll be worse. People are spending all this time trying to get good at prompting and feeling bad because they’re failing.
The shovelware was always there. And it always will be. But that doesn't mean it's spurting out faster, because that's not what AI does. Hell, if anything I expect that there's less visible shovelware because when it does get created, it's less obvious (and perhaps higher quality).
At some point, the quality of uninspired projects will be lifted up by the baseline of quality that mainstream AI allows. At what point is that "high enough that we can't tell what's garbage"? We've perhaps found ourselves at or around that point.
For example, on Steam it costs $100 to release a game. You may extend your game with what's called a DLC, and that costs $0 to release. If I were to build shovelware, especially with AI-generated content, I'd be more keen to make a single game with a bunch of DLC.
For game development, integration of AI into engines is another barrier. There aren't that many engines that give AI an interface to work with. The obvious interface is games that can be built entirely with code (e.g., pygame; even Godot is a big stretch).
Where I have found them very useful is for one-off scripts and stuff I need done quick and dirty, that isn't too complex and is easily verifiable (so I can catch the mistakes it makes, and it does make them!), and especially in languages I don't know that well or don't like (i.e., bash, powershell, javascript).
No need to learn a programming language, wow, anyone can be a programmer now. A few projects come out of it, people marvel at how efficient it was, and it fizzles out and programmers continue writing code.
If anything, things like visual programming did more than AI does now. For games, if you want to see the shovelware, look at Flash, RPG maker, etc... not AI. On the business side of things, Excel is king. Can you get your vibe coded app out faster than by using Flash or Excel?
I see pseudo-scientific claims from both sides of this debate but this is a bit too far for me personally. "We all know" sounds like Eternal September [1] kind of reasoning. I've been in the industry about as long as the article author and I think he might be looking with rose-tinted glasses on the past. Every aging generation looks down at the new cohort as if they didn't go through the same growing pains.
But in defense of this polemic, and laying out my cards as an AI maximalist and massive proponent of AI coding, I've been wondering the same. I see articles all the time about people writing this and that software using these new tools and it so often is the case they never actually share what they built. I mean, I can understand if someone is heads-down cranking out amazing software using 10 Claude Code instances and raking in that cash. But not even to see one open source project that embraces this and demonstrates it is a bit suspicious.
I mean, where is: "I rewrote Redis from scratch using Claude Code and here is the repo"?
1. https://en.wikipedia.org/wiki/Eternal_September
There's really a lot to get from this "tool". Because in the end it's a tool, and knowing how to use it is the most important aspect of it. It takes time, iteration, and practice to understand how to use it effectively.
When you consider this, "generate me a whole repo" is trivially garbage and doesn't meet the measurement metric. However, having AI autocomplete "getUser(..." clearly IS productive.
Now is that a 0.1% increase, 1%, or 10%? That I can't tell you.
https://www.apple.com/app-store/
https://play.google.com
https://tiktok.com
https://pinterest.com
https://youtube.com
1. Only a handful of devs use LLMs
2. For every developer getting less productive with LLMs, there must be developers getting more productive to keep the trend flat
Even if AI made them more productive, it's on a person to decide what to build and how to ship, so the number (and desire) of humans is a bottleneck. Maybe at some point AI will start buying up domains and spinning up hundreds of random indiehacker micro-SaaS, but we're not there. Yet.
What if someone came out with a study saying they had a tool to make doctors and lawyers faster? You'd say that doesn't even make sense: what kinds of doctors, doing what kinds of work?
AI coding isn't some across-the-board, helps-everyone-do-anything kind of tool.
Maybe sometime soon we'll stop strawmanning this.
I have prompt docs precisely on SOLID, TDD and all kinds of design patterns… but yes I see a lot of untested code these days.
AI has been incredibly helpful at analyzing existing, unknown to me, projects; basically for debugging and searching in these repo’s.
Archived here: https://archive.is/WN3iu
Was the author making games and other apps in 30 hours? Because that seems like a 4 month project?
* METR was at best a flawed study. Repo familiarity and tool unfamiliarity are the biggest points of critique, but far from the only ones
* they assume that all code gets shipped as a product. Meanwhile, AI code has (at least in my field of view) led to a proliferation of useful-but-never-shipped one-off tools. Random dashboards to visualize complex queries, scripts to drive refactors, or just sheer joy like "I want to generate an SVG of my vacation trip and consume 15 data sources and give it a certain look".
* Their own self-experiment is not exactly statistically sound :)
That does leave the fact that we aren't seeing AI shovelware. I'm still convinced that's because commercially viable software is beyond the AI complexity horizon, not because AI isn't an extremely useful tool
Somehow, in cyber, everyone believes that transformers will generate better answers than "do not use the 10 most common passwords". It's like the whole body of knowledge about decision-making theory, neural nets, GANs, LSTMs, etc. was completely wiped out and forgotten within less than 10 years.
I understand the awesomeness of LLMs for debugging and forensics (they are a really good rubber duck!), but apart from that they're pretty much useless, because after two prompts they will keep forgetting if/else-if/else conditions, and checking those boundaries becomes the job of the unlucky person who has to merge that slopcode later.
I don't understand how we got from TDD and test-case-based engineering to this bullshit. It's like everyone in power was the wrong person to be in that position in the first place, and statistically no lead engineer will ever become C-staff, an SVP, or whatever corporate manager level.
While the AI bubble is bursting, I will continue to develop with TDD practices to test my code. Which, in return, has the benefit of being able to use LLMs to create nice templates as a reasonable starting point.
My personal hypothesis is that when using LLMs, you are only faster if you would otherwise be writing boilerplate code. For the rest, LLMs don't really make you faster, but they can make your code quality higher, which means a better implementation and catching bugs earlier. I am a big fan of giving the diff of a commit to an LLM that has a file MCP so it can search for files in the repo, and having it point out any mistakes I have made.
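A rough sketch of that commit-review loop (the `ask_llm` helper below is a placeholder for whatever client or MCP setup you actually use, not a real API):

```python
# Sketch of the "review my latest commit" workflow described above.
# ask_llm() is a stand-in; wire it to your LLM client of choice.
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("connect this to your LLM client")

def review_latest_commit(repo_path: str = ".") -> str:
    # Grab the full patch of the most recent commit.
    diff = subprocess.run(
        ["git", "-C", repo_path, "show", "--patch", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    prompt = (
        "Review this commit diff. Point out likely bugs, missed edge cases, "
        "and anything that contradicts the rest of the repo:\n\n" + diff
    )
    return ask_llm(prompt)
```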
Most users have 40-80 apps installed and use 9 a day, 20 a month(1). The shitty iOS subscription trend killed off the hobby of 'app collecting'.
Have I created large commercial AI-coded projects? No. Did I create 80+ useful tools in hours/days that I wouldn't have otherwise? Hellz yeah!
Would I publish any of these on public github? Nope! I don't have the time nor the inclination to maintain them. There's just too many.
My shovelware "Apps" reside on my machine/our intranet or V0/lovable/bolt. Roughly ~25% are in active daily use on my machine or in our company. All tools and "apps" are saving us many hours each week.
I'm also rediscovering the joy of coding something useful, without writing a PRD for some intern. Speaking of which. We no longer have an intern.
(1) https://buildfire.com/app-statistics/
- Built a windows desktop app that scans local folders for videos and automatically transcribes the audio, summarises the content into a structured JSON format based on screenshots and subtitles, and automatically categorises each video. I used it on my PC to scan a couple of TB of videos. Has a relatively nice interface for browsing videos and searching and stores everything locally in SQLite. Did this in C# & Avalonia - which I've never used before. AI wrote about 75% of the code (about 28k LOC now).
- Built a custom throw-away migration tool to export a customers data from one CRM to import into another. Windows app with basic interface.
- Developed an AI process for updating a webform system that uses XML to update the form structure. This one felt like magic and I initially didn't think it would work, but it only took a minute to try. Some background - years ago I built a custom webform/checklist app for a customer. They update the forms very rarely so we never built an interface for making updates but we did write 2 stored procs to update forms - one outputs the current form as XML and another takes the same XML and runs updates across multiple tables to create a new version of the form. For changes, the customer sends me a spreadsheet with all the current form questions in one column and their changes in another. It's normally just wording changes so I go through and manually update the XML and import it, but this time they had a lot of changes - removing questions, adding new ones, combining others. They had a column with the label changes and another with a description of what they wanted (i.e. "New Question", "Update label", "Combine this with q1, q2 and q3", "remove this question"). The form has about 100 questions and the XML file is about 2500 lines long and defines each form field, section layout, conditional logic, grid display, task creation based on incorrect answers etc, so it's time consuming to make a lot of little changes like this. With no expectation of it working, I took a screenshot of the spreadsheet and the exported XML file and prompted the LLM to modify the XML based on the instructions in the spreadsheet and some basic guidelines. It did it close to perfect, even fixing the spelling mistakes the customer had missed while writing their new questions.
- Along with using it on a daily basis across multiple projects.
I've seen the stat that says developers "...thought AI was making them 20% faster, but it was actually making them 19% slower". Maybe I'm hoodwinking myself somehow, but it's been transformative for me in multiple ways.
Or, alternatively, exposure to our robot overlords makes you less discerning, less concerned with, ah, whether the thing is correct or not.
(This _definitely_ seems to be a thing with LLM text generation, with many people seemingly not even reading the output before they post it, and I assume it's at least somewhat a thing for software as well.)
I didn't know Mike Judge was such a polymath!
For experienced engineers, I'm seeing (internally in our company at least) a huge amount of caution and hesitancy to go all-in with AI. No one wants to end up maintaining huge codebases of slop code. I think that will shift over time. There are use cases where having quick low-quality code is fine. We need a new intuition about when to insist on handcrafted code, and when to just vibecode.
For non-experienced engineers, they currently hit a lot of complexity limits with getting a finished product to actually work, unless they're building something extremely simple. That will also shift - the range of what you can vibecode is increasing every year. Last year there was basically nothing that you could vibecode successfully, this year you can vibecode TODO apps and stuff like that. I definitely think that the App Store will be flooded in the coming future. It's just early.
Personally I have a side project where I'm using Claude & Codex and I definitely feel a measurable difference, it's about a 3x to 5x productivity boost IMO.
The summary: just because we don't see it yet doesn't mean it's not coming.
- breaking through the analysis paralysis by creating the skeleton of a feature that I then rework (UI work is a good example)
- aggressive dev tooling for productivity on early stage projects, where the CI/CD pipeline is lacking and/or tools are clumsy. (Related XKCD: https://xkcd.com/1205/)
Otherwise, I find most of my time is spent understanding the client requirements and making sure they don't want conflicting features – both of which are difficult to speed up with AI. Coding is actually the easy part, and even if it were sped up 100x, a consistent end-to-end improvement of 2x would be a big win (see Amdahl's law).
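For a back-of-the-envelope illustration of that Amdahl's-law point (the 20% / 100x split below is an assumed example, not a measurement):

```python
# Amdahl's law: overall speedup when only a fraction of the work is accelerated.
def overall_speedup(accelerated_fraction: float, speedup: float) -> float:
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / speedup)

# If coding is ~20% of delivering a feature and AI makes coding 100x faster,
# the end-to-end improvement is only about 1.25x.
print(round(overall_speedup(0.20, 100.0), 2))  # 1.25
```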
How is this "deadly" serious? It's about software developers losing well-paid, comfortable jobs. It's even less serious if AI doesn't actually improve productivity, because they'll find new jobs in the future.
Pretty much the only future where AI will turn out "deadly serious" is if it shows human-level performance for most if not all desk jobs.
This is what I always look for. I haven't found one salient success story to back up the claims of success.
We might be doing just that now.
The best way to increase your ROI is to fire all your employees. How do we know we're not in the mid-release-cycle of that right now?
I'd guess game levels and assets are becoming AI slop as we speak.
>” Now, I’ve spent a lot of money and weeks putting the data for this article together, processing tens of terabytes of data in some cases. So I hope you appreciate how utterly uninspiring and flat these charts are across every major sector of software development.”
A modern Disney 3D animated film consistently costs 100-200+ million dollars, while movies like Klaus were made for about 40 million. Japan still animates on PAPER.
At the end of the day, new tools have their use cases, but I think that especially in creative domains (which software definitely is), old techniques aren't invalidated by the creation of new ones.
ZBrush still crushes all other sculpting apps with some very well-written low-level code and assembly. It doesn't even use the GPU, for crying out loud. If you proposed that as your solution for a graphically intensive 3D app you'd be laughed at, but software-based rasterization/simple ray tracing takes the cake here. It could handle 20 million polygons at buttery smooth framerates in 2007, and isn't burdened by the VRAM drought we're in.
Don't let people tell you new tools make the old useless.
I've been a "10xer" for 25 years. I've considered coding agents bullshit since my first encounter with Copilot. I work by having a clear mental map of every piece of my code and knowing exactly how everything works, to the smallest detail, and how it interacts with every other part.
Anyway, all that and a nickel. Yesterday I fired up Claude Code for the first time. I didn't ask it to build me a game or anything at a high level. Nor to evaluate an existing code base. No... I spent about 2 hours guiding it to create a front-end SPA framework that does what my own in-house SPA framework does on the front end, just to see how it would perform at that. I approved every request manually and interrupted every time I spotted a potential issue (of which there were many). I guided it on what functions to write and how they should affect the overall navigation flow, rendering flow, loading and error-handling.
In other words, I knew what I wanted to write to a T, because it's code I wrote in 2006 and have refactored and ported many times since then... about 370 commits worth to this basic artifact, give or take.
And it pretty much got there.
Would I have been able to prompt it to write a system like that if I hadn't written the system myself over and over again? Probably not. But it did discern the logical setup I was going for (which is not at all similar to what you're thinking if you're coming from React or another framework like that), and it wrote code that is almost identical in its control structures to what I wrote, without me having to do much besides tell it in plain English what should control what, how, when and in what order.
I'm still not convinced it would save me time on something totally new, that I didn't already know the final shape of.
But I do suspect that a reason all this "vibe coding" hasn't led to an explosion of vaporware is that "vibe coding" isn't being done by experienced coders. I suspect that if you're letting changes "flash across the screen" without reading them, that's most of the difference between a failed prompt and one that achieves the desired result.
Like, I saw it do things like create a factory class that took a string name and found the proper component to load. I said, "refactor that whole thing to a component definition interface with the name in it, make a static object of those and use that to determine what screen to load and all of its parameters." And it did, and it looked almost the same as what I wrote back in the day.
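Roughly the shape of that refactor, sketched in Python for brevity (the commenter's framework is front-end JavaScript, and every name below is invented for illustration):

```python
# Hypothetical illustration: replace a string-keyed factory full of if/elif
# branches with a static table of component definitions that the navigation
# flow can consult directly.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass(frozen=True)
class ComponentDefinition:
    name: str
    render: Callable[[], Any]          # function that builds the screen
    params: dict = field(default_factory=dict)

def render_home() -> str:
    return "<home screen>"

def render_settings() -> str:
    return "<settings screen>"

# Static registry: adding a screen means adding one entry, not editing a factory.
SCREENS: dict[str, ComponentDefinition] = {
    "home": ComponentDefinition("home", render_home, {"route": "/"}),
    "settings": ComponentDefinition("settings", render_settings, {"route": "/settings"}),
}

def load_screen(name: str) -> Any:
    return SCREENS[name].render()
```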
Idk. I would not want my job to become prompting an LLM. I like cracking my knuckles and writing code. But I think the mileage you get may be dictated by whether you are trying to use it as a general-purpose "make this for me" engine, for shovelware, in which case it will fail hard, versus whether you are using it as a stenographer translating a sentence of instructions into a block of control flow.