Coding with LLMs in the summer of 2025 – an update (antirez.com)
595 points by antirez | 20 July 2025 | 412 comments

Comments

Whether it's vibe coding, agentic coding, or copy-pasting from the web interface to your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs bring and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years that will no longer be possible (as in: most programmers will be so tied to a paid LLM that not using one would be like not using an IDE or vim today), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.
I'm going a little off-topic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides the fact that we were born on the same island).
This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been fed by the marketing and hype discourse surrounding AI labs.
The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.
I think all conversations about coding with LLMs, vibe coding, etc. need to note the domain and choice of programming language.
IMHO those two variables are 10x (maybe 100x) more explanatory than any vibe coding setup one can concoct.
Anyone who is befuddled by how the other person {loves, hates} using LLMs to code should ask what kind of problem they are working on and then try to tackle the same problem with AI to get a better sense for their perspective.
Until then, every one of these threads will have dozens of messages saying variations of "you're just not using it right" and "I tried and it sucks", which at this point are just noise, not signal.
I have used Claude's GitHub Action quite a bit now (10-20 issue implementations, somewhat more PR reviews), and it is hit and miss, so I agree with the enhanced-coding approach rather than just letting it run loose.
When the change is a very small, self-contained feature/refactor, it can mostly work alone, and if you have tests that cover the feature it is relatively safe (and you can do other stuff because it is running in an action, which is a big plus... write the issue and you are done; sometimes I have had Claude write the issue too).
When it gets to a more medium size, it will often produce something that appears to work but actually doesn't. Maybe I don't have test coverage and it is my fault, but it will do this the majority of the time. I have tried writing the issue myself, adding more info to claude.md, and letting Claude write the issue so it is in a language it understands, but nothing works, and it is quite frustrating because you spend time on the review and then see something wrong.
And with anything bigger, unsurprisingly, it doesn't do well.
PR reviews are good for small/medium tasks too. The bar is lower here, though: much of the feedback is useless, but it does catch things I have missed.
So, imo, still quite a way from being able to do things independently. For small tasks, I just get Claude to write the issue and wait for the PR... that is great. For medium tasks (which is most of them), I don't need to do much actual coding, just directing Claude... but that means my productivity is still way up.
I did try Gemini, but I found that when you let it off the leash and accept all edits, it would go wild. We have Copilot at work reviewing PRs, and it isn't so great. Maybe Gemini is better on large codebases where, I assume, Claude will struggle.
I have found that if I ask the LLM to first _describe_ to me what it wants to do without writing any code, then the subsequent code generated has much higher quality. I will ask for a detailed description of the things it wants to do, give it some feedback and after a couple of iterations, tell it to go ahead and implement it.
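Concretely, the two-phase loop looks something like this; a rough sketch assuming the OpenAI Python SDK (any chat API works the same way), with the feature and prompt wording purely made up:

```python
# Sketch: ask for a plan first, give feedback, then ask for the implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

history = [{"role": "user", "content":
    "Without writing any code, describe in detail how you would add "
    "request rate limiting to the API server. List the files you would "
    "touch and the trade-offs."}]
plan = ask(history)  # read the plan, push back where it overcomplicates

history += [
    {"role": "assistant", "content": plan},
    {"role": "user", "content":
        "Drop the new dependency and reuse the existing cache module instead. "
        "Now go ahead and implement it."},
]
print(ask(history))
```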
Unlike OP, from my still limited but intense month or so diving into this topic, I had better luck with Gemini 2.5 Pro and Opus 4 at a more abstract level (architecture etc.), and then feeding the result to Sonnet for the coding. I found 2.5 Pro, and to a lesser degree Opus, hit or miss: a lot of instances of them circling around the issue and correcting themselves while coding (Gemini especially so), whereas Sonnet would cut to the chase but needed an explicit take on the problem to be efficient.
Glad to see my experience is reflected elsewhere. I've found Gemini 2.5 Pro to be the best bang-for-buck model: good reasoning and really cheap to run (it counts as 1 request in Cursor, whereas Opus can blow my quotas out of the water). The code style works well for me too; it's "basic", but that's what I want. If I had only one model to take to my deserted island, this is the one I'd use right now.
For the heady stuff, I usually use o3 (but only to debug, its coding style is a bit weird for me), saving Opus 4 for when I need the "big guns".
I don't have Claude Code (cursor user for now), but if I did I'd probably use Opus more.
Having done a few months of this new “job” of agentic coding, I strongly agree with everything in this post.
Frontier LLMs are easiest to work with for now. Open models _will_ catch up. We can be excited for that future.
You are able to learn things from LLMs, and you can ask them for recommendations for an approach to implement something. Or just tell your LLM the approach to take. Sometimes it overcomplicates things. You’ll develop an instinct for when that’s likely. You can head off the overcomplication ahead of time or ask for refactorings after the initial cut is built for you. After a while you get an instinct for which way will get the work done soonest. Most fascinating of all, it’ll all change again with the next round of frontier models.
You don’t need frontier models for every task. For instance I’ve been positively surprised by Github Copilot for straightforward features and fixes. When it’s obvious how to implement, and you won’t need to go back and forth to nail down finer design details, getting an initial PR from Copilot is a great starting place.
To everyone starting out, enjoy the ride, know that none of us know what we’re doing, and share what you learn along the way!
IMO Claude Code was a huge step up. We have a large and well-structured Python code base revolving mostly around a large and complicated adapter pattern, and Claude is almost fully capable of implementing a new adapter if given the right prompt/resources.
Can anyone recommend a workflow / tools that accomplishes a slightly more augmented version of antirez’ workflow & suggestions minus the copy-pasting?
I am on board to agree that pure LLM + pure original full code as context is the best path at the moment, but I’d love to be able to use some shortcuts like quickly applying changes, checkpoints, etc.
My persistent (and not unfounded?) worry is that all the major tools & plugins (Cursor, Cline/Roo) all play games with their own sub-prompts and context “efficiency”.
I find agentic coding to be best when using one branch per conversation. Even if that conversation is only a single bugfix, branch it. Then do 2 or 3 iterations of that same conversation across multiple branches and choose the best result of the 3 and destroy the other two.
“Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.”
I agree with this, but this is why I use a CLI. You can pipe files instead of copying and pasting.
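For instance, a tiny helper in this spirit (a sketch, assuming the OpenAI Python SDK; the model name and example file paths are placeholders) lets you hand whole files to the model from the command line instead of copy-pasting:

```python
# Sketch: send whole files plus a question to a model from the command line,
# e.g.  python ask.py src/server.c src/networking.c "Why does the reconnect loop stall?"
import sys
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

*files, question = sys.argv[1:]
context = "\n\n".join(f"--- {path} ---\n{open(path).read()}" for path in files)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
)
print(resp.choices[0].message.content)
```

You stay in the loop exactly as the article suggests: the files land in the context verbatim, and you still read and apply the answer yourself.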
Overall strong piece of writing. This part resonated with me as aptly described:
> more/better in the same time used in the past — which is what I do), when left alone with nontrivial goals they tend to produce fragile code bases that are larger than needed, complex, full of local minima choices, suboptimal in many ways.
And this part felt like a "bitter lesson" anti-pattern:
> Avoid any RAG that will show only part of the code / context to the LLM. This destroys LLMs performance. You must be in control of what the LLM can see when providing a reply.
Ultimately I think CLI agents like claude-code, gemini-cli, and aider will be controlling the context dynamically, and the human should not be spending premature-optimization time on this activity.
My question on all of the “can’t work with big codebases” comments is: what would a codebase that was designed for an LLM look like? Composed of many, many small functions that can be composed together?
Lovely post @antirez. I like the idea that LLMs should be directly accessing my codebase and there should be no agents in between. Basically no software that filters what the LLM sees.
That said, are there tools that make going through a codebase easier for LLMs? I guess tools like Claude Code simply grep through the codebase and find out what Claude needs. Is that good enough or are there tools which keep a much more thorough view of the codebase?
What is the overall feedback loop with LLMs writing code? Do they learn as they go like we do? Do they just learn from reading code on GitHub? If the latter, what happens as less and less code gets written by human experts? Do the LLMs then stagnate in their progress and start to degrade? Kind of like making analog copies of analog copies of analog copies?
I find it serendipitous that Antirez is into LLM based coding, because the attention to detail in Redis means all the LLMs have trained extensively on the Redis codebase.
Something that was meant for humans, has now been consumed by AI and he is being repaid for that openness in a way. It comes full circle. Consistency, clarity and openness win again.
Contrary to this post, I find the AI agents, particularly the online interface of OpenAI's Codex, to be a massive help.
One example: I had a PR up that was being reviewed by a colleague. I was driving home from vacation when I saw the 3-4 comments come in. I read them when we stopped for gas, went to OpenAI Codex on my phone, dictated what I needed, and had it open a PR against my branch. Then I got back on the road. My colleague saw the PR, agreed, and merged it in.
I think of it as having a ton of interns; the AI is about the same quality. It can help to have them, but they often get stuck, need guidance, etc. If you treat the AI like an intern and explain what you need, it can often produce good results; just be prepared to fall back to coding quickly.
I currently use LLMs as a glorified Stack Overflow. If I want to start integrating an LLM like Gemini 2.5 Pro into my IDE (I use Visual Studio Code), what's the best way to do this? I don't want to use a platform like Cursor or Claude Code that takes me away from my IDE.
This is possibly the first HN AI article that actually matches my experience: the models are good enough for small pieces other people have done before, or for places where you might otherwise write a macro, but they write shitty code for anything beyond the scope of a single file; and regardless, they always have to be hand-held.
It’s a far cry from the “vibe code everything” and “this will eliminate jobs” narrative the current hype train is pushing, despite clearly using the agentic approach with the large context provided by Opus.
Thank you very much; this is exactly my experience. I sometimes let it vibe-code frontend features that are easy to test in an already typed code base (add a field to this form), but most of the time it's my sparring partner to review my code and evaluate all options. While it often recommends bollocks or has logical flaws, it helps me do the obvious thing and not miss a solution. Sometimes we get fancy-play syndrome and want to code the complicated thing because of a fundamental leak we have. LLMs have done a great job of reducing those flaws of mine.
> Despite the large interest in agents that can code alone, right now you can maximize your impact as a software developer by using LLMs in an explicit way, staying in the loop.
I think this is key here. Whoever has the best UX for this (right now, it's Cursor IMO) will get the bulk of the market share. But the switching costs are so low for this set of tooling that we'll see a rapid improvement in the products available, and possibly some new entrants.
A good way to get a model to answer questions about a codebase without overwhelming it or exceeding its token limit is to:
1. just give it the directory structure
2. ask it questions based on that
3. after it answers a question ask it if there are any specific code files it needs to better answer the question you asked
4. attach only those files so it can confirm its answer and back it up with code
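For step 1, a throwaway script along these lines does the job (the ignore list is just an example):

```python
# Sketch: print a compact directory tree to paste into the chat as step 1.
import os

IGNORE = {".git", "node_modules", "__pycache__", ".venv", "build"}

def tree(root=".", indent=""):
    for name in sorted(os.listdir(root)):
        if name in IGNORE:
            continue
        path = os.path.join(root, name)
        print(f"{indent}{name}{'/' if os.path.isdir(path) else ''}")
        if os.path.isdir(path):
            tree(path, indent + "  ")

tree()
```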
I used a similar setup until a few weeks ago, but coding agents became good enough recently.
I don’t find context management and copy pasting fun, I will let GitHub Copilot Insiders or Claude Code do it. I’m still very much in the loop while doing vibe coding.
Of course it depends on the code base, and Redis may not benefit much from coding agents.
But I don’t think one should reject vibe coding at this stage, it can be useful when you know what the LLMs are doing.
> Coding activities should be performed mostly with:
> * Gemini 2.5 PRO
> * Claude Opus 4
I think trying out all the LLMs for each task is highly underappreciated. There is no Pareto-optimal LLM for all skills. I give each prompt to 8 different LLMs using a Mac app. In my experience, while Gemini is consistently in the top 3 of 8, the difference between the best output and Gemini Pro's could be huge.
Interesting. This is quite contrary to my experience. Using LLMs for things outside my expertise produces crappy results which I can only identify as such months later, when my expertise expands. Meanwhile, delegating the boring parts that I know too well to agents has proved to be a huge productivity boost.
Since I’ve heard gemini-cli is not yet up to snuff, has anyone tried opencode + Gemini? I’ve heard that with opencode you can log in with a Google account (I have NOT confirmed this, but if anyone has any experience, please advise), so I'm not sure whether that would get extra mileage from Gemini’s limits vs using a Gemini API key.
Opus 4 just showed me Claude Code-style work-evasion heuristics for the first time today. I had been cautiously optimistic that they were just going to run the premium product at the exorbitant price: you don't always want to pay it, but it's there.
I use Claude Code with Opus, and the article recommends Gemini 2.5 Pro. I want to try it as well, but I don't know a tool that would make the experience comparable to Claude Code. Would it make sense to use it with Cursor? Do they try to limit context?
One way to utilize these CLI coding agents that I like is to have them run static analysis tools in a loop, along with whatever test suite you have set up, systematically improving crusty code beyond the fixes that the static analysis tools offer.
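Roughly, the loop looks like this; a sketch where ruff and pytest are just example tools and run_agent is a stand-in for whatever coding agent you drive:

```python
# Sketch: feed static-analysis and test failures back to a coding agent
# until everything passes or we give up.
import subprocess

def run(cmd):
    p = subprocess.run(cmd, capture_output=True, text=True)
    return p.returncode, p.stdout + p.stderr

def run_agent(prompt):
    # Placeholder: invoke your coding agent of choice on the repo with `prompt`.
    raise NotImplementedError

for _ in range(10):  # cap the number of repair rounds
    findings = []
    for cmd in (["ruff", "check", "."], ["pytest", "-q"]):
        code, output = run(cmd)
        if code != 0:
            findings.append(output)
    if not findings:
        break
    run_agent("Fix the following findings without changing behavior:\n\n"
              + "\n\n".join(findings))
```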
I like how this is written in a way that an LLM doing planning can probably infer what to do. Let me know if I hit the nail on the head with what you’re thinking @antirez
I find it very sad that people who have been really productive without "AI" now go out of their way to find small anecdotal evidence for "AI".
What’s the purest solution?
Is the author suggesting manually pasting Redis C files into the Gemini Pro chat window on the web?
Is there any good tooling for making this part easier and less error prone, short of going to a full-fledged agent system?
"Don’t use agents or things like editor with integrated coding agents."
He argues that the copy/paste back and forth with the web UI is essential for maintaining control and providing the correct context.
If anyone's interested I've got some very exact stats on prompts and accepted solution linked in my LLM proof of concept repo: https://github.com/sutt/agro/blob/master/docs/dev-summary-v1...
I thought large contexts are not necessarily better and sometimes have the opposite effect?
I've been going down to Sonnet for coding over Opus. Maybe I am just writing dumb code.
antirez is a big fuggin deal on HN.
I’m sort of curious if the AI doubting set will show up in force or not.
But just because I’ve not been lazy…
Untrustworthy is worse than useless.
...one day.