My experience with using AI tools for code review is that they do find critical bugs (from my retrospective analysis, maybe 80% of the time), but the signal-to-noise ratio is poor. It's really hard to get them not to tell you 20 highly speculative reasons why the code is problematic along with the one critical error. And in almost all cases, sufficient human attention would also have identified the critical bug, so human attention is the primary bottleneck here. Thus the poor signal-to-noise ratio isn't a side issue; it's one of the core issues.
As a result, I'm mostly using this selectively so far, and I wouldn't want it turned on by default for every PR.
None of these tools perform particularly well and all lack context to actually provide a meaningful review beyond what a linter would find, IMO. The SOTA isn't capable of using a code diff as a jumping off point.
Also the system prompts for some of them are kinda funny in a hopelessly naive aspirational way. We should all aspire to live and breathe the code review system prompt on a daily basis.
I've tried Greptile and it's pretty much pure noise. I ran it for 3 PRs and then gave up. Here are three examples of things it wasted my time on in those 3 PRs:
* Suggested silencing an exception rather than letting it crash, for "style" reasons (the potential exception was already handled earlier in the code, but it failed to pick up that context). When I commented that silencing the exception could hide bugs, it replied "You're absolutely right, remove the try-catch" - a try-catch I of course never added.
* Flagged our use of Python 3.14 as a logic error because "python 3.14 does not exist yet".
* "Review the async/await patterns: Heavy use of async in model validation might indicate these should be application services instead." Whatever that vague sentence means - I'm not sure if it's suggesting we change the design pattern used across our entire code base.
Also, the "confidence" score attached to each PR - 4/5 or so, driven by these irrelevant comments - was a really annoying feature IMO. In general, an AI tool attaching a rating when it's wrong feels like a big productivity loss, because the human reviewer will see that number and think something is wrong with the PR.
--
Before this we were running Coderabbit, which worked really well and caught a lot of bugs / implementation gotchas. It also had "learnings" which it referenced frequently, so it genuinely did not keep commenting on intentional choices in our code base. With Coderabbit I found myself wanting to read even the low-confidence comments, since they were often useful (so it erred on too quiet rather than too noisy). Unfortunately our entire Coderabbit integration just stopped working one day, and since then we've been in a long back-and-forth with their support.
--
I'm not sure what the secret sauce is but it feels like Greptile was GPT 3.5-tier and Coderabbit was Sonnet 4.5-tier.
The problem with code review as a product is that it's quite straightforward to just prompt for it, and the frontier models, whether Opus or GPT5.2Codex, do a great job at code reviews. I don't need a second subscription or API call when the one I already have works well out of the box; I'd rather focus on integration.
In our case, agentastic.dev, we just baked code review right into our IDE. It packages the diff with a prompt and sends it out to the user's choice of agents (Claude, Codex) in parallel. The reason our users like it so much is that they don't need to pay extra for code review anymore. Hard to beat a free add-on, and the cherry on top is that you don't need to read freaking poems.
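The core of that flow is simple enough to sketch. Something like the snippet below, though this is a simplified illustration rather than our actual implementation, and the agent CLI commands and flags are stand-ins for whatever tools you have installed:

    # Sketch: package the local diff with a review prompt and fan it out to
    # several agent CLIs in parallel. The "claude" / "codex" invocations are
    # hypothetical placeholders; swap in the commands you actually use.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    REVIEW_PROMPT = "Review this diff. Report only concrete bugs, with file and line."

    def get_diff(base: str = "main") -> str:
        """Collect the current branch's diff against the base branch."""
        return subprocess.run(
            ["git", "diff", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout

    def run_agent(command: list[str], diff: str) -> str:
        """Pipe the prompt plus diff to one agent CLI and return its review text."""
        result = subprocess.run(
            command, input=f"{REVIEW_PROMPT}\n\n{diff}",
            capture_output=True, text=True,
        )
        return result.stdout

    if __name__ == "__main__":
        diff = get_diff()
        agents = {"claude": ["claude", "-p"], "codex": ["codex", "exec"]}
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(run_agent, cmd, diff) for name, cmd in agents.items()}
        for name, future in futures.items():
            print(f"\n=== {name} ===\n{future.result()}")

The point is less the plumbing and more that the agents run side by side on the same diff, so you get several independent readings for the price of the subscriptions you already hold.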
Greptile is a great product and I hope you succeed.
However, I disagree that independence is a competitive advantage. If it’s true that having a “firewall” between the coding agent and review agent leads to better code, I don’t see why a company like Cursor can’t create full independence between their coding and review products but still bundle them together for distribution.
Furthermore, there might well be benefits to not being fully independent. Imagine if an external auditor was brought in to review every decision made inside your company. There would likely be many things they simply don’t understand. Many decisions in code might seem irrational to an external standalone entity but make sense in the broader context of the organization’s goals. In this sense, I’m concerned that fully independent code review might miss the forest for the trees relative to a bundled product.
Again, I’m rooting for you guys. But I think this is food for thought.
I still think any business that is built on someone else's model is worthless. I know I'm sounding like the "Dropbox is just FTP" guy, but it really feels like any good idea will just be copied by OpenAI and Anthropic. If AI code review is proven to be a good idea, is there any reason to expect Codex or Claude Code not to implement some commands to do code review?
I've also noticed this explosion of code review tools and felt that some of these companies have their focus misplaced.
Two that stood out to me are Sentry and Vercel. Both have released code review tools recently and both feel misplaced. I can definitely see why they thought they could expand with that type of product offering but I just don't see a benefit over their competition. We have GH copilot natively available on all our PRs, it does a great job, integrates very well with the PR comment system, and is cheap (free with our current usage patterns). GH and other source control services are well placed to have first-class code review functionality baked into their PR tooling.
It's not really clear to me what Sentry/Vercel are offering beyond what Copilot does, and in my brief testing of them I didn't see a noticeable difference in quality or DX. It feels like they're fighting an uphill battle from day one with this product choice, and they are ultimately limited on DX by how deeply GH and other source control services allow them to integrate.
What I would love to see from Vercel, which they feel very well placed to offer, is AI powered QA. They already control the preview environments being deployed to for each PR, they have a feedback system in place with their Vercel toolbar comments, so they "just" need to tie those together with an agentic QA system. A much loftier goal of course but a differentiator and something I'm sure a lot of teams would pay top dollar for if it works well.
The main problem with current AI reviewers isn't catching bugs, it's shutting up when there is no bug. Humans have an intuitive filter: "this code is weird, but it works and won't break prod, so I'll let it slide." LLMs lack this; they generate 20 comments about variable naming and 1 comment about a critical race condition. As a result the developer gets fatigued and ignores everything. Until AI learns to understand the context of importance, not just code context, it will remain an expensive linter.
>A human rubber-stamping code being validated by a super intelligent machine is the equivalent of a human sitting silently in the driver's seat of a self-driving car, "supervising".
So, absolutely necessary and essential?
To get the machine out of trouble when the unavoidable strange situation happens - one that didn't appear during training and requires some judgement based on ethics or logical reasoning. For that case, you need a human in charge.
What I think should be added about code review is that it can get really complex, for example if we add formal verification to the mix to catch very subtle bugs.
So in the end I think there will still be some disappointment, because one would expect review to be fully automated and only about reading the code, as this article suggests. In reality, I think it is harder than writing code.
Fuzzy automated reviews should always run in an interactive loop with a developer on their workstation and contain enough context to quickly assess if they are valid or not.
When developers create a PR, they already feel they are "done", and they have likely shifted their focus to another task. False positives are horrible at this point, especially when they keep changing with each push of commits.
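Concretely, even something this small, run on the workstation before the PR is opened, is the kind of loop I mean. This is only a sketch: review_diff() is a stand-in for whatever model call you prefer, and the triage step is deliberately minimal.

    # Sketch: review the staged diff locally and let the developer triage each
    # finding while the change is still fresh. review_diff() is a placeholder
    # for your model of choice.
    import subprocess

    def staged_diff() -> str:
        """Diff of what the developer is about to commit/push."""
        return subprocess.run(
            ["git", "diff", "--cached"],
            capture_output=True, text=True, check=True,
        ).stdout

    def review_diff(diff: str) -> list[str]:
        """Stand-in: ask a model for findings, one per line, with file:line context."""
        raise NotImplementedError("plug in your model call here")

    def triage(findings: list[str]) -> list[str]:
        """Walk through each finding interactively; keep only the ones judged valid."""
        accepted = []
        for finding in findings:
            answer = input(f"{finding}\nValid? [y/N] ").strip().lower()
            if answer == "y":
                accepted.append(finding)
        return accepted

    if __name__ == "__main__":
        findings = review_diff(staged_diff())
        kept = triage(findings)
        print(f"{len(kept)} finding(s) accepted; address them before opening the PR.")

Because the author is still in context, a false positive costs seconds to dismiss here instead of a round trip on a PR they have already mentally closed.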
We used Greptile where I work and it was so bad we decided to switch to Claude. And even Claude isn’t nearly as good at reviewing as an experienced programmer with domain knowledge.
> This might seem far-fetched but the counterfactual is Kafkaesque.
> As the proprietors of an, er, AI code review tool suddenly beset by an avalanche of competition, we're asking ourselves: what makes us different?
> Human engineers should be focused only on two things - coming up with brilliant ideas for what should exist, and expressing their vision and taste to agents that do the cruft of turning it all into clean, performant code.
> If there is ambiguity at any point, the agents Slack the human to clarify.
Was this LLM advertisement generated by an LLM? Feels so at least.
This article has a catchy headline, but there's really no content to it. This is content marketing without content. It seems like every week on Hacker News, there's a dozen of these. All seemingly code reviewers, too. Keep it to LinkedIn.
Contrary to some of the other anecdotes in this thread, I've found automated code review to discover some tricky stuff that humans missed. We use https://www.cubic.dev/
Good code reviews are part of a team's culture, and it's hard to just patch that with an agent. With millions of tools it will be an arms race over which one is louder about as many things as possible, because:
- it will have a higher chance of convincing the author that an issue was important by throwing more darts - something a human wouldn't do, because it takes real mental effort to go through an authentic review,
- it will sometimes find a real, big issue, which reinforces the bias that it's useful,
- there will always be a tendency toward more feedback (not higher quality), because if it's too silent, is it even doing anything?
So I believe it will just add more rounds of back-and-forth prompting between more people, and I'm not sure that's a net positive.
Plus, PRs are a good reality check on whether your code makes sense, when another person reviews it - a final safeguard against a maintainability miss or a disaster waiting to be deployed.
If you give an LLM a hammer, everything looks like a nail; give it a saw, everything looks like wood. Ask an LLM to find issues and it will find "issues". At the end of the day you will have to fix those issues, and if you decide to have another LLM fix them, by the time you are done with that cycle you will end up with code that is thoroughly over-engineered.
> In addition, success is generally pretty well-defined. Everyone wants correct, performant, bug-free, secure code.
I feel like these are often not well defined? "It's not a bug, it's a feature", "premature optimization is the root of all evil", etc.
In different contexts, "performant enough" means different things. Similarly, many times I've seen different teams within a company have differing opinions on "correctness"
I liked that the post is self-aware that it's promoting its own product. But the writing seemed more focused on the philosophy behind code reviews and the impact of AI, and less on the mechanics of how Greptile differs from competitors. I was hoping to see more on the latter.
My company just finished a several-week review period of Greptile. Devs were split over the usefulness of the tool (compared to our current solution, Cursor). While Greptile did occasionally offer better insights than Cursor, it also exhibited strange behavior such as entirely overwriting PR descriptions with its own text and occasionally arguing with itself in the comments. In the end we decided NOT to purchase Greptile, as there were enough "not quite there" issues to make it more trouble than it was worth. I am certain, though, that the Greptile team will resolve all those problems, and I wish them the best of luck!
Either become a platform or get swallowed up by one (e.g. Cursor acquiring Graphite to become more of a platform). Trying to prove out that your code review agent is marginally better than others when the capability is being included in every single solution is a losing strategy. They can just give the capability away for free. Also, the idea that code review will scale dramatically in importance as more code is written by agents is not new.
This article surprised me. I would have expected it would be about how _human_ code review is unsustainable in the face of AI-enhanced velocity.
I would be interested to hear of some specific use-cases for LLMs in code review.
With static analysis, tests, and formatters I thought code review was mostly interpersonal at this point. Mentorship, ensuring a chain of liability in approvals, negotiating comfort levels among peers with the shared responsibility of maintaining the code, that kind of thing.
1. I absolutely agree there's a bubble. Everybody is shipping a code review agent.
2. What on earth is this defense of their product? I could see so many arguments for why their code reviewer is the best, and this contains none of them.
More broadly, though, if you've gotten to the point where you're relying on AI code review to catch bugs, you've lost the plot.
The point of a PR is to share knowledge and to catch structural gaps. Bug-finding is a bonus. Catching bugs, automated self-review, structuring your code to be sensible: that's _your_ job. Write the code to be as sensible as possible, either by yourself or with an AI. Get the review because you work on a team, not in a vacuum.
It's not terribly hard to write a Copilot GHA that does this yourself for your specific team's needs. Not sure why you'd need to bring a vendor on for this....
What do the vendors provide?
I looked at a couple which were pretty snazzy at first glance, but now that I know more about how copilot agents work and such, I'm pretty sure in a few hours, I could have the foundation for my team to build on that would take care of a lot of our PR review needs....
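Roughly, the vendor-neutral core of such a foundation looks like the sketch below. It assumes a GITHUB_TOKEN in the environment; the repo name, PR number, and call_model() stub are hypothetical placeholders rather than anything Copilot-specific.

    # Sketch: fetch a PR's diff from the GitHub API, ask whatever model you
    # already have for findings, and post them back as a single review comment.
    import os
    import requests

    GITHUB_API = "https://api.github.com"
    HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

    def fetch_pr_diff(owner: str, repo: str, number: int) -> str:
        """Fetch the raw diff for a pull request."""
        resp = requests.get(
            f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{number}",
            headers={**HEADERS, "Accept": "application/vnd.github.diff"},
        )
        resp.raise_for_status()
        return resp.text

    def call_model(prompt: str) -> str:
        """Placeholder for whatever model access your team already pays for."""
        raise NotImplementedError("plug in your LLM client here")

    def post_review(owner: str, repo: str, number: int, body: str) -> None:
        """Leave the findings as one non-blocking PR review comment."""
        resp = requests.post(
            f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{number}/reviews",
            headers=HEADERS,
            json={"body": body, "event": "COMMENT"},
        )
        resp.raise_for_status()

    if __name__ == "__main__":
        diff = fetch_pr_diff("my-org", "my-repo", 123)  # hypothetical repo and PR
        findings = call_model(f"Review this diff for concrete bugs only:\n\n{diff}")
        post_review("my-org", "my-repo", 123, findings)

From there it's mostly prompt tuning and wiring it into a workflow trigger, which is exactly the part you'd want to own per team anyway.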
I’ve found only one good code review bot, and that’s Unblocked. It doesn’t always leave a comment, and when it does, it’s often found 1-2 real bugs in the code crossing multiple files (even like “hey you forgot to update this reference in this other file not edited in the PR”). Things you’d expect someone with a deeper knowledge of the code to know.
You do get a handful of false positives, especially when what it reports is technically correct but we're just handling the issue in a sort of weird/undocumented way. But it's only one comment that's easy to dismiss, and it's fairly rare. It's not like huge amounts of AI vomit all over PRs. It's a lot more focused.
> Only once would you have X write a PR, then have X approve and merge it to realize the absurdity of what you just did.
I get the idea. I'll still throw out that having a single X go through the full workflow could still be useful in that there's an audit log, undo features (reverting a PR), notifications what have you. It's not equivalent to "human writes ticket, code deployed live" for that reason
I find that a lot of times Copilot calls out issues where, if the AI had more context on the whole codebase, it would realize that scenario can't actually occur.
Or it won't understand some invariant that you know but that is not explicit anywhere.
Maybe I'm buying into the Kool-Aid, but I actually really liked the self-aware tone of this post.
> Based on our benchmarks, we are uniquely good at catching bugs. However, if all company blogs are to be trusted, this is something we have in common with every other AI code review product. One just has to try a few, and pick the one that feels the best.
I had a bad experience with greptile due to what seemed to be excessive noise and nit comments. I have been using cursorbot for a year and really like it.
Why not let AI write the code and then have it reviewed by humans? If you use AI to review my code, then you can't stop me from using another AI to refute it: this only foreshadows the beginning of internal friction.
Where we draw the line on agent "identity" when the models being orchestrated are generally the same 3 frontier intelligences is an interesting question indeed.
I would think this idea of creating a third-party to verify things likely centers more around liability/safety cover for a steroidal increase in velocity (i.e. --dangerously-skip-permissions) rather than anything particularly pragmatic or technical (but still poised to capture a ton of value)
LLMs writing code, and then LLMs reviewing the code. And when customers run into a problem with the buggy slop you just churned out, they can talk to a LLM chat bot. Isn't it just swell?
I would suggest you check out your Greptile discord and/or answer your messages on X where people are trying to reach you with problems and questions about your service. Unless that no longer matters.
"While some other products have built out great UIs for humans to review code in an AI-assisted paradigm, we have chosen to build for what we consider to be an inevitable future - one where code validation requires vanishingly little human participation."
Ok good, now I know not to bother reading through any of their marketing literature, because while the product at first interested me, now I know it's exactly not what I want for my team.
The actual "bubble" we have right now is a situation where people can produce and publish code they don't understand, and where engineers working on a system no longer are forced to reckon with and learn the intricacies of their system, and even senior engineers don't gain literacy into the very thing they're working on, and so are somewhat powerless to assess quality and deal with crisis when it hits.
The agentic coding tools and review tools I want my team (and myself) to have access to are ones that ones that force an explicit knowledge interview & acquisition process during authoring and involve the engineer more intricately in the whole flow.
What we got instead with claude code & friends is a thing way too eager to take over the whole thing. And while it can produce some good results it doesn't produce understandable systems.
To be clear, it's been a long time since writing code has been the hard part of the job? in many many domains. The hard part is systems & architecture and while these tools can help with that, there's nothing more potentially terrifying pthan a team full of people who have agentically produced a codebase that they cannot holistically understand the nuances of.
So, yeah, I want review tools for that scenario. Since these people have marketed themselves off the table... what is out there?
We spend a ton of time looking at the code and blocking merges, and the end result is still full of bugs. AI code review only provides a minor improvement. The only reason we do code review at all is humans don't trust that the code works. Know another way to tell if code works? Running it. If our code is so utterly inconceivable that we can't make tests that can accurately assess if the code works, then either our code design is too complicated, or our tests suck.
OTOH, if the reason you're doing code review is to ensure the code "is beautiful" or "is maintainable", again, this is a human concern; the AI doesn't care. In fact, it's becoming apparent that it's easier to replace entire sections of code with new AI generated code than to edit it.
My experience with code review tools has been dreadful. In most cases I can remember the reviews are inaccurate, "you are absolutely right" sycophantic garbage, or missing the big picture. The worst feature of all is the "PR summary" which is usually pure slop lacking the context around why a PR was made. Thankfully that can be turned off.
I have to be fair and say that yes, occasionally, some bug slips past the humans and is caught by the robot. But these bugs are usually also caught by automated unit/integration tests or by linters. All in all, you have to balance the occasional bug with all the time lost "reviewing the code review" to make sure the robot didn't just hallucinate something.
Haven’t used a single one that was any good. Basically a 50/50 crapshoot if what they are saying makes any sense at all, let alone it being considered “good” comments. Basically no different than random chance.
Reminder that this comes from the founder who got rightly lambasted for his comments about work-life balance and then doubled down when called out.
No shit. What is the point of using an llm model to review code produced by an llm model?
Code review presupposes a different perspective, which no platform can offer at the moment because they are only as sophisticated as the model they wrap.
Claude generated the code, and Claude was asked if the code was good enough, and now you want to be in the middle to ask Claude again but with more emphasis, I guess?
If I want more emphasis I can ask Claude myself. Or Qwen.
I can't even begin to understand this rationale.
> Independence
Any "agent" running against code review instead of code generation is "independent"?
> Autonomy
Most other code review tools can also be automated and integrated.
> Loops
You can also ping other code review tools for more reviews...
I feel like this article actually works against you by presenting the problems and then inadequately solving them.
> Today's agents are better than the median human code reviewer
Which is it? You cannot have it both ways.
A code review requires reasoning and understanding, things that to my knowledge a generative model cannot do.
Surely the most an AI code review ever could be is something that looks like a code review.
Still need a human in the loop (HITL), but the human is shifted right and can do other things rather than grinding through fiddly details.
Not my experience
> A human rubber-stamping code being validated by a super intelligent machine
What? I dunno how they define intelligence, but LLMs are absolutely not super intelligent.
> If agents are approving code, it would be quite absurd and perhaps non-compliant to have the agent that wrote the code also approve the code.
It's all the same frontier models under the hood. Who are you kidding?
since they're likely telling you things you know if you test and write your own code.
Oh - writing your own code is a thing of the past: AI writes, then AI finds the bugs.
Can drop the extra words