Qwen3-Coder: Agentic coding in the world

(qwenlm.github.io)

Comments

danielhanchen 22 July 2025
I'm currently making 2-bit to 8-bit GGUFs for local deployment! They'll be up in an hour or so at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...

Also, docs on running it on a 24GB GPU + 128 to 256GB of RAM are here: https://docs.unsloth.ai/basics/qwen3-coder

pxc 22 July 2025
> Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first

I'm most excited for the smaller sizes because I'm interested in locally-runnable models that can sometimes write passable code, and I think we're getting close. But since for the foreseeable future, I'll probably sometimes want to "call in" a bigger model that I can't realistically or affordably host on my own computer, I love having the option of high-quality open-weight models for this, and I also like the idea of "paying in" for the smaller open-weight models I play around with by renting access to their larger counterparts.

Congrats to the Qwen team on this release! I'm excited to try it out.

flakiness 22 July 2025
The "qwen-code" app seems to be a gemini-cli fork.

https://github.com/QwenLM/qwen-code https://github.com/QwenLM/qwen-code/blob/main/LICENSE

I hope these OSS CC clones converge at some point.

Actually it is mentioned in the page:

   we’re also open-sourcing a command-line tool for agentic coding: Qwen Code. Forked from Gemini Code
zkmon 23 July 2025
At my work, here is a typical breakdown of time spent by work areas for a software engineer. Which of these areas can be sped up by using agentic coding?

05%: Making code changes

10%: Running build pipelines

20%: Learning about changed process and people via zoom calls, teams chat and emails

15%: Raising incident tickets for issues outside of my control

20%: Submitting forms, attending reviews and chasing approvals

20%: Reaching out to people for dependencies, following up

10%: Finding and reading up some obscure and conflicting internal wiki page, which is likely to be outdated

nisten 23 July 2025
I've been using it all day, it rips. Had to bump the tool-calling limit in Cline up to 100 and it just went through the app with no issues: got the mobile app built, fixed the linter errors... I wasn't even hosting it with the tool-call template enabled on the vLLM nightly; on stock vLLM it understood the tool-call instructions just fine.
nnx 23 July 2025
This suggests adding a `QWEN.md` in the repo for agent instructions. Where are we with `AGENTS.md`? In a team repo it's getting ridiculous to have a duplicate markdown file for every agent out there.
chisleu 23 July 2025
I tried using the "fp8" model through hyperbolic but I question if it was even that model. It was basically useless through hyperbolic.

I downloaded the 4bit quant to my mac studio 512GB. 7-8 minutes until first tokens with a big Cline prompt for it to chew on. Performance is exceptional. It nailed all the tool calls, loaded my memory bank, and reasoned about a golang code base well enough to write a blog post on the topic: https://convergence.ninja/post/blogs/000016-ForeverFantasyFr...

Writing blog posts is one of the tests I use for these models. It is a very involved process including a Q&A phase, drafting phase, approval, and deployment. The filenames follow a certain pattern. The file has to be uploaded to s3 in a certain location to trigger the deployment. It's a complex custom task that I automated.

Even the 4-bit model was capable of this, but it was incapable of actually working on my code, preferring to hallucinate methods that would be convenient rather than admitting it didn't know what it was doing. This is the 4-bit "lobotomized" model, though. I'm excited to see how it performs at full power.

indigodaddy 23 July 2025
How does one keep up with all this change? I wish we could fast-forward like 2-3 years to see if an actual winner has landed by then. I feel like at that point there will be THE tool, with no one thinking twice about using anything else.
jasonthorsness 22 July 2025
What sort of hardware will run Qwen3-Coder-480B-A35B-Instruct?

With the performance apparently comparable to Sonnet some of the heavy Claude Code users could be interested in running it locally. They have instructions for configuring it for use by Claude Code. Huge bills for usage are regularly shared on X, so maybe it could even be economical (like for a team of 6 or something sharing a local instance).

rbren 22 July 2025
Glad to see everyone centering on using OpenHands [1] as the scaffold! Nothing more frustrating than seeing "private scaffold" on a public benchmark report.

[1] https://github.com/All-Hands-AI/OpenHands

rapind 22 July 2025
I just checked and it's up on OpenRouter. (not affiliated) https://openrouter.ai/qwen/qwen3-coder
generalizations 22 July 2025
> Additionally, we are actively exploring whether the Coding Agent can achieve self-improvement

How casually we enter the sci-fi era.

Imanari 23 July 2025
I've been using it within Claude Code via ccr[0] and it feels very similar to Claude 4.

[0] https://github.com/musistudio/claude-code-router

mohsen1 22 July 2025
Open-weight models matching Claude 4 is exciting! It's really possible to run this locally since it's MoE
jddj 22 July 2025
Odd to see this languishing at the bottom of /new. Looks very interesting.

Open, small, roughly Sonnet 4-ish if the benchmarks are to be believed, tool use?

veselin 23 July 2025
Does anybody know of an inference provider that offers input token caching? It should be almost required for agentic use: first for speed, but also because almost all conversations start where the previous one ended, so costs may end up quite a bit higher without caching.

I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it, except by running vLLM myself on self-hosted instances.

vFunct 22 July 2025
Much faster than Claude Sonnet 4 with similar results.
_peregrine_ 23 July 2025
Pretty solid at SQL generation, too. Just tested in our generation benchmark: https://llm-benchmark.tinybird.live/

Not quite as good as Claude, but by far the best Qwen model yet, and 2x as fast as qwen3-235b-a22b-07-25

Specific results for qwen3-coder here: https://llm-benchmark.tinybird.live/models/qwen3-coder

pzo 23 July 2025
Does anyone understand the pricing? On OpenRouter (https://openrouter.ai/qwen/qwen3-coder) you have:

Alibaba Plus: input: $1 to $6 output: $5 to $60

Alibaba OpenSource: input: $1.50 to $4.50 output: $7.50 to $22.50

So it doesn't look that cheap compared to Kimi K2 or their non-coder version (Qwen3 235B A22B 2507).

What's more confusing is this "up to" pricing that can supposedly reach $60 for output: with agents it's not that easy to control context.
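For what it's worth, tiered listings like this are typically keyed on the prompt (context) length of each request, which is exactly what's hard to bound in agentic loops. A minimal sketch of how such a tiered cost works out; the tier boundaries and rates below are made up for illustration, so check the provider's pricing page for real numbers:

```python
# Hypothetical tiered pricing, in dollars per million tokens, keyed on
# how large the prompt (context) of a single request is. The boundaries
# and rates are illustrative, not OpenRouter's or Alibaba's actual ones.
TIERS = [
    # (max prompt tokens, input $/M, output $/M)
    (32_000,  1.00,  5.00),
    (128_000, 3.00, 25.00),
    (256_000, 6.00, 60.00),
]

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in dollars of one request under the hypothetical tiers above."""
    for limit, in_rate, out_rate in TIERS:
        if prompt_tokens <= limit:
            return (prompt_tokens * in_rate
                    + completion_tokens * out_rate) / 1_000_000
    raise ValueError("prompt exceeds the largest pricing tier")

# A small request is cheap, but the same completion after the agent has
# accumulated a huge context can cost an order of magnitude more.
print(request_cost(10_000, 1_000))    # small-context request
print(request_cost(200_000, 1_000))   # same output, bloated context
```

The point of the sketch: with agents, the growing context silently moves every request into higher tiers, so the "up to" number is what budget planning has to assume.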

karolist 23 July 2025
I have 4x3090 (96GB) and 128GB DDR4 RAM, can I run unsloth on this machine and utilize all 4 GPUs?
Alifatisk 23 July 2025
Wow, these companies in the LLM field are so quick to catch up. From everyone offering their own chat model, to OpenAI-compatible schemas, to letting extensions and IDEs do the work, to agentic tasks, and now most of them offering their own CLI
sunaookami 23 July 2025
Thank god I already made an Alibaba Cloud account last year, because this interface sucks big time. At least you get 1 million tokens free (once?). A bit confusing that they forked the Gemini CLI but you still have to set OpenAI-style environment variables?
zelphirkalt 23 July 2025
So far none of these models can write even a slightly complicated function well for me. I tried Mistral, ChatGPT, Qwen Coder 2, Claude, ... and they apparently all fail when the solution requires making use of continuations and such. Probably because they don't have enough examples in their training data or something.

Example: partition a linked list in linear time. None of these models seems to get that `reverse`, or converting the whole list to a vector, are themselves linear operations and therefore ruled out. When you tell them not to use those, they still do, and blatantly claim that they aren't. À la:

"You are right, ... . The following code avoids using `reverse`, ... :

[code that still uses reverse]"

And in languages like Python they will cheat, because Python's list is more like an array, where random access is O(1).

This means they only work well when you are doing something quite mainstream, where the amount of training data is a strong signal in the noise. But even there they often struggle. For example, I found them somewhat useful for doing Django things, but just as often they gave bullshit code, or it took a lot of back and forth to get something useful out of them.

I think it is embarrassing that, with sooo much training data, they are still unable to do much more than go by frequency in the training data when suggesting "solutions". They "learn" differently than a human being. When a human sees a new concept, they can often apply it, even if that concept does not happen to be needed very often, as long as they remember it. But these LLMs seem to deem everything that isn't mainstream irrelevant.
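For reference, the kind of answer the exercise above is asking for is the classic two-dummy-node stable partition: one forward pass, O(n) time, O(1) extra space, no `reverse` and no conversion to an array. A minimal Python sketch; the `Node` class, pivot, and values are illustrative, not from any particular model's output:

```python
class Node:
    """Minimal singly linked list node for illustration."""
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def partition(head, x):
    """Stable partition: nodes < x come first, then nodes >= x.

    Single forward pass, O(n) time, O(1) extra space. No reverse,
    no copying into a list/array; nodes are only relinked.
    """
    before_head = before = Node(0)  # dummy head for the < x sublist
    after_head = after = Node(0)    # dummy head for the >= x sublist
    node = head
    while node is not None:
        if node.val < x:
            before.next = node
            before = node
        else:
            after.next = node
            after = node
        node = node.next
    after.next = None               # terminate the second sublist
    before.next = after_head.next   # splice the two sublists together
    return before_head.next
```

For example, partitioning 1 -> 4 -> 3 -> 2 -> 5 -> 2 around 3 relinks it as 1 -> 2 -> 2 -> 4 -> 3 -> 5, preserving relative order within each half.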

lvl155 23 July 2025
Can someone please make these CLIs in Rust/Ratatui?
jug 23 July 2025
I checked this website along with API pricing on OpenRouter, and this one beats Gemini 2.5 Pro (…Preview-0506 in their chart, but with a good margin, so probably the non-preview too) at half Google’s API price. Nice. Admittedly it's their own posted benchmark, but still. Even if it just competes with it, it’s a win.

Edit:

I ran my fun test on it and it unfortunately failed.

> ”How can I detect whether a user is running in a RemoteApp context using C# and .NET? That is, not a full RDP desktop session, but a published RemoteApp as if the app is running locally. The reason I’m asking is that we have an unfortunate bug in a third party library that only shows up in this scenario, and needs a specific workaround when it happens.”

It started by trying to read hallucinated environment variables that just aren’t there. Gemini 2.5 Pro had the same issue, and IIRC so did Claude.

The only one I have seen give the correct answer, which is basically ”You can’t. There’s no official method to do this, and this is intentional by Microsoft.”, along with a heuristic to instead determine the root launching process, which thus far is (but is not guaranteed to be) RDPINIT.EXE rather than EXPLORER.EXE as in typical desktop or RDP scenarios, has been OpenAI o3. o3 also provided additional details about the underlying protocol at play, which I could confirm against external sources to be correct.

I like my query because it forces the LLM to reply that you just can’t do this; there’s no ”sign” of it other than going by a completely different side effect. Models are usually too eager to come up with a positive reply and hallucinate in the process. Often there _are_ env vars to read in cases like these, but not here.

mogili 23 July 2025
I'm waiting on this to be released on Groq or Cerebras for high speed vibe coding.
jijji 23 July 2025
I'm confused: why would this LLM require OpenAI API keys?
incomingpain 23 July 2025
Now I await the distilled options.

I wonder if there's a Python expert that can be distilled out in isolation.