1T parameters, 32B active parameters.
License: MIT with the following modification:
Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Coincidence or not, let's just marvel for a second over this amount of magic/technology that's being given away for free... and how liberating and different this is from OpenAI and others that were closed to "protect us all".
One thing that caught my eye is that besides the K2.5 model, Moonshot AI also launched Kimi Code (https://www.kimi.com/code), which evolved from Kimi CLI. It is a terminal coding agent; I've been using it for the last month with a Kimi subscription, and it is a capable agent with a stable harness.
Have you all noticed that the latest releases from Chinese companies (Qwen3 Max Thinking, now Kimi K2.5) are benchmarking against Claude Opus now, not Sonnet? They are truly catching up, almost at the same pace.
I've read several people say that Kimi K2 has better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.
The directionally interesting part is that, according to the announcement, K2.5 seems to be trained specifically to create sub-agents and work usefully in an agent swarm. The key part is that you don't need to manually create or prompt sub-agents; K2.5 creates them automatically. So from the looks of things it's similar to Claude Code's dynamic sub-agents, except the model is trained to scale to many more agents autonomously.
I wonder whether Claude is doing the same kind of training and it's coming with the next model, and that's why the agent swarm mode in Claude Code is hidden for now. We might be getting very very good agent orchestrators/swarms very soon.
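If I had to guess at the mechanics (pure speculation on my part, not anything from the announcement), "trained to create sub-agents" likely means the model emits something like a spawn-sub-agent tool call that the harness executes, instead of the user wiring sub-agents up by hand. A toy sketch of what such a tool could look like; every name here is invented:

```python
import json

# Speculative sketch of a "spawn sub-agent" tool a harness might expose.
# None of this is from Kimi's actual API; the names are invented for illustration.
SPAWN_SUBAGENT_TOOL = {
    "type": "function",
    "function": {
        "name": "spawn_subagent",
        "description": "Launch a specialized sub-agent for an independent subtask.",
        "parameters": {
            "type": "object",
            "properties": {
                "role": {"type": "string", "description": "e.g. 'test-writer'"},
                "task": {"type": "string", "description": "what the sub-agent should do"},
            },
            "required": ["role", "task"],
        },
    },
}

# An orchestration-trained model would emit calls like this unprompted:
example_call = {"role": "test-writer", "task": "add unit tests for the parser module"}
print(json.dumps(example_call, indent=2))
```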
K2 0905, and K2 Thinking shortly after it, have done impressively well in my personal use cases and were severely slept on. Faster, more accurate, less expensive, more flexible in terms of hosting, and available months before Gemini 3 Flash. I really struggle to understand why Flash got such positive attention at launch.
Interested in the dedicated Agent and Agent Swarm releases, especially in how that could affect third party hosting of the models.
Why is it that Claude is still at the top in coding? Are they heavily focused on training for coding, or is their general training so good that it performs well in coding?
Someone please beat Opus 4.5 in coding; I want to replace it.
A realistic setup for this would be a 16× H100 80GB node with NVLink. That comfortably holds the full MoE weights (all experts have to stay resident, even though only ~32B parameters are active per token) plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I'm curious how API providers will price tokens on top of that hardware reality.
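A rough back-of-envelope for why that node size works out, assuming FP8 weights (my numbers, not Moonshot's published deployment figures):

```python
# Back-of-envelope VRAM sizing for a 1T-parameter MoE model on 16x H100 80GB.
# Rough assumptions, not official Kimi K2.5 deployment figures.
TOTAL_PARAMS = 1.0e12   # 1T total parameters; all experts must stay resident
BYTES_PER_PARAM = 1.0   # assuming FP8 weights
NUM_GPUS = 16
VRAM_PER_GPU_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9   # ~1000 GB of weights
cluster_gb = NUM_GPUS * VRAM_PER_GPU_GB             # 1280 GB across the node
headroom_gb = cluster_gb - weights_gb               # ~280 GB left over

print(f"weights : {weights_gb:,.0f} GB")
print(f"cluster : {cluster_gb:,.0f} GB")
print(f"headroom: {headroom_gb:,.0f} GB for KV cache and activations")
```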
As your local vision nut, I can tell you their claims about "SOTA" vision are absolutely BS in my tests.
Sure, it's SOTA at standard vision benchmarks. But on tasks that require proper image understanding (see for example BabyVision [0]), it appears very much lacking compared to Gemini 3 Pro.
[0] https://arxiv.org/html/2601.06521v1
I don't get this "agent swarm" concept. You set up a task and they boot up 100 LLMs to try to do it in parallel, and then one "LLM judge" puts it all together? Is there anywhere I can read more about it?
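My understanding (from other swarm setups, not Moonshot's documentation) is that it's a fan-out/fan-in pattern: an orchestrator decomposes the task, N workers run in parallel, and a judge merges the results. A toy sketch, with call_llm as a stand-in for a real model API:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Stand-in for a real model API call (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"result for: {prompt!r}"

async def agent_swarm(task: str, n_workers: int = 4) -> str:
    # 1. Orchestrator decomposes the task into independent subtasks.
    plan = await call_llm(f"Split into {n_workers} independent subtasks: {task}")
    subtasks = [f"{plan} [part {i}]" for i in range(n_workers)]  # toy split

    # 2. Workers run concurrently instead of sequentially.
    results = await asyncio.gather(*(call_llm(s) for s in subtasks))

    # 3. A judge/aggregator merges the partial results.
    return await call_llm("Merge these results: " + " | ".join(results))

print(asyncio.run(agent_swarm("summarize this repo")))
```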
Can we please stop calling these models "open source"? Yes, the weights are open. So "open weight", maybe. But the source isn't open, the thing that allows you to re-create it. That's what "open source" used to mean. (Together with a license that allows you to use that source for various things.)
Is there a startup that takes models like this and effectively gives you a secure setup, where you have (a) a mobile app that (b) talks to some giant machine that only you have access to?
If a $10K computer could run this, it may be worth it to have a "fully on-prem" version of ChatGPT running for you.
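The closest thing today is an OpenAI-compatible server (vLLM, llama.cpp, and similar) on your own hardware, with the mobile app as a thin client. A minimal sketch using the openai Python client against a self-hosted endpoint; the URL, key, and model id are placeholders:

```python
# Thin client talking to a self-hosted, OpenAI-compatible endpoint
# (e.g. a vLLM or llama.cpp server). URL, key, and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-private-box.example.com/v1",  # your machine, your access
    api_key="local-key",                               # whatever auth your server enforces
)

resp = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from my phone"}],
)
print(resp.choices[0].message.content)
```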
I had these weird situations where some models refuse to use SSH as a tool. Not sure if it was a limitation of the coding tool or if it is baked into some of the models.
Is this actually good, or just optimized heavily for benchmarks? I am hopeful it's the former based on the writeup, but I need to put it through its paces.
Glad to see open-source models catching up and treating vision as a first-class citizen (a.k.a. a native multimodal agentic model). The GLM and Qwen models take a different approach, having a base model and a separate vision variant (glm-4.6 vs glm-4.6v).
I guess after Kimi K2.5, other vendors are going to take the same route?
Can't wait to see how this model performs on computer automation use cases like VITA AI Coworker.
https://www.vita-ai.net/
The URL is down, so I can't tell.
> K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution [..] leads to an 80% reduction in end-to-end runtime
Not just RL on tool calling, but RL on agent orchestration, neat!
* https://lmarena.ai/leaderboard — crowd-sourced head-to-head battles between models using ELO
* https://dashboard.safe.ai/ — CAIS' incredible dashboard
* https://clocks.brianmoore.com/ — a visual comparison of how well models can draw a clock. A new clock is drawn every minute
* https://eqbench.com/ — emotional intelligence benchmarks for LLMs
* https://www.ocrarena.ai/battle — OCR battles, ELO
* https://mafia-arena.com/ — LLMs playing the social deduction game Mafia
* https://openrouter.ai/rankings — market share based on OpenRouter usage
GitHub: https://github.com/MoonshotAI/kimi-cli
https://big-agi.com/static/kimi-k2.5-less-censored.jpg
Maybe we can get away with something cheaper than Claude for coding.