1T parameters, 32B active parameters.
License: MIT with the following modification:
Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Coincidence or not, let's just marvel for a second over this amount of magic/technology that's being given away for free... and how liberating and different this is from OpenAI and others that were closed to "protect us all".
One thing that caught my eye is that besides the K2.5 model, Moonshot AI also launched Kimi Code (https://www.kimi.com/code), which evolved from Kimi CLI. It is a terminal coding agent; I've been using it for the last month with a Kimi subscription, and it is a capable agent with a stable harness.
Have you all noticed that the latest releases from Chinese companies (Qwen3 Max Thinking, now Kimi K2.5) are benchmarking against Claude Opus now, not Sonnet? They are truly catching up, almost at the same pace.
I've read several people say that Kimi K2 has better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.
The directionally interesting part is that, according to the announcement, K2.5 seems to be trained specifically to create sub-agents and work usefully in an agent swarm. The key part is that you don't need to manually create or prompt sub-agents; K2.5 creates them automatically. So from the looks of things it's similar to Claude Code's dynamic sub-agents, except the model is trained to scale to many more agents autonomously.
I wonder whether Claude is doing the same kind of training and it's coming with the next model, and that's why the agent swarm mode in Claude Code is hidden for now. We might be getting very very good agent orchestrators/swarms very soon.
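If I had to guess at the mechanics (pure speculation on my part, not anything from the announcement), "trained to create sub-agents" likely means the model emits something like a spawn-sub-agent tool call that the harness executes, instead of the user wiring sub-agents up by hand. A toy sketch of what such a tool could look like; every name here is invented:

```python
import json

# Speculative sketch of a "spawn sub-agent" tool a harness might expose.
# None of this is from Kimi's actual API; the names are invented for illustration.
SPAWN_SUBAGENT_TOOL = {
    "type": "function",
    "function": {
        "name": "spawn_subagent",
        "description": "Launch a specialized sub-agent for an independent subtask.",
        "parameters": {
            "type": "object",
            "properties": {
                "role": {"type": "string", "description": "e.g. 'test-writer'"},
                "task": {"type": "string", "description": "what the sub-agent should do"},
            },
            "required": ["role", "task"],
        },
    },
}

# An orchestration-trained model would emit calls like this unprompted:
example_call = {"role": "test-writer", "task": "add unit tests for the parser module"}
print(json.dumps(example_call, indent=2))
```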
K2 0905, and K2 Thinking shortly after it, have done impressively well in my personal use cases and were severely slept on. Faster, more accurate, less expensive, more flexible in terms of hosting, and available months before Gemini 3 Flash. I really struggle to understand why Flash got such positive attention at launch.
Interested in the dedicated Agent and Agent Swarm releases, especially in how that could affect third party hosting of the models.
Why is it that Claude is still at the top in coding? Are they heavily focused on training for coding, or is their general training so good that it performs well in coding?
Someone please beat Opus 4.5 in coding; I want to replace it.
A realistic setup for this would be a 16× H100 80GB node with NVLink. That comfortably holds the full MoE weights (all experts have to stay resident, even though only ~32B parameters are active per token) plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I'm curious how API providers will price tokens on top of that hardware reality.
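A rough back-of-envelope for why that node size works out, assuming FP8 weights (my numbers, not Moonshot's published deployment figures):

```python
# Back-of-envelope VRAM sizing for a 1T-parameter MoE model on 16x H100 80GB.
# Rough assumptions, not official Kimi K2.5 deployment figures.
TOTAL_PARAMS = 1.0e12   # 1T total parameters; all experts must stay resident
BYTES_PER_PARAM = 1.0   # assuming FP8 weights
NUM_GPUS = 16
VRAM_PER_GPU_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9   # ~1000 GB of weights
cluster_gb = NUM_GPUS * VRAM_PER_GPU_GB             # 1280 GB across the node
headroom_gb = cluster_gb - weights_gb               # ~280 GB left over

print(f"weights : {weights_gb:,.0f} GB")
print(f"cluster : {cluster_gb:,.0f} GB")
print(f"headroom: {headroom_gb:,.0f} GB for KV cache and activations")
```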
As your local vision nut, I can tell you their claims about "SOTA" vision are absolutely BS in my tests.
Sure, it's SOTA at standard vision benchmarks. But on tasks that require proper image understanding (see for example BabyVision [0]), it appears very much lacking compared to Gemini 3 Pro.
[0] https://arxiv.org/html/2601.06521v1
I don't get this "agent swarm" concept. You set up a task and they boot up 100 LLMs to try to do it in parallel, and then one "LLM judge" puts it all together? Is there anywhere I can read more about it?
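My understanding (from other swarm setups, not Moonshot's documentation) is that it's a fan-out/fan-in pattern: an orchestrator decomposes the task, N workers run in parallel, and a judge merges the results. A toy sketch, with call_llm as a stand-in for a real model API:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Stand-in for a real model API call (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"result for: {prompt!r}"

async def agent_swarm(task: str, n_workers: int = 4) -> str:
    # 1. Orchestrator decomposes the task into independent subtasks.
    plan = await call_llm(f"Split into {n_workers} independent subtasks: {task}")
    subtasks = [f"{plan} [part {i}]" for i in range(n_workers)]  # toy split

    # 2. Workers run concurrently instead of sequentially.
    results = await asyncio.gather(*(call_llm(s) for s in subtasks))

    # 3. A judge/aggregator merges the partial results.
    return await call_llm("Merge these results: " + " | ".join(results))

print(asyncio.run(agent_swarm("summarize this repo")))
```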
Can we please stop calling these models "open source"? Yes, the weights are open. So "open weight", maybe. But the source isn't open, the thing that allows you to re-create it. That's what "open source" used to mean. (Together with a license that allows you to use that source for various things.)
Is there a startup that takes models like this and effectively gives you a secure setup, where you have (a) a mobile app that (b) talks to some giant machine that only you have access to?
If a $10K computer could run this, it may be worth it to have a "fully on-prem" version of ChatGPT running for you.
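The closest thing today is an OpenAI-compatible server (vLLM, llama.cpp, and similar) on your own hardware, with the mobile app as a thin client. A minimal sketch using the openai Python client against a self-hosted endpoint; the URL, key, and model id are placeholders:

```python
# Thin client talking to a self-hosted, OpenAI-compatible endpoint
# (e.g. a vLLM or llama.cpp server). URL, key, and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-private-box.example.com/v1",  # your machine, your access
    api_key="local-key",                               # whatever auth your server enforces
)

resp = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from my phone"}],
)
print(resp.choices[0].message.content)
```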
I had these weird situations where some models refuse to use SSH as a tool. Not sure if it was a limitation of the coding tool or if it is baked into some of the models.
Is this actually good, or just optimized heavily for benchmarks? I am hopeful it's the former based on the writeup, but I need to put it through its paces.
Glad to see open-source models catching up and treating vision as a first-class citizen (a.k.a. a native multimodal agentic model). The GLM and Qwen models take a different approach, having a base model and a separate vision variant (glm-4.6 vs glm-4.6v).
I guess after Kimi K2.5, other vendors are going to take the same route?
Can't wait to see how this model performs on computer automation use cases like VITA AI Coworker.
https://www.vita-ai.net/
The URL is down, so I can't tell.
> K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution [..] leads to an 80% reduction in end-to-end runtime
Not just RL on tool calling, but RL on agent orchestration, neat!
* https://lmarena.ai/leaderboard — crowd-sourced head-to-head battles between models using ELO
* https://dashboard.safe.ai/ — CAIS' incredible dashboard
* https://clocks.brianmoore.com/ — a visual comparison of how well models can draw a clock. A new clock is drawn every minute
* https://eqbench.com/ — emotional intelligence benchmarks for LLMs
* https://www.ocrarena.ai/battle — OCR battles, ELO
* https://mafia-arena.com/ — LLMs playing the social deduction game Mafia
* https://openrouter.ai/rankings — market share based on OpenRouter usage
GitHub: https://github.com/MoonshotAI/kimi-cli
https://big-agi.com/static/kimi-k2.5-less-censored.jpg
Maybe we can get away with something cheaper than Claude for coding.