Windows-Use: an AI agent that interacts with Windows at the GUI layer

(github.com)

Comments

kh9000 12 September 2025
Using the UIA tree as the currency for LLMs to reason over always made more sense to me than computer vision, screenshot-based approaches. It’s true that not all software exposes itself correctly via UIA, but almost all the important stuff does. VS Code is one notable exception (but you can turn on accessibility support in the settings).
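
To make the "UIA tree as currency" idea concrete, here is a rough sketch of flattening an accessibility tree into numbered text lines that a model could reason over and refer back to by index. It does not call the real UI Automation API and is not how Windows-Use does it: `UiNode`, `flatten`, and the Notepad-like sample tree are all hypothetical stand-ins so the example runs anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class UiNode:
    """Hypothetical stand-in for one UI Automation element (not a real UIA type)."""
    control_type: str
    name: str = ""
    enabled: bool = True
    children: list["UiNode"] = field(default_factory=list)


def flatten(node, depth=0, counter=None):
    """Return one indented, numbered text line per element, in depth-first order."""
    if counter is None:
        counter = [0]  # mutable counter shared across the recursion
    index = counter[0]
    counter[0] += 1
    state = "" if node.enabled else " (disabled)"
    lines = [f"{'  ' * depth}[{index}] {node.control_type} \"{node.name}\"{state}"]
    for child in node.children:
        lines.extend(flatten(child, depth + 1, counter))
    return lines


# Made-up sample tree, loosely shaped like a Notepad window.
notepad = UiNode("Window", "Untitled - Notepad", children=[
    UiNode("MenuBar", "Application", children=[
        UiNode("MenuItem", "File"),
        UiNode("MenuItem", "Edit"),
    ]),
    UiNode("Document", "Text editor"),
    UiNode("Button", "Save", enabled=False),
])

# The joined text is what would go into the prompt; the model can then answer
# with an index such as 2 to mean "click the File menu item".
print("\n".join(flatten(notepad)))
```

The numbering is the design point: a short text listing with stable indices gives the model something cheaper and less ambiguous to point at than pixel coordinates in a screenshot.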
philfreo 12 September 2025
Cool. Reminds me of using SendKeys() in Visual Basic 6 in the 90s

https://learn.microsoft.com/en-us/dotnet/api/microsoft.visua...

mtVessel 12 September 2025
I feel vaguely vindicated that the agent can't figure out how to use the modern Save as workflow, either, and reverts to the traditional dialog.
electroly 12 September 2025
Looks awesome. I've attempted my own implementation, but I never got it to work particularly well. "Open Notepad and type Hello World" was a triumph for me. I landed on the UIA tree + annotated screenshot combination, too, but mine was too primitive, and I tried to use GPT, which isn't as good at image tasks as Gemini, as used here. Great job!
yodon 12 September 2025
Very cool - does anyone know of an OSX equivalent?

Preferably one that is similarly able to understand and interact with web page elements, in addition to app elements and system elements.

tiahura 12 September 2025
LLMs do a pretty good job of using pywin32 for programs that support COM, like Office.
dvt 12 September 2025
Working on something very similar in Rust. It's quite magical when it works (that's a big caveat, as I'm trying to make it work with local LLMs). Very cool implementation, and imo, this is the future of computing.
AfterHIA 12 September 2025
I remember an older friend asking me recently: will there be a thing soon where I can make my computer go on auto-pilot?

I guess I can answer, "yes I think so."

MurageKabui 13 September 2025
Awesome job! I'm working on a similar Agent that's highly dependent on AutoIt.
KaseKun 12 September 2025
Can it farm a ber rune for me?
vivzkestrel 13 September 2025
Genuinely asking: what do you think are the use cases for someone requiring this?