Evaluating and mitigating the growing risk of LLM-discovered 0-days

(red.anthropic.com)

Comments

lebovic 5 hours ago
The post is light on details, and I agree with the sentiment that it reads like marketing. That said, Opus 4.6 is actually a legitimate step up in capability for security research, and the red team at Anthropic – who wrote this post – are sincere in their efforts to demonstrate frontier risks.

Opus 4.6 is a very eager model that doesn't give up easily. Yesterday, Opus 4.6 took the initiative to aggressively fuzz a public API of a frontier lab I was investigating, and it found a real vulnerability after 100+ uninterrupted tool calls. That would have required lots of prodding with previous models.

If you want to experience this directly, I'd recommend recording network traffic while using a web app, and then pointing Claude Code at the results (in Chrome, this is Dev Tools > Network > Export HAR). It makes for hours of fun, but it's also a bit scary.
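Before handing a HAR export to an agent, it can help to see which endpoints the captured traffic actually touches. Below is a minimal sketch, assuming the standard HAR 1.2 layout (`log.entries[].request.method` and `.url`). The function name `summarize_har` is hypothetical, and this is not the commenter's workflow. It simply groups requests by method, host and path, ignoring query strings.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har_text: str) -> Counter:
    """Count (method, host, path) tuples across all requests in a HAR export.

    Assumes the HAR 1.2 structure: {"log": {"entries": [{"request": {...}}]}}.
    Query strings are dropped so /users?id=1 and /users?id=2 count together.
    """
    har = json.loads(har_text)
    counts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        req = entry.get("request", {})
        parts = urlsplit(req.get("url", ""))
        counts[(req.get("method", "?"), parts.netloc, parts.path)] += 1
    return counts
```

For example, `summarize_har(open("session.har", encoding="utf-8").read()).most_common(20)` lists the 20 most frequently hit endpoints in a hypothetical `session.har` file, which gives a quick map of the API surface before deciding what to point a model at.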

nielsbot 51 minutes ago
Wondering how many of these memory errors would be caught by running the Clang Static Analyzer (or similar) on them.

https://clang-analyzer.llvm.org

Alternatively, testing these projects with ASan enabled:

https://clang.llvm.org/docs/AddressSanitizer.html
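A minimal sketch of what "testing with ASan enabled" and "running the Clang Static Analyzer" might look like for a typical C project. It assumes the project's build (for example `make` or an autotools `configure`) honors the conventional `CC`/`CFLAGS`/`LDFLAGS` environment variables. The helper names `asan_build_env` and `scan_build_cmd` are hypothetical; the flags (`-fsanitize=address`, `-fno-omit-frame-pointer`) and the `scan-build <build command>` wrapper are the ones documented by the Clang project.

```python
import os

# -fsanitize=address enables ASan instrumentation; -fno-omit-frame-pointer
# gives more useful stack traces; -g adds debug info; -O1 keeps it reasonably fast.
ASAN_FLAGS = "-fsanitize=address -fno-omit-frame-pointer -g -O1"


def asan_build_env(base=None):
    """Return a copy of the environment with clang + ASan compile/link flags appended."""
    env = dict(os.environ if base is None else base)
    env["CC"] = "clang"
    env["CXX"] = "clang++"
    for var in ("CFLAGS", "CXXFLAGS"):
        env[var] = (env.get(var, "") + " " + ASAN_FLAGS).strip()
    # The sanitizer runtime must also be linked in.
    env["LDFLAGS"] = (env.get("LDFLAGS", "") + " -fsanitize=address").strip()
    return env


def scan_build_cmd(build_cmd):
    """Wrap an existing build command so the Clang Static Analyzer runs over it."""
    return ["scan-build", *build_cmd]
```

Usage would be something like `subprocess.run(["make"], env=asan_build_env(), check=True)` followed by running the project's test suite, or `subprocess.run(scan_build_cmd(["make"]), check=True)` for the static analyzer. ASan only reports errors on code paths that actually execute, so it is most useful when paired with a fuzzer or a decent test corpus.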

samfundev 16 hours ago
Glad to see that they brought in humans to validate and patch vulnerabilities, although I really wish they had linked to the actual patches. Here's what I could find:

https://cgit.ghostscript.com/cgi-bin/cgit.cgi/ghostpdl.git/c...

https://github.com/OpenSC/OpenSC/pull/3554

https://github.com/dloebl/cgif/pull/84

tznoer 8 hours ago
Grepping for strcat() is at the "forefront of cybersecurity"? The other one that applied a GitHub comment to a different location does not look too difficult either.

Everything that comes out of Anthropic is just noise but their marketing team is unparalleled.
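For a sense of how shallow a "grep for strcat()" pass is, here is a minimal sketch in that spirit. It assumes C sources with `.c`/`.h` extensions, and the function names are hypothetical. It does not parse C: calls inside comments or string literals are still reported, and most hits will be benign, so this only surfaces candidates for a human (or model) to review. It is not the method described in the post.

```python
import re
from pathlib import Path

# Word boundary before the name so strncat(), my_strcpy() etc. are not matched;
# \s*\( allows "sprintf (" with a space before the parenthesis.
UNSAFE_CALL = re.compile(r"\b(strcat|strcpy|sprintf|gets)\s*\(")


def find_unsafe_calls(source: str):
    """Return (line_number, function_name, stripped_line) for each match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in UNSAFE_CALL.finditer(line):
            hits.append((lineno, m.group(1), line.strip()))
    return hits


def scan_tree(root):
    """Print grep-style hits for every .c/.h file under root."""
    for path in Path(root).rglob("*.[ch]"):
        text = path.read_text(errors="replace")
        for lineno, name, line in find_unsafe_calls(text):
            print(f"{path}:{lineno}: {name}: {line}")
```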

catlifeonmars 2 hours ago
> Our view is this is a moment to move quickly—to empower defenders and secure as much code as possible while the window exists.

Yawn.

username223 5 hours ago
"Evaluating and mitigating the growing risk of LLM-developed 0-days" would be much more interesting and useful. Try harder, guys.

cyanydeez 7 hours ago
Is there a Polymarket on the first billion-dollar AI company to go to $0 because of its own insecure model deployment?

octoberfranklin 7 hours ago
This reads like an advertisement for Anthropic, not a technical article.