Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

111 points by ykhli 26 January 2026 | 41 comments

Comments

bubblesorting 26 January 2026

Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.

Some feedback:

- Knowing the scoring system is helpful when going 1v1 high score

- Use a different randomization system, I kept getting starved for pieces like I. True random is fine, throwing a copy of every piece into a bag and then drawing them one by one is better (7 bag), nearly random with some lookbehind to prevent getting a string of ZSZS is solid, too (TGM randomizer)

- Piece rotation feels left-biased, and keeps making me mis-drop, like the T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games are doing it.

- Clockwise and counter-clockwise rotation is important for human players, we can only hit so many keys per second

- re-mappable keys are also appreciated

Nice work, I'm going to keep watching.
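For reference, the "7 bag" randomizer mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not TetrisBench's code; the names `seven_bag` and `take` are hypothetical.

```python
import itertools
import random
from typing import Iterator, List, Optional

PIECES = "IJLOSTZ"  # the seven standard tetrominoes


def seven_bag(rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield pieces forever, one shuffled bag of all seven at a time."""
    rng = rng or random.Random()
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)  # each bag is a random permutation of all seven pieces
        yield from bag


def take(gen: Iterator[str], n: int) -> List[str]:
    """Helper (hypothetical name): draw the next n pieces from the generator."""
    return list(itertools.islice(gen, n))


if __name__ == "__main__":
    print("".join(take(seven_bag(random.Random(0)), 21)))
```

Because every bag contains exactly one of each piece, at most 12 other pieces can fall between two I pieces, which is what prevents the long droughts that true random allows.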

ykhli 26 January 2026

Thanks for all the questions! More details on how this works:

- Each model starts with an initial optimization function for evaluating Tetris moves.

- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.

- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function

- The model generates updated code, executes it to score all placements, and picks the best move.

- The reason I reframed this as a coding problem is that Tetris is an optimization game by nature. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
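To make "score all placements and pick the best move" concrete, here is a rough Python sketch of the kind of evaluation function a model might write. It is an assumption for illustration, not the actual TetrisBench code: the features (completed lines, aggregate height, holes, bumpiness) are common Tetris heuristics, the weights are made-up starting values, and the names `evaluate` and `best_placement` are hypothetical. Generating the candidate boards (one per legal placement) is assumed to happen elsewhere.

```python
from typing import Dict, List, Sequence

Board = List[List[int]]  # rows listed top to bottom; 1 = filled cell, 0 = empty

# Illustrative weights only (assumed, not taken from TetrisBench).
WEIGHTS: Dict[str, float] = {
    "lines": 0.76, "height": -0.51, "holes": -0.36, "bumpiness": -0.18,
}


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the floor to its topmost filled cell."""
    rows = len(board)
    heights = []
    for c in range(len(board[0])):
        top = next((r for r in range(rows) if board[r][c]), rows)
        heights.append(rows - top)
    return heights


def count_holes(board: Board) -> int:
    """Count empty cells that have at least one filled cell above them."""
    holes = 0
    for c in range(len(board[0])):
        covered = False
        for row in board:
            if row[c]:
                covered = True
            elif covered:
                holes += 1
    return holes


def features(board: Board) -> Dict[str, int]:
    """Compute the heuristic features of a board, before full rows are cleared."""
    heights = column_heights(board)
    return {
        "lines": sum(1 for row in board if all(row)),
        "height": sum(heights),
        "holes": count_holes(board),
        "bumpiness": sum(abs(a - b) for a, b in zip(heights, heights[1:])),
    }


def evaluate(board: Board, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the features; higher is better."""
    f = features(board)
    return sum(w * f[name] for name, w in weights.items())


def best_placement(candidates: Sequence[Board],
                   weights: Dict[str, float] = WEIGHTS) -> int:
    """Index of the candidate board (one per legal placement) with the best score."""
    return max(range(len(candidates)), key=lambda i: evaluate(candidates[i], weights))
```

Under this framing, "updating the optimization function" would amount to the model rewriting the features or weights mid-game, while the cheap scoring loop runs locally on every move.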

bityard 26 January 2026

Looks fun, but I'm not willing to give out my email address just to play a game.

Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement.

OGEnthusiast 26 January 2026

Gemini 3 Flash is at a very nice point along the price-performance curve. It makes a good workhorse model, supplemented with Opus 4.5 / Gemini 3 Pro for more complex tasks.

vunderba 26 January 2026

Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent for Tetris that represents an ongoing game, passing an ASCII representation, encoding the board as a JSON structure, or just directly sending screenshots of the game to the various LLMs?

burkaman 26 January 2026

It's actually 80% against Opus, 66% average against the 5 models it's tested with.

augusteo 27 January 2026

LLMs playing Tetris feels like testing a calculator's ability to write poetry. Interesting as a curiosity, but the results don't transfer to the tasks where these models actually excel.

Curious what the latency looks like per move. That seems like the actual bottleneck here.

p0w3n3d 26 January 2026

Guys, I don't know how to tell you but... Tetris can be solved without an LLM...

esafak 26 January 2026

I imagine this is because Tetris is visual and the Gemini models are strong visually.

akomtu 26 January 2026

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.

arendtio 26 January 2026

There are some concepts clashing here.

I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair, against such processing power you should not be able to win. It is only possible because LLMs are not built for such tasks.

tiahura 26 January 2026

I'd like to see a nethackbench.

segmondy 26 January 2026

... and what does this prove? What task, besides playing Tetris, would you choose one LLM over another for based on this tetrisbench?

indigodaddy 26 January 2026

Is there a tl;dr on why this is? Does it just make faster decisions?

purplecats 26 January 2026

watch link?