Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.
Thanks for all the questions! More details on how this works:
- Each model starts with an initial optimization function for evaluating Tetris moves.
- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.
- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function.
- The model generates updated code, executes it to score all placements, and picks the best move.
- The reason I reframed this as a coding problem is that Tetris is an optimization game by nature. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
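To make the "score all placements, pick the best move" step concrete, here is a minimal sketch of what such an evaluation function might look like. It is not the code the models actually generate: the feature set (total height, holes, bumpiness, cleared lines), the weights, and every function name below are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Board = List[List[int]]          # rows listed top to bottom; 1 = filled, 0 = empty
Shape = List[Tuple[int, int]]    # (row, col) offsets of one rotation of a piece


def column_heights(board: Board) -> List[int]:
    """Height of the highest filled cell in each column (0 if the column is empty)."""
    rows = len(board)
    heights = []
    for c in range(len(board[0])):
        filled = [r for r in range(rows) if board[r][c]]
        heights.append(rows - filled[0] if filled else 0)
    return heights


def count_holes(board: Board) -> int:
    """Empty cells that have at least one filled cell above them in the same column."""
    holes = 0
    for c in range(len(board[0])):
        seen_block = False
        for r in range(len(board)):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def evaluate(board: Board, lines_cleared: int) -> float:
    """Higher is better. The weights are illustrative guesses, not tuned values."""
    heights = column_heights(board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (0.76 * lines_cleared - 0.51 * sum(heights)
            - 0.36 * count_holes(board) - 0.18 * bumpiness)


def drop(board: Board, shape: Shape, col: int) -> Optional[Tuple[Board, int]]:
    """Hard-drop `shape` with its left edge at `col`; return (new board, lines cleared)."""
    rows, cols = len(board), len(board[0])
    if any(not 0 <= col + dc < cols for _, dc in shape):
        return None

    def fits(top: int) -> bool:
        return all(0 <= top + dr < rows and not board[top + dr][col + dc]
                   for dr, dc in shape)

    if not fits(0):
        return None
    top = 0
    while fits(top + 1):
        top += 1
    placed = [row[:] for row in board]
    for dr, dc in shape:
        placed[top + dr][col + dc] = 1
    kept = [row for row in placed if not all(row)]
    cleared = rows - len(kept)
    return [[0] * cols for _ in range(cleared)] + kept, cleared


def best_placement(board: Board, rotations: List[Shape]) -> Optional[Tuple[float, int, int]]:
    """Score every (rotation, column) pair and return (score, rotation index, column)."""
    best = None
    for rot_idx, shape in enumerate(rotations):
        for col in range(len(board[0])):
            result = drop(board, shape, col)
            if result is None:
                continue
            score = evaluate(*result)
            if best is None or score > best[0]:
                best = (score, rot_idx, col)
    return best


# Usage: a 6x5 board whose bottom row is missing only the rightmost cell.
I_PIECE = [[(0, 0), (0, 1), (0, 2), (0, 3)],   # horizontal
           [(0, 0), (1, 0), (2, 0), (3, 0)]]   # vertical
board = [[0] * 5 for _ in range(5)] + [[1, 1, 1, 1, 0]]
print(best_placement(board, I_PIECE))  # picks the vertical I in the rightmost column
```

Under this framing, "refining the optimizer" would amount to the model rewriting `evaluate` (new features or weights) between moves, while the placement search itself stays fixed.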
Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent for Tetris that represents an ongoing game, passing an ASCII representation, encoding the board as a JSON structure, or just directly sending screenshots of the game to the various LLMs?
Gemini 3 Flash is at a very nice point along the price-performance curve. It is a good workhorse model, which can be supplemented with Opus 4.5 / Gemini 3 Pro for more complex tasks.
LLMs playing Tetris feels like testing a calculator's ability to write poetry. Interesting as a curiosity, but the results don't transfer to the tasks where these models actually excel.
Curious what the latency looks like per move. That seems like the actual bottleneck here.
I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair against such processing power, you should not be able to win. It is only possible because LLMs are not built for such tasks.
It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.
Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus
(tetrisbench.com) 106 points by ykhli 20 hours ago | 38 comments
Comments
Some feedback:
- Knowing the scoring system is helpful when going 1v1 for a high score
- Use a different randomization system; I kept getting starved for pieces like I. True random is fine. Throwing a copy of every piece into a bag and then drawing them one by one is better (7-bag). Nearly random with some lookbehind to prevent getting a string of ZSZS is solid, too (TGM randomizer)
- Piece rotation feels left-biased, and keeps making me mis-drop, like the T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games are doing it.
- Both clockwise and counter-clockwise rotation are important for human players; we can only hit so many keys per second
- Re-mappable keys are also appreciated
Nice work, I'm going to keep watching.
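For reference, the two randomizers suggested above can be sketched in a few lines. This is a minimal illustration, not the site's code: `seven_bag` shuffles one copy of each piece and deals the bag out before refilling, and `history_randomizer` is a simplified lookbehind scheme in the spirit of the TGM randomizer. The history length, retry count, and function names are assumptions, and details of the real TGM games (such as how the history is seeded) are left out.

```python
import random
from collections import deque
from itertools import islice
from typing import Iterator, Optional

PIECES = "IJLOSTZ"


def seven_bag(rng: Optional[random.Random] = None) -> Iterator[str]:
    """Deal pieces from a shuffled bag holding one copy of each piece."""
    rng = rng or random.Random()
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag


def history_randomizer(rng: Optional[random.Random] = None,
                       history_len: int = 4,
                       max_tries: int = 4) -> Iterator[str]:
    """Pick uniformly, but reroll a few times if the piece was seen recently."""
    rng = rng or random.Random()
    history: deque = deque(maxlen=history_len)
    while True:
        piece = rng.choice(PIECES)
        for _ in range(max_tries - 1):
            if piece not in history:
                break
            piece = rng.choice(PIECES)  # reroll; the final roll is accepted as-is
        history.append(piece)
        yield piece


# Usage: print the first two bags and a run from the history-based generator.
print("".join(islice(seven_bag(), 14)))
print("".join(islice(history_randomizer(), 14)))
```

With the bag approach, two copies of the same piece are never more than 12 pieces apart, which is what removes the long I droughts. The history approach only makes repeats unlikely; it does not rule out droughts.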
Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement.