I have a PhD in algorithmic game theory and have worked on poker.
1) There are currently no algorithms that can compute deterministic equilibrium strategies [0]. Therefore, mixed (randomized) strategies must be used for professional-level play or stronger.
2) In practice, strong play has been achieved with: i) online search and ii) a mechanism to ensure strategy consistency. Without ii), an adaptive opponent can learn to exploit inconsistencies in repeated play.
3) LLMs do not have a mechanism for sampling from a given probability distribution. E.g. if you ask an LLM to sample a random number from 1 to 10, it will likely give you 3 or 7, as those are overrepresented in the training data. (One workaround is to push the sampling out of the model and into the harness; see the sketch below.)
Based on these points, it's not technically feasible for current LLMs to play poker strongly. This is in contrast with Chess, where there is far more training data, a deterministic optimal strategy exists, and you do not need to ensure strategy consistency.
[0] There are deterministic approximations for subgames based on linear programming, but these require the subgame to be fully loaded in memory, which is infeasible for the whole game.
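A minimal sketch of the workaround hinted at in point 3, assuming a hypothetical harness where the model returns action probabilities and the surrounding code, not the LLM, performs the random draw (the response format is invented for illustration):

```python
import json
import random

# The LLM proposes a mixed strategy as explicit probabilities; the harness
# does the actual sampling, sidestepping the model's biased "randomness".
def choose_action(llm_response: str) -> str:
    probs = json.loads(llm_response)
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights, k=1)[0]

# Example: a model output asking for a 10/50/40 fold/call/raise mix.
print(choose_action('{"fold": 0.1, "call": 0.5, "raise": 0.4}'))
```

Note this only fixes the sampling; points 1 and 2 (equilibrium computation and strategy consistency) remain open.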
I would love to see a live stream of this, but where they're also allowed to talk to each other: bluff, trash talk. That would be a much more interesting test of LLMs and a pretty decent spectator sport.
This is my area of expertise. I love the experiment.
In general, games of imperfect information such as Poker, Diplomacy, etc. are much, much harder than perfect-information games such as Chess.
Multiplayer (3+) poker in particular is interesting because it is no longer a two-player zero-sum game: Nash equilibria still exist, but computing one is intractable, and playing one carries no guarantee against losing.
That is part of the reason they are a fantastic venue for exploration of the capabilities of LLMs. They also mirror the decision making process of real life. Bezos framed it as "making decisions with about 70% of the information you wish you had."
As it currently stands, having built many poker AIs, including what I believe to be the current best in the world, I don't think LLMs are remotely close to being able to do what specialized algorithms can do in this domain.
All of the best poker AIs right now are fundamentally based on counterfactual regret minimization (CFR), typically with a layer of real-time search on top.
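For readers unfamiliar with CFR: its core update rule is regret matching, which plays each action in proportion to its accumulated positive regret. A toy self-play sketch for rock-paper-scissors (deliberately not poker, and not any particular bot's code):

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}

def payoff(a: str, b: str) -> int:
    # +1 if a beats b, -1 if b beats a, 0 on a tie.
    return 0 if a == b else (1 if (a, b) in BEATS else -1)

def current_strategy(regrets):
    # Regret matching: play in proportion to positive accumulated regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def train(iters: int = 50_000):
    regrets = [0.0] * 3
    strat_sum = [0.0] * 3
    for _ in range(iters):
        strat = current_strategy(regrets)
        strat_sum = [s + x for s, x in zip(strat_sum, strat)]
        played = random.choices(ACTIONS, weights=strat)[0]
        opp = random.choices(ACTIONS, weights=strat)[0]  # self-play opponent
        base = payoff(played, opp)
        for i, alt in enumerate(ACTIONS):
            # Regret: how much better the alternative would have done.
            regrets[i] += payoff(alt, opp) - base
    total = sum(strat_sum)
    return [round(s / total, 3) for s in strat_sum]

print(train())  # average strategy converges to ~[0.333, 0.333, 0.333]
```

It's the average strategy (not the final one) that converges to equilibrium; full CFR applies this same update at every information set of the game tree.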
Noam Brown (currently director of research at OpenAI) took the existing CFR strategies, which fundamentally just tried to scale at train time, and added a version of search, allowing the system to compute better policies at TEST TIME (i.e., when making decisions). This ultimately beat the pros (Pluribus beat the pros at six-max in 2019). It stands as the state of the art, although I believe that some of the deep approaches may eventually topple it.
Not long after Noam joined OpenAI they released the o1-preview "thinking" models, and I can't help but think that he took some of his ideas for test time compute and applied them on top of the base LLM.
It's amazing how much poker AI research is actually influencing the SOTA AI we see today.
I would be surprised if any general-purpose model can achieve true human-level or superhuman results, as the purpose-built SOTA poker algorithms at this point play close to perfect poker.
Background:
- I built my first poker AI when I was in college and made half a million bucks on Party Poker. It was a pseudo expert system.
- Created PokerTableRatings.com and caught cheaters at scale using machine learning on a database of all poker hands in real time
- Sold my poker AI company to Zynga in 2011 and was Zynga Poker CTO for 2 years pre/post IPO
- Most recently built a tournament version of Pluribus (https://www.science.org/doi/10.1126/science.aay2400). Launching as a Duolingo for poker at pokerskill.com
I am the author/maintainer of rs-poker (https://github.com/elliottneilclark/rs-poker). I've been working on algorithmic poker for quite a while. This isn't the way to do it. LLMs would need to be able to do math, lie, and be random, none of which they are currently capable of.
We know how to compute the best moves in poker, but it's computationally challenging: the more choices and players there are, the harder it gets, which is why most attempts only even try at heads-up.
With all that said, I do think there's a way to use attention and BERT-style models (trained on non-text sequences) to solve poker. We need a better corpus of games and some training time on unique models. If anyone is interested, my email is elliott.neil.clark @ gmail.com
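To make "trained on non-text sequences" concrete, here is one shape such an encoding could take; the vocabulary, bet-size buckets, and layout below are my own assumptions, not anything from rs-poker or an existing system:

```python
# Encode a hand as a discrete token sequence for a BERT-style model.
def bucket(size: float) -> str:
    # Coarse bet-size buckets keep the vocabulary small.
    if size == 0:
        return "CHECK"
    return "S" if size < 100 else "M" if size < 500 else "L"

def encode_hand(hole, board, actions):
    tokens = ["[CLS]"]
    tokens += [f"HOLE_{c}" for c in hole]    # e.g. HOLE_Tc
    tokens += [f"BOARD_{c}" for c in board]  # e.g. BOARD_Jh
    for seat, act, size in actions:          # e.g. (1, "raise", 170)
        tokens.append(f"P{seat}_{act.upper()}_{bucket(size)}")
    return tokens

print(encode_hand(["Tc", "4d"], ["2s", "Ts", "Jh"],
                  [(0, "call", 50), (1, "raise", 170)]))
```

A masked-token objective over such sequences would be one plausible way to use it; the hard part, as noted, is the corpus.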
For reference, the details about how the LLMs are queried:
"How the players work
All players use the same system prompt
Each time it's their turn, or after a hand ends (to write a note), we query the LLM
At each decision point, the LLM sees:
General hand info — player positions, stacks, hero's cards
Player stats across the tournament (VPIP, PFR, 3bet, etc.)
Notes hero has written about other players in past hands
From the LLM, we expect:
Reasoning about the decision
The action to take (executed in the poker engine)
A reasoning summary for the live viewer interface
Models have a maximum token limit for reasoning
If there's a problem with the response (timeout, invalid output), the fallback action is fold"
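A rough sketch of what that query-and-fallback loop could look like; query_model and is_legal are hypothetical stand-ins for the real LLM client and poker engine, and only the fold-on-failure behavior comes from the description above:

```python
import json

def decide(query_model, is_legal, prompt):
    """One decision point: ask the model, validate the output, fold on any failure."""
    try:
        raw = query_model(prompt, timeout_s=60)   # may raise on timeout
        move = json.loads(raw)                    # expects {"reasoning", "action", "summary"}
        if not is_legal(move["action"]):
            raise ValueError(f"illegal action: {move['action']}")
        return move
    except Exception as err:  # timeout, malformed JSON, illegal action, ...
        return {"action": "fold", "summary": f"fallback fold ({err})"}
```

Folding is always a legal action and never commits more chips, which is presumably why it makes a safe default.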
The fact that the models are given stats about the other models is rather disappointing to me; it makes things less interesting. I'd be curious how this would go if the models could only use their own notes/context. Maybe it's a way to save on costs, as this could get expensive...
We (TEN Protocol) did this a few months ago, using blockchain to make the LLMs' actions publicly visible and TEEs for verifiable randomness in shuffling and other processes. We used a mix of LLMs across five players and ran multiple tournaments over several months. The longest game we observed lasted over 50 hours straight.
Screenshot of the gameplay: https://pbs.twimg.com/media/GpywKpDXMAApYap?format=png&name=...
Post: https://x.com/0xJba/status/1907870687563534401
Article: https://x.com/0xJba/status/1920764850927468757
If anybody wants to spectate this, let us know and we can spin up a fresh tournament.
I wonder if these will get better over time. Fun idea and I kind of want to join a table.
For now at least, some can't even determine which hand they have:
> LLAMA bets $170 on Flop
> "We have top pair with Tc4d on a flop of 2s Ts Jh. The board is relatively dry, and we have a decent chance of having the best hand. We're betting $170.00 to build the pot and protect our hand."
Imo, this shows that LLMs are nice for compression, OCR and other similar tasks, but there is 0% thinking / logic involved:
magistral: "Turn card pairs the board with a T, potentially completing some straights and giving opponents possible two-pair or better hands"
A card that pairs the board does not help complete straights; if anything, the opposite is true, since that rank was already available. This is far worse than hallucinating a function signature that does not exist: if you base anything on these kinds of fundamental errors, you build nothing.
Read 10 turns on the website and you will find 2-3 extreme errors like this.
There needs to be a real breakthrough in actual thinking (regardless of how slow/expensive it might be) before I believe there is a path to AGI.
I gave a talk on this topic at PyConEs just 10 days ago. The idea was to have each (human) player secretly write a prompt, then use the same model to see which one wins.
It's just a proof of concept, but the code and instructions are here: https://github.com/pablorodriper/poker_with_agents_PyConEs20...
It doesn't seem like the design of this experiment allows AIs to evolve novel strategy over time. I wonder if poker-as-text is similar to maths -- LLMs are unable to reason about the underlying reality.
As a Texas Hold'em enthusiast, I find some of the hands moronic.
Just checked one (https://pokerbattle.ai/hand-history?session=37640dc1-00b1-4f...) where Grok wins with A3s because Gemini folds K10 with an Ace and a King on the board, without Grok betting anything. Gemini just folds instead of checking. It's not even GTO, it's just pure hallucination.
Meaning: I wouldn't read anything into the fact that Grok leads. These machines are not made to play games like online poker deterministically and would be CRUSHED by GTO play.
It would be more interesting instead to understand if they could play exploitatively.
Hi there, I'm also working on LLMs in Texas Hold'em :)
First of all, congrats on your work. Picking a way to present LLMs playing poker is a hard task, and I like your approach of presenting the Action Log.
I can share some interesting insights from my experiments:
- Finding strategies is more interesting than comparing different models. Strategies can get pretty long and specific. For example, if part of the strategy is: "bluff on the river if you have a weak hand but the opponent has been playing tight all game", most models, given this strategy, would execute it with the same outcome. Models can only really be compared using some open-ended strategy like "play aggressively" or "play tight", or even "win the tournament".
- I implemented a tournament game, where players drop out when they run out of chips. This creates a more dynamic environment, where players have to win a tournament, not just a hand. That requires adding the whole table history to the prompt, and it might get quite long, so context management might be a challenge.
- I tested playing an LLM against a randomly playing bot (1v1). `grok-4` was able to come up with the winning strategy against a random bot on the first try (I asked: "You play against a random bot. What is your strategy?"). `gpt-5-high` struggled.
- Public chat between LLMs over the poker table is fun to watch, but it is hard to create a strategy that makes an LLM successfully convince other LLMs to fold. Given their chain of thought, they are more focused on actions than on what others say. Yet, more experiments are needed. For weaker models (looking at you, `gpt-5-nano`) it is hard to convince them not to reveal their hand.
- Playing random hands is expensive. You would have to play thousands of hands to get some statistical significance measures. It's better to put LLMs in predefined situations (like AliceAI has a weak hand, BobAI has a strong hand) and see how they behave.
- 1-on-1 is easier to analyze and work with than multiplayer.
- There is an interesting choice to make when building the context for an LLM: should the previous chains of thought be included in the prompt? I found that including them actually makes LLMs "stick" to the first strategy they came up with, and they are less likely to adapt to the changing situation at the table. On the other hand, not including them makes LLMs "rethink" their strategy every time, which is more error-prone. I'm working on an AlphaEvolve-like approach now. (A sketch of this context-building choice follows after this list.)
- This will be super interesting to fine-tune an LLM model using an AlphaZero-like approach, where the model plays against itself and improves over time. But this is a complex task.
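Re the chain-of-thought bullet above: the trade-off boils down to a single branch in the prompt builder. A minimal sketch with hypothetical field names, not the actual experiment code:

```python
def build_prompt(state, notes, prior_cots, include_prior_cot):
    """Assemble one decision prompt. Carrying prior reasoning forward anchors
    the model to its first plan; dropping it forces a fresh (but more
    error-prone) rethink each turn. All field names here are assumptions."""
    parts = [
        f"Table state: {state}",
        f"Notes on opponents: {notes}",
    ]
    if include_prior_cot:
        # Include only the most recent few to keep the context manageable.
        parts.append("Your previous reasoning:\n" + "\n".join(prior_cots[-3:]))
    parts.append("Decide: fold, call, or raise. Explain briefly.")
    return "\n\n".join(parts)

print(build_prompt("BTN, 95bb, hero holds AhKd", "BobAI folds to 3-bets",
                   ["Hand 12: plan to play tight..."], include_prior_cot=True))
```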
Cool idea and interesting that Grok is winning and has “bad” stats.
I wonder if Grok is exploiting Mistral and Meta, who VPIP too much and then don't c-bet. Seems to win a lot of showdowns and folds to a lot of 3-bets. Punishes the nits because it's able to get away from bad hands.
Goes to showdown very little, so it's not showing its hands much; it's winning smaller pots earlier on.
A related experiment: six LLMs were given $10k each to trade in real markets autonomously, using only numerical market data inputs and the same prompt/harness.