Computers Conquer Heads-Up Limit Hold'em

Dn_Ab · on Jan 8, 2015

This uses a particular form of a fundamentally simple yet surprisingly powerful class of learning algorithms called regret minimization. CFR is interesting in an of itself as it specializes regret minization to play extensive form games. There are also CFR algorithms to play multiplayer and no-limit games and though the guarantees of optimality are no longer there, the players are still strong (but for now, far away from experts).

The article states that this algorithm is weak to bad players but that's more an artifact of resources and training method; one advantage of minimizing regret on games instead of using linear programming is that online learning versions can adapt to exploit poor play with payoff larger than the game's value.

I've also posted here before that RM solves 2 player Zero sum game more efficiently than linear programming and how it's related to boosting, portfolio optimization and as an abstraction of natural selection.

http://www.pnas.org/content/111/29/10620.full

jcr · on Jan 8, 2015

>"The algorithm, named CFR+ by its creators, uses an improved version of a technique called counterfactual regret minimization (CFR). Past CFR algorithms have tried to solve poker by using several steps at each decision point: coming up with counterfactual values representing different game outcomes; applying a regret minimization approach to figure out the strategy leading to the best outcome; and averaging the latest strategy with all past strategies."

Seems like a useful algorithm for dating sites.

(only half-joking)

javajosh · on Jan 8, 2015

You're not the only one to think "relationships" when the phrase "regret minimization" came up!

wglb · on Jan 9, 2015

It would appear that this is some of the team at Alberta that "solved" checkers: http://en.wikipedia.org/wiki/Chinook_(draughts_player)

The book http://www.amazon.com/One-Jump-Ahead-Jonathan-Schaeffer-eboo... One Jump Ahead details the checkers effort.

I highly recommend this book.

The "solved" aspect of resembles enumerating all possible outcomes.

ssharp · on Jan 8, 2015

Abysmal headline. The bot conquered heads-up, limit hold 'em, a game with next to no popularity. Conquering heads-up may be a good first step, though I'd reason that making a bot that can beat a full limit table is exponentially harder than making a bot that can beat heads-up limit.

Full table (9 or 10 person) limit hold 'em has been mostly dead in casinos for well over a decade and it only had a shelf life of a few years online, even during the boom.

There are better edges (for the sharks) and more excitement (for the fish) in no-limit and pot-limit games. I don't see a bot being close to be able to conquer heads-up no-limit, let alone a full table of no-limit. I suspect most heads-up limit happens in private games or the end of a limit poker tournament. It's played in scenarios that will not really be that exploitable should a person grok this bot's abilities.

The article also mentions that it's opponent was another bot that was also playing a very strong strategy. That suggests it's not really able to adapt to individual play, which is essential to being at profitable at all but the lowest levels of poker.

frinxor · on Jan 8, 2015

Limit hold'em is popular, and can be found in almost every card room as well as online.

HU limit is probably only online, and theres a fair amount of bots prevalent in the games already today.

> The article also mentions that it's opponent was another bot that was also playing a very strong strategy. That suggests it's not really able to adapt to individual play, which is essential to being at profitable at all but the lowest levels of poker.

The computer bots attempt to play Game Theory Optimal - meaning it plays an (optimal) strategy that does not lose overall, regardless of stake/opponent. Against another GTO player, it will breakeven - otherwise it will be profitable. That is the holy grail of poker (and any game). If you are not playing GTO, there are leaks/mistakes in your game that can be exploited.

Check out https://www.youtube.com/watch?v=VHcrsMPQtgo , a poker theory course that talks about solving a simplified poker limit game

cxseven · on Jan 8, 2015

The concept of Nash equilibrium doesn't apply to a poker player facing two or more human opponents at the table, since they are likely not playing optimally or even non-cooperatively, e.g. if there is a communication channel such as facial expressions (however poker-faced) that a robot player is not privy to.

In the case of a poker tournament where there are at least a few non-optimal players, it's advantageous to exploit and take chips from them before other players do. A player who plays as though all of his opponents are playing optimally probably plays too conservatively in that early stage and no doubt faces a significant (non-optimal) disadvantage. Your comments still apply to heads-up (two person) poker, though.

ssharp · on Jan 9, 2015

Limit Hold Em pales in comparison to No Limit at every cardroom I've been in the past 10 years. Yes, limit still gets spread, but the limit choices are very poor in most casinos.

stevewepay · on Jan 8, 2015

I play poker in the card rooms around the Bay Area very regularly, and limit poker is absolutely popular at the card rooms that I go to, as well as in Vegas. To say that it is dead is completely untrue.

mmanfrin · on Jan 8, 2015

It's my understanding that limit is popular at places like the Oaks because No Limit was not allowed for a while, until there were means of getting around it (Oaks has 'cap no limit' for example -- can't bet above the max table buyin; Bay101 has something similar). I see the current popularity as a function more of inertia than of actual interest. A majority of the tournaments and SNGs that the Oaks runs are no limit.

stevewepay · on Jan 8, 2015

This is also not true. A lot of people enjoy playing limit poker, like 3-6, because they can chase big hands much cheaper. It's a completely different strategy than no-limit, where you could potentially risk your entire stack in any hand you're in.

The other thing is that poker has an unexpected (at least to me) social aspect to it. The people who play against each other are largely friendly to each other, and they play against each other every day so playing 3-6 limit lets you spend more time at the tables than 1-2 or 3-5 No Limit, etc.

ssharp · on Jan 9, 2015

California poker isn't typical of most other regional casinos. Look at the Bravo poker app and browse through what games are spread across the country. I have fond memories of limit poker thanks to the old 6max games online, but limit just isn't remotely popular anymore.

iztyhi · on Jan 8, 2015

>That suggests it's not really able to adapt to individual play, which is essential to being at profitable at all but the lowest levels of poker.

There are two main categories of poker strategy: exploitive and balanced (also known as game theory optimal or GTO). With an exploitive strategy, your goal is to find the mistakes your opponents make (eg their tells) and figure out how to profit from them. With a balanced strategy, your goal is to 'balance' the choices you make (eg don't have any tells) so that it's impossible to your opponent to gain any information about your hand.

A perfectly balanced strategy will never lose money over a sufficient sample size, but it also limits the amount of money you can win. Exploitive strategies will maximize the amount you can win at the risk of being exploited by other players. From my understanding, your best bet against the weakest players is to play a purely exploitive strategy, and against good human players, you'll want to blend exploitive and balanced strategies.

My understanding of poker bots (the Alberta group from the article and PokerSnowie are the best-known) is that their goal is to develop a true balanced strategy. This may not be the way to maximize profits, but as long as humans aren't capable of playing perfectly balanced, the bots will slowly win in the long run, even against top pros.

randlet · on Jan 8, 2015

"making a bot that can beat a full limit table is exponentially harder "

Hard but not impossible. As long as 5-6 years ago there was a number of bots[1] beating mid stake (15/30 & 30/60) 6 max limit holdem on Ultimate Bet (before it imploded). The games were fairly soft but pretty aggressive and those bots were taking a lot of money out of the game. They were also very frustrating to play against.

[1] They all had incredibly similar playing styles so were likely operated by a single person/group.

llamataboot · on Jan 8, 2015

This is just completely backwards. Full ring limit is much easier to play optimally than heads up. A bot could break even with a good starting hand list and basic actions. Heads up play involves many many situations where the optimal plays are highly non-intuitive and involve both exploiting patterns in your opponents play and "advertising" for the future. This is because both players /usually/ have weak hands. Heads up limit poker requires a very strong understanding of game theory, not just "poker theory". Combine that with the fact that if you have an exploitable leak, it will get exploited over and over again.

Heads up limit poker was /quite/ popular before the US cracked down and online gaming and many players made a ton of money off it. Towards the end, it was clear botters were getting in on the action, and bots were already playing an extremely difficult game.

You are correct that no-limit, especially heads-up no-limit remains a tougher nut to crack from an AI perspective, but this is pretty damn impressive

NB: I played online poker professionally for a few years.

swyman · on Jan 8, 2015

One of the most famous games of all time in the poker world saw swings of $10M+ in each direction and was exclusively heads-up, limit HE: http://www.pokernews.com/news/2006/02/beal-corporation-finis...

iztyhi · on Jan 8, 2015

> I don't see a bot being close to be able to conquer heads-up no-limit, let alone a full table of no-limit.

PokerSnowie plays NLHE, but only uses 3 bet sizes (half, full, and double pot) to make the number of decisions manageable. I believe it plays heads up and 6max. Of course, I don't know if we have any evidence to suggest it's "conquered" those games, and we don't fully know how much the fixed bet sizes limit its potential.

http://www.pokersnowie.com/faq.html#brain1

encoderer · on Jan 8, 2015

Limit is almost a different game than no limit holdem. Simply put: Bluffing is not a problem you need to solve to win in Limit holdem.

thret · on Jan 8, 2015

Completely wrong on both counts. It is quite literally a different game, and bluffing is arguably more important in limit than in no limit.

tmuir · on Jan 9, 2015

Could you elaborate on this? Maybe the group I play with just sets the limit too low, but it's almost to the point that bluffing is pointless, and you only win with good cards. We usually play well into the night, so 5 or 6 hours. On most nights, there are less than 3 hands that don't result in showing cards. What good is bluffing if the threat of losing isn't great enough to make people fold?

It's a nice social outlet, but I would definitely prefer no limit.

thret · on Jan 9, 2015

Oh. Yes, low limit full ring games are quite different. If every hand comes to showdown you are all playing loose, and the correct strategy is indeed tight aggressive.

The article and my comment are referring to a two player limit holdem game vs an opponent who does not make mistakes. So I presume this is equivalent to a high stakes match with two professionals.

encoderer · on Jan 9, 2015

Playing no limit, a bot would have to handle an opponent who makes a big bet in an attempt to induce the bot to fold.

Moreover, bluffing isn't as crucial in Limit b/c you have little leverage to push somebody off their hand. Many people will chase because the pot odds or their own playing style demand it.

Finally, NL or Limit are both poker. In fact, they're both Texas Hold'em. So "quite literally" i respectfully disagree with your assessment.

I'm not a professional poker player, but I did work with one to build a bot about 9 years ago on top of WinHoldem. It was more or less a break even bot on PokerStars if left to play itself, but its real value to that team was as a force multiplier. It made multi-tabling easier, raising interrupts whenever the bot woke up with a hand.

I'm not one for online debates, so feel free to have the last word. To put it another way, I check in the dark.

thret · on Jan 9, 2015

I guess chalk it up to semantics? I could happily argue that online NLHE and live NLHE are also two different games. Like comparing tennis to squash - similar, but not the same. Of course, this would be better argued on 2+2.

ikeboy · on Jan 9, 2015

>After all, Cepheus honed its strategy by playing the equivalent of a near-perfect opponent that made practically no mistakes.

So why don't you f king use that instead of making your own program?

Also, how do they know that play is optimal if poker hadn't been solved before? They're comparing it to perfect play, but if we can compute that, then the problem is done anyway.

olegbl · on Jan 9, 2015

It's possible they simply created another AI and gave it full information (including the opponent's hand) - essentially making a "perfect" player (who cheats). Not a useful program, but a good play-mate.

ikeboy · on Jan 11, 2015

Optimal play against a player who knows your hand is not optimal against one who doesn't; for example you should never bluff in this case.

Your suggestion makes no sense, sorry.

bluecalm · on Jan 8, 2015

It would be nice to have more details on that CFR+ approach. If you just do what the article claims (substitute averages with regret from recent iteration) you will end up with oscillating solutions which never reach the equilibrium (I just tried it with naive CFR implementation). There is surely more to that and it would be nice to see what.

As to the claim of "conquering". While there is no reason to not believe them let's see how they fare vs other near optimal AIs. There is a lot of scope for numerical mistakes when you are dealing with solutions that big and it may well be that they missed something along the way. Some other teams claimed they solved HU holdem some time ago (and without supercomputers). They compete in Alberta yearly championship every year so it will be easy to see how it goes for Cepheus.

jcr · on Jan 9, 2015

There are more details on CFR+ in the Science Magazine article [1] and the in paper [2]. These were posted by hn user 'benktbyte' in another story.

[1] http://www.sciencemag.org/content/347/6218/145

[2] https://pdf.yt/d/qv-O9AwQuV1Kjb04

bluecalm · on Jan 9, 2015

Thank you very much. I see that they've changed how regret matching works and it that context it makes to only remember values from last iteration (or average from n last iterations). Very interesting idea. I am still unable to make it work faster than naive CFR but I am probably missing some details.

curun1r · on Jan 8, 2015

> While there is no reason to not believe them let's see how they fare vs other near optimal AIs

They mentioned that:

Burch warned that human poker players should take the Cepheus strategy with a grain of salt. After all, Cepheus honed its strategy by playing the equivalent of a near-perfect opponent that made practically no mistakes. Certain strategies that wouldn’t work against such a powerful opponent could still prove very profitable for human poker players when exploiting the mistakes of other human players.

mmanfrin · on Jan 8, 2015

'Conquers' in the sense it always plays +EV hands. This is exactly what 'grinders' do -- you just play the hands that have a long-term positive expected value. You play that way long enough, your risk goes to zero and you normalize out at a regular return.

DennisP · on Jan 8, 2015

This may not be what you mean, but it sounds like you're saying the computer just plays hands with favorable odds of getting winning cards. That's trivial and not at all what this is doing. It's finding the game-theory-optimal play, which means it bluffs in the most effective way possible.

It does that with the assumption that the opponent is also very strong, so it doesn't attempt to exploit weaknesses. If you try to exploit weaknesses, you have to vary from optimal play, thus opening up your own weaknesses. The software doesn't do that, it just plays a perfect game.

ikeboy · on Jan 9, 2015

If perfect play is known, why not just use that? The article compared this program with perfect play, which means that perfect play must be computable. I'm confused.

DennisP · on Jan 9, 2015

Perfect play exists but is extremely difficult to compute because it's such a large problem. But they've shown that this program comes very close.

Perfect play also exists in games like chess and checkers. Checkers has been completely solved and computers can play it perfectly. This is not true for chess, though computers play quite well. Go is even harder and the best humans easily beat the best Go computers.

ikeboy · on Jan 11, 2015

Do you mean they only tested it against perfect play for a subset of hands, but ran their program for all hands, and the "close to perfect play" is only statistical?

Otherwise, how could they show it's close if they can't compute perfect?

jib · on Jan 8, 2015

A good h2h limit strategy probably plays 95% of buttons and 70% of big blinds or so, so at least 30% of the hands you play are neutral EV at best (symmetrical game, so at most 50% are +EV preflop and you play at least 80% of all hands).

Only playing hands you are +EV against a full range would be exploitable as it is way too tight in heads up.

GoatOfAplomb · on Jan 8, 2015

I don't agree with your 50% figure. Symmetrical, sure, but remember the blind bets. Once that money is committed, each player can have a higher EV in playing the hand than in folding.

(Related thought experiment - consider a game where each player has $101 in front of them before the hand starts, and then post $50 and $100 blinds. 100% of hands should be played from each position.)

jib · on Jan 9, 2015

Yeah that's fair. It was sloppy language from me.

The point I wanted to make was that you play hands where your odds of winning is <50% a large amount of the time.

gweinberg · on Jan 8, 2015

I don't think that's correct in heads up. I think optimal play includes "bluffing" a fair portion of the time.

thret · on Jan 8, 2015

That's one way to put it. Often nobody has anything that would be playable in a full ring game. The button is paramount, almost like having the serve in tennis.

Also, HU LHE is (much) more complicated than full ring. You have less knowledge about your opponents range, not more. You have to play more hands, a wider range of hands, and you frequently need to make marginal decisions.

iztyhi · on Jan 8, 2015

Indeed. It's pretty easy to win against somebody who only bets with strong hands.

swyman · on Jan 8, 2015

I think, more accurately (and also more impressively), it plays nothing but +EV hands, while also not folding any +EV hands. Avoiding false negatives against a GTO player sounds like the really hard part.

tshaddox · on Jan 8, 2015

So what happens if everyone plays that strategy? Presumably expected earnings approach zero?

lukevdp · on Jan 8, 2015

If everyone plays perfectly, in the long run everyone will have the same amount of money that they started with. In most games there is a rake (fee) taken by the house, so the house would slowly take everyone's money.

frinxor · on Jan 8, 2015

The super-hard part is figuring out what 'perfect play' (GTO) is. Thats what the post is claiming

zirkonit · on Jan 8, 2015

Misleading headline.

This is a complete -solution- for a limited poker version. Really strong, but stochastic, Hold'em algorithms have existed for a long time.

logicalmind · on Jan 8, 2015

Specifically, this is for heads-up (two person) limit texas hold'em.

matilde · on Jan 8, 2015

Best article so far about this: http://www.parttimepoker.com/heads-up-limit-holdem-has-been-...

Pinn2 · on Jan 8, 2015

Limited still means having a choice in what to bet. Trivial EV calculations can only tell you to bet or not, so I assume it's using something more involved.

jib · on Jan 8, 2015

Limit means fixed size betting. Typically 1 big blinds preflop and flop, 2 big blinds on turn and river, with a max of 4 bets total on any street.

The algorithm sounds like it basically identifies the path with the least negative EV for any given action, which is essentially how humans play limit too. "If my opponent plays perfectly, what path will let me lose the least or break even"

iztyhi · on Jan 8, 2015

Indeed, "limit" by itself is shorthand for fixed limit. There are other limit games (such as pot limit and spread limit) where you do have a choice in your bet size.

clintboxe · on Jan 9, 2015

What language is it written in ?

vinayp10 · on Jan 8, 2015

Dude, I need one of these for pokerstars

makerops · on Jan 8, 2015

They exist, fwiw.

iztyhi · on Jan 8, 2015

This probably goes without saying, but they're definitely not allowed.

thret · on Jan 8, 2015

Yep, you write a marginally profitable bot for Stars and then spend all your time trying to get around bot detection. I gave up although the exercise was informative.

Interestingly I get pegged for a bot rather frequently anyway (well, perhaps once a month). It is EXTREMELY annoying when you have to answer a captcha while playing 40 tables, you time out everywhere no matter how fast you are.

MBCook · on Jan 8, 2015

> [...] because of the huge amount of memory required; roughly 262 terabytes of memory. That’s about 268,288 times as much memory as the 1-gigabyte memory available to an iPhone 6.

Wow. I can't believe that sentence made it into an article, let alone one on IEEE Spectrum.

First, normal people don't know how much RAM is in a phone. Second, the numbers are reasonably identical (only differing by 2.4%) so what's the point of specifying it out that way? Why not figure out what an average amount of RAM in a new computer is (4GB?) and say "That's the RAM of about 70k new Desktops".

coderdude · on Jan 8, 2015

I think the people you're talking about who don't know how much RAM their phone has are the same people who don't know much RAM their desktop has. At least if you're using phones as a benchmark there is the chance a reader will be awed.

stavros · on Jan 8, 2015

"Unintuitive quantity 1 is over 268,288 times larger than unintuitive quantity 2!"

PhantomGremlin · on Jan 9, 2015

I was happy to see the comparison to an iPhone. At least it's from the 21st century.

For many years, everything was compared to "The Library of Congress". I got real tired of reading cliches like how some fiber link "can transfer the entire 2 million books of the library of congress in 10 seconds". Or how some new storage device can "store the entire library of congress". Comparisons like that never made any sense to me. Did they mean just the text characters? With or without compression? Surely images weren't included? Doesn't the library also have stuff like maps? Why weren't they included? Etc. :-)