Real-life tactics unfortunately don't make for good heuristics in the Monte Carlo simulator.
Our goal was to improve overall gameplay by making the Monte Carlo simulator play slightly more realistically than random. Keep in mind that even when the Monte Carlo simulator plays completely random games, it still does pretty well. I believe it's on par with GnuGo, a good pre-UCT go player, on a 9x9 board on a modern quad-core.
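To make that concrete, here's a rough sketch of evaluating a move with pure random playouts. The board interface here (legal_moves, play, winner, to_play, copy) is hypothetical, just an illustration of the idea rather than any particular engine's API:

```python
import random

def random_playout(position, rng=None):
    """Play uniformly random legal moves until the game ends,
    and return the winner. `position` is a hypothetical board
    object, not any real engine's API."""
    rng = rng or random.Random()
    while not position.is_over():
        position.play(rng.choice(position.legal_moves()))
    return position.winner()

def win_rate(position, move, playouts=1000):
    # Estimate a move's strength as the fraction of random
    # continuations that end in a win for the side to play.
    side = position.to_play
    wins = 0
    for _ in range(playouts):
        trial = position.copy()
        trial.play(move)
        if random_playout(trial) == side:
            wins += 1
    return wins / playouts
```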
Whether a heuristic applied within the MC-simulator generates sample games that better assess the strength of a move is a tricky question to ponder. But it seems reasonable that most non-random tactics, however simple, would accomplish that (such as proximity- or capture-based tactics).
The reason that good real-life tactics don't correlate with improved gameplay when implemented in MC-simulators is that the logic required for even simple tactics can reduce the number of semi-random games the computer can play by several orders of magnitude. Picking a random move is extremely quick (choosing a number with a Mersenne Twister is faster than adding two numbers; compare that to the operations required to determine how many pieces surround a given position). That's partly why very simple heuristics like capture-if-possible are only effective if they're applied on a few (2, in our case) open board positions per move in an MC-simulated game. So it's just slightly non-random.
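A sketch of that kind of "slightly non-random" policy: instead of scanning the whole board for captures (far too slow to run millions of times), check only a couple of randomly sampled points. The board interface and the captures() test are hypothetical stand-ins for whatever cheap tactical check is actually used:

```python
def slightly_nonrandom_move(position, rng, samples=2):
    """Mostly random move choice with a cheap capture check on
    just `samples` randomly chosen candidate points."""
    empties = position.legal_moves()
    for point in rng.sample(empties, min(samples, len(empties))):
        if position.captures(point):  # hypothetical cheap test
            return point
    # No capture found among the sampled points: fall back to
    # a uniformly random legal move.
    return rng.choice(empties)
```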
Anyway I hope that made sense. It's a weird problem to think about at first, but it's actually really fun to play around with. I recommend checking out Drake's implementation.
Yeah that might have been an exaggeration. But as I recall, once you have the MT seeded and fired up, getting a new number requires little more than a bit shift. Anyway it was substantially faster than simple alternatives I tried (including other PRNGs). One early theory I had was that using a faster PRNG than MT could lead to a better result even if it wasn't as evenly distributed. I think MT proved to be the best solution though.
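For a sense of why extraction is so cheap: once MT19937's 624-word state block has been refilled (the expensive part, which only happens once per 624 numbers), producing the next output is just the "tempering" step below. The constants are the standard MT19937 ones:

```python
def mt19937_temper(y):
    # The per-number output step of MT19937: four cheap
    # shift/XOR/mask operations on a word from the state array.
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF
```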
They were probably generating tons of them at once so the only operation was grabbing a number out of memory. But you're right that it takes a whole lot of operations to generate the numbers in the first place.
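That batching trick looks roughly like this (my own sketch, not from the thread; numpy is just one convenient way to fill the buffer in bulk):

```python
import numpy as np

class BufferedRandom:
    """Pre-generate random numbers in bulk so the common-case
    cost per call is just an array read and an index bump."""
    def __init__(self, size=1 << 20, seed=None):
        self._rng = np.random.default_rng(seed)
        self._size = size
        self._refill()

    def _refill(self):
        # Bulk generation amortizes the PRNG cost across many calls.
        self._buf = self._rng.integers(0, 2**32, size=self._size,
                                       dtype=np.uint32)
        self._i = 0

    def next(self):
        if self._i >= self._size:
            self._refill()
        v = self._buf[self._i]
        self._i += 1
        return v
```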
To further illustrate the point, I think I may have actually tried a no-capture heuristic at one point. I wouldn't be too surprised if it also worked better than random.
Correction: "most non-random tactics" wouldn't make it better, as someone else pointed out. Finding ones that do is a tough question, beyond just runtime; keep in mind that the MC player uses that strategy against itself over entire random games, which can lead to really un-humanlike games. Completely random games at least ensure that you're hitting all possibilities (and when that good move is found, that path is explored more).
Can I ask you a question? Is the random-move approach simply brute force, or does it turn out that, weirdly, moves which have good outcomes under the approach also tend to have good outcomes in general, even though the actual sequences of moves weren't really considered?
i.e. I guess I'm just asking if it really is a heuristic.
The Monte Carlo simulator is a very simple and weak heuristic by itself. In UCT go players it's inseparable from the tree-traversal/expansion algorithm, which uses data propagated up the tree from the MC-tested leaf nodes to decide which move sequence to explore next.
When it's expanding the search tree, every node expansion is arrived at "intelligently". It starts at the root node and, at each branching, asks itself: which following move is the best mixture of good (based on MC games I've played on the leaf nodes of its children) and unknown (i.e. I haven't explored that branch very much yet)? It does that at every node until the leaf, where it expands a node with the MC-simulator (and then propagates the result up the tree). So unlike some other tree-search algorithms, it isn't hobbled by having to expand all paths to a certain depth before exploring a particular sequence deeper. The result literally looks like a tree: some branches are really tall, maybe 18 moves deep, while others are really short. Even though the "heuristic" is MC, I don't like calling it brute force, because it uses the results of those MC games so well.
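The usual formula for that good/unknown mixture is UCB1 (hence "UCT": Upper Confidence bounds applied to Trees). A minimal sketch, with the node fields (wins, visits, children) assumed for illustration and c a tunable exploration constant:

```python
import math

def ucb1_child(node, c=1.4):
    """Pick the child balancing exploitation (high win rate so
    far) against exploration (few visits so far)."""
    def score(child):
        if child.visits == 0:
            return float("inf")  # always try unvisited moves first
        exploit = child.wins / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=score)
```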
Thanks, I think I see the answer now. Let me summarize to check:
There's a conceptual tree of all possible games, with a move at each node. You run some random games to completion, giving sequences of moves from the root (starting position) to a leaf (game over). Then, from the current move, you explore the best of the "next" moves more thoroughly - by doing the same thing again, starting from each of those "next" moves. The benefit of running random games to completion is that it gives (some) long-range vision.
I wondered whether those (relatively) sparse "tracer bullets" actually find whole "good" sections of the tree - or whether it's just the specific leaf nodes it found that were good.
The answer is that, since it's being used to guide exploration, it must be good - there'd be no point using it as a heuristic otherwise. So, the existence of a win in a region of the tree is a partial predictor of other wins in that region.
I read up on the rules of Go after asking, and it makes sense: once you own some territory, you'll tend to keep it - so that if a sequence of moves leads to a win, all the prefixes of those moves will also be strong positions along the way.
So it's not brute force, but a predictive heuristic. Still kinda surprising that it works well. :-)