Real-life tactics unfortunately don't make for good heuristics in the Monte Carlo simulator.
Our goal was to improve overall gameplay by making the Monte Carlo simulator play slightly more realistically than random. Keep in mind that even when the Monte Carlo simulator plays completely random games, it still does pretty well. I believe it's on par with GnuGo, a good pre-UCT go player, on a 9x9 board on a modern quad-core.
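To make that concrete, here's a rough sketch of evaluating a move with pure random playouts. The board interface here (legal_moves, play, winner, to_play, copy) is hypothetical, just an illustration of the idea rather than any particular engine's API:

```python
import random

def random_playout(position, rng=None):
    """Play uniformly random legal moves until the game ends,
    and return the winner. `position` is a hypothetical board
    object, not any real engine's API."""
    rng = rng or random.Random()
    while not position.is_over():
        position.play(rng.choice(position.legal_moves()))
    return position.winner()

def win_rate(position, move, playouts=1000):
    # Estimate a move's strength as the fraction of random
    # continuations that end in a win for the side to play.
    side = position.to_play
    wins = 0
    for _ in range(playouts):
        trial = position.copy()
        trial.play(move)
        if random_playout(trial) == side:
            wins += 1
    return wins / playouts
```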
Whether a heuristic applied within the MC-simulator generates sample games that better assess the strength of a move is a tricky question to ponder. But it seems reasonable that most non-random tactics, however simple, would accomplish that (such as proximity- or capture-based tactics).
The reason that good real-life tactics don't correlate with improved gameplay when implemented in MC-simulators is that the logic required for even simple tactics can reduce the number of semi-random games the computer can play by several orders of magnitude. Picking a random move is extremely quick (choosing a number with a Mersenne Twister is faster than adding two numbers; compare that to the operations required to determine how many pieces surround a given position). That's partly why very simple heuristics like capture-if-possible are only effective if they're applied on a few (2, in our case) open board positions per move in an MC-simulated game. So it's just slightly non-random.
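A sketch of that kind of "slightly non-random" policy: instead of scanning the whole board for captures (far too slow to run millions of times), check only a couple of randomly sampled points. The board interface and the captures() test are hypothetical stand-ins for whatever cheap tactical check is actually used:

```python
def slightly_nonrandom_move(position, rng, samples=2):
    """Mostly random move choice with a cheap capture check on
    just `samples` randomly chosen candidate points."""
    empties = position.legal_moves()
    for point in rng.sample(empties, min(samples, len(empties))):
        if position.captures(point):  # hypothetical cheap test
            return point
    # No capture found among the sampled points: fall back to
    # a uniformly random legal move.
    return rng.choice(empties)
```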
Anyway I hope that made sense. It's a weird problem to think about at first, but it's actually really fun to play around with. I recommend checking out Drake's implementation.
Yeah that might have been an exaggeration. But as I recall, once you have the MT seeded and fired up, getting a new number requires little more than a bit shift. Anyway it was substantially faster than simple alternatives I tried (including other PRNGs). One early theory I had was that using a faster PRNG than MT could lead to a better result even if it wasn't as evenly distributed. I think MT proved to be the best solution though.
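For a sense of why extraction is so cheap: once MT19937's 624-word state block has been refilled (the expensive part, which only happens once per 624 numbers), producing the next output is just the "tempering" step below. The constants are the standard MT19937 ones:

```python
def mt19937_temper(y):
    # The per-number output step of MT19937: four cheap
    # shift/XOR/mask operations on a word from the state array.
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF
```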
They were probably generating tons of them at once so the only operation was grabbing a number out of memory. But you're right that it takes a whole lot of operations to generate the numbers in the first place.
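That batching trick looks roughly like this (my own sketch, not from the thread; numpy is just one convenient way to fill the buffer in bulk):

```python
import numpy as np

class BufferedRandom:
    """Pre-generate random numbers in bulk so the common-case
    cost per call is just an array read and an index bump."""
    def __init__(self, size=1 << 20, seed=None):
        self._rng = np.random.default_rng(seed)
        self._size = size
        self._refill()

    def _refill(self):
        # Bulk generation amortizes the PRNG cost across many calls.
        self._buf = self._rng.integers(0, 2**32, size=self._size,
                                       dtype=np.uint32)
        self._i = 0

    def next(self):
        if self._i >= self._size:
            self._refill()
        v = self._buf[self._i]
        self._i += 1
        return v
```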
To further illustrate the point, I think I may have actually tried a no-capture heuristic at one point. I wouldn't be too surprised if it also worked better than random.
Correction: "most non-random tactics" wouldn't make it better, as someone else pointed out. Finding ones that do is a tough question, beyond just runtime; keep in mind that the MC player uses that strategy against itself over entire random games, which can lead to really un-humanlike games. Completely random games at least ensure that you're hitting all possibilities (and when that good move is found, that path is explored more).
Can I ask you a question? Is the random-move approach simply brute force, or does it turn out that, weirdly, moves which have good outcomes under the approach also tend to have good outcomes in general, even though the actual sequences of moves weren't really considered?
i.e. I guess I'm just asking if it really is a heuristic.
The Monte Carlo simulator is a very simple and weak heuristic by itself. In UCT go players it's inseparable from the tree-traversal/expansion algorithm, which uses data propagated up the tree from the MC-tested leaf nodes to decide which move sequence to explore next.
When it's expanding the search tree, every node expansion is arrived at "intelligently". It starts at the root node and, at each branching, asks itself: which following move is the best mixture of good (based on MC games I've played on the leaf nodes of its children) and unknown (i.e. I haven't explored that branch very much yet)? It does that at every node until the leaf, where it expands a node with the MC-simulator (and then propagates the result up the tree). So unlike some other tree-search algorithms, it isn't hobbled by having to expand all paths to a certain depth before exploring a particular sequence deeper. The result literally looks like a tree: some branches are really tall, maybe 18 moves deep, while others are really short. Even though the "heuristic" is MC, I don't like calling it brute force, because it uses the results of those MC games so well.
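The usual formula for that good/unknown mixture is UCB1 (hence "UCT": Upper Confidence bounds applied to Trees). A minimal sketch, with the node fields (wins, visits, children) assumed for illustration and c a tunable exploration constant:

```python
import math

def ucb1_child(node, c=1.4):
    """Pick the child balancing exploitation (high win rate so
    far) against exploration (few visits so far)."""
    def score(child):
        if child.visits == 0:
            return float("inf")  # always try unvisited moves first
        exploit = child.wins / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=score)
```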
Thanks, I think I see the answer now. Let me summarize to check:
There's a conceptual tree of all possible games, with a move at each node. You run some random games to completion, giving sequences of moves from the root (starting position) to a leaf (game over). Then, from the current move, you explore the best of the "next" moves more thoroughly - by doing the same thing again, starting from each of those "next" moves. The benefit of running random games to completion is that it gives (some) long-range vision.
I wondered whether those (relatively) sparse "tracer bullets" actually find whole "good" sections of the tree - or whether it's just the specific leaf nodes it found that were good.
The answer is that, since it's being used to guide exploration, it must be good - there'd be no point using it as a heuristic otherwise. So, the existence of a win in a region of the tree is a partial predictor of other wins in that region.
I read up on the rules of Go after asking, and it makes sense: once you own some territory, you'll tend to keep it - so that if a sequence of moves leads to a win, all the prefixes of those moves will also be strong positions along the way.
So it's not brute force, but a predictive heuristic. Still kinda surprising that it works well. :-)