The strategy is greedy: you're maximizing the one-step-ahead information gain. But that can land you in a spot where, two steps ahead, you're left with only low-information-gain moves, or at least lower than you'd have had if you'd planned better. Why not unroll the search a few more iterations and pick the move that maximizes expected information gain k steps ahead?
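
To make the idea concrete, here is a minimal sketch of what I mean, not anyone's actual implementation. It assumes a uniform prior over the remaining candidates and a deterministic `feedback(move, candidate)` function. All the names (`partition`, `lookahead_gain`, `best_move`, the toy `TABLE`) are made up for illustration. "k steps ahead" is read here as the expected total information gained over k moves, assuming you play the best follow-up each time.

```python
import math
from collections import defaultdict

def partition(candidates, move, feedback):
    """Group the candidates by the feedback that `move` would produce."""
    groups = defaultdict(list)
    for c in candidates:
        groups[feedback(move, c)].append(c)
    return list(groups.values())

def lookahead_gain(candidates, move, moves, feedback, k):
    """Expected bits gained over k steps: play `move` now, then the best
    follow-up for the remaining k - 1 steps (uniform prior assumed)."""
    n = len(candidates)
    total = 0.0
    for group in partition(candidates, move, feedback):
        p = len(group) / n
        gain = -math.log2(p)  # information from observing this outcome
        if k > 1 and len(group) > 1:
            gain += max(lookahead_gain(group, m, moves, feedback, k - 1)
                        for m in moves)
        total += p * gain
    return total

def best_move(candidates, moves, feedback, k=1):
    """k=1 is the greedy strategy; larger k unrolls the search."""
    return max(moves,
               key=lambda m: lookahead_gain(candidates, m, moves, feedback, k))

# Toy game (hypothetical): 9 hidden candidates laid out on a 3x3 grid.
# "B" reveals the row, "C" reveals the column, and "A" is a lopsided split
# that looks best for one step but pairs badly with everything else.
TABLE = {
    "A": (0, 0, 1, 0, 1, 2, 2, 3, 3),
    "B": tuple(i // 3 for i in range(9)),
    "C": tuple(i % 3 for i in range(9)),
}

def feedback(move, candidate):
    return TABLE[move][candidate]

candidates = list(range(9))
moves = ["A", "B", "C"]

print("greedy (k=1):", best_move(candidates, moves, feedback, k=1))
print("lookahead (k=2):", best_move(candidates, moves, feedback, k=2))
```

In this toy setup the greedy choice is "A" (about 1.97 bits versus about 1.58 for "B" or "C"), but with k=2 the row/column pair wins because together they pin down the answer exactly (log2(9), about 3.17 bits, versus about 2.95 starting from "A"). The catch is cost: the search is exponential in k, so on a real problem you'd probably need to prune to the top few moves at each level.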