I don't think a 13x13 tiling (of N channels/features) can be ruled out just because it can't recognize a grid of 13x13 objects. The receptive fields of neighboring tiles presumably overlap a lot, since each tile's receptive field is much wider than the stride (kernel step size) that separates it from the next tile.
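To put rough numbers on that overlap, here is a minimal sketch that tracks the receptive field and cumulative stride through a stack of layers. The layer stack is hypothetical (five conv 3x3 + pool 2x2 stages followed by two more 3x3 convs, which would map a 416x416 input to a 13x13 grid) and is not taken from any particular network. Padding and dilation are ignored, so cells at the image border would see clipped fields.

```python
def receptive_field(layers):
    """Return (receptive_field, jump) in input pixels for one output cell.

    layers: list of (kernel_size, stride) pairs, applied in order.
    jump is the distance in input pixels between adjacent output cells.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1) current steps
        jump *= stride              # strides multiply up through the stack
    return rf, jump


# Hypothetical stack: 5 x (conv 3x3 stride 1, pool 2x2 stride 2), then 2 x conv 3x3.
layers = [(3, 1), (2, 2)] * 5 + [(3, 1)] * 2

rf, jump = receptive_field(layers)
overlap = rf - jump  # input pixels shared by two horizontally adjacent tiles
print(f"receptive field: {rf}px, tile spacing: {jump}px, overlap: {overlap}px")
```

Under these assumptions each tile sees 222 input pixels while adjacent tiles sit only 32 pixels apart, so neighbors share 190 pixels of context. The exact figures depend entirely on the assumed layers; the point is only that the tile spacing, not the receptive field, is what the 13x13 grid fixes.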
A pyramid of overlapped tiling resolutions is of course possible too.