Can you provide a citation for the claim that x86-64 (assuming something modern like AMD Zen or an Intel Skylake-or-later P-core) does page table walking/TLB filling in microcode, rather than with the fairly obvious state machine that can walk as quickly as the cache hierarchy can deliver the table entries? Well, maybe give it a full cycle of latency to process each response and decide the next step. That said, I don't remember any addition being required to generate the address of the next level's page table entry, so the bit of combinational logic that controls the cache's read port might fit in the margin between the port's data-out latches becoming valid and the setup deadline of its address-in latches.
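To make the "no addition" point concrete, here is a rough software model of the address generation, assuming plain 4-level paging with 4 KiB pages only (no large pages, no permission or accessed/dirty handling). The names `table_index`, `entry_address`, `walk` and `read_qword` are made up for illustration and say nothing about how any real core implements its walker. Because each table base is 4 KiB-aligned and the 9-bit index times 8 bytes fits in the low 12 bits, the entry address is just a bitwise OR, i.e. wiring rather than an adder.

```python
PAGE_SHIFT = 12    # 4 KiB pages
INDEX_BITS = 9     # 512 entries per table
ENTRY_SHIFT = 3    # each entry is 8 bytes
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 51:12 hold the next table's physical base
PRESENT = 1        # bit 0 of an entry


def table_index(vaddr, level):
    """9-bit index for a level: 3 = PML4, 2 = PDPT, 1 = PD, 0 = PT."""
    shift = PAGE_SHIFT + INDEX_BITS * level
    return (vaddr >> shift) & ((1 << INDEX_BITS) - 1)


def entry_address(table_base, vaddr, level):
    """Physical address of the entry to read at this level."""
    assert table_base & 0xFFF == 0, "table base must be 4 KiB-aligned"
    offset = table_index(vaddr, level) << ENTRY_SHIFT  # at most 0xFF8
    # The low 12 bits of the base are zero, so OR gives the same result
    # as addition: the index bits are simply concatenated onto the base.
    return table_base | offset


def walk(read_qword, cr3, vaddr):
    """Walk four levels; read_qword(paddr) is a hypothetical memory read.

    Returns the translated physical address, or None if an entry is not
    present. Large pages and all permission checks are ignored here.
    """
    base = cr3 & ADDR_MASK
    for level in (3, 2, 1, 0):
        entry = read_qword(entry_address(base, vaddr, level))
        if not entry & PRESENT:
            return None
        base = entry & ADDR_MASK
    # The final step is also a concatenation: frame base plus page offset.
    return base | (vaddr & 0xFFF)
```

Each loop iteration corresponds to one dependent read through the cache hierarchy; between reads the only work is masking and bit selection, which is what makes the simple state machine plausible.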