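The universal approximation theorem (UAT) says that depth is not needed, at least theoretically: a single hidden layer with a non-polynomial nonlinearity can approximate any continuous function on a compact domain to arbitrary accuracy, given enough width. Depth is just immensely useful in practice. So adding one intermediate linear layer + some nonlinearity is already enough in principle. Note that the UAT itself gives no convergence rate, though. The O(1/N) scaling for width N comes from Barron's theorem, which bounds the squared L2 error and only holds for targets with a finite Barron norm (roughly, a Fourier transform that decays fast enough), not for arbitrary continuous maps.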
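To make the "width alone is enough" point concrete, here is a minimal sketch in plain Python. It builds a one-hidden-layer ReLU network by hand (no training) that linearly interpolates a target on [0, 1], then measures the worst-case error as the width grows. The names `build_network`, `forward`, `max_error` and the choice of `sin(2*pi*x)` as target are all my own assumptions for illustration. Also note this constructive 1D example shows the sup-norm error shrinking roughly like 1/N^2 for a smooth target, which is a different bound from the Barron-type O(1/N) rate.

```python
import math

def relu(z):
    return max(0.0, z)

def build_network(f, width):
    """Hand-built 1-hidden-layer ReLU net (hypothetical helper, not trained).

    Hidden unit k computes relu(x - k/width); the output layer is chosen so the
    net linearly interpolates f at width + 1 equally spaced knots on [0, 1].
    """
    knots = [k / width for k in range(width + 1)]
    # Slope of the interpolant on each interval [knots[k], knots[k + 1]].
    slopes = [(f(knots[k + 1]) - f(knots[k])) * width for k in range(width)]
    # Each hidden unit adds the change in slope at its knot.
    out_weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, width)]
    return knots[:width], out_weights, f(knots[0])

def forward(net, x):
    hidden_biases, out_weights, out_bias = net
    return out_bias + sum(c * relu(x - b) for b, c in zip(hidden_biases, out_weights))

def max_error(f, width, grid=2000):
    """Worst absolute error on a uniform evaluation grid over [0, 1]."""
    net = build_network(f, width)
    return max(abs(f(i / grid) - forward(net, i / grid)) for i in range(grid + 1))

def target(x):
    # Assumed example target; any continuous function on [0, 1] works.
    return math.sin(2 * math.pi * x)

errors = {n: max_error(target, n) for n in (4, 8, 16, 32, 64)}
for n, e in errors.items():
    print(f"width={n:3d}  max error={e:.5f}")
```

The error keeps dropping as the single hidden layer gets wider, with no extra depth involved. For a target that is merely continuous (not smooth) the same construction still converges, just without any guaranteed rate, which is exactly the gap between the UAT and rate results like Barron's.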