For one, only continuous functions can be represented.
Much more importantly, the theorem doesn't prove that it's possible to learn the necessary weights to approximate any function, just that such weights must exist.
With our current methods, only a subset of all possible NNs are actually trainable, so we can only automate the construction of approximations for certain continuous functions (generally those that are differentiable, though there may be exceptions I'm not sure about).
If we're talking about approximation, continuous functions can converge to step functions just fine. Take a regular sigmoid and keep raising the weight to see one example. Good point about training, though: that theorem doesn't fully explain why NNs work, even if it somewhat sounds like it does.
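A quick numerical sketch of that steepening-sigmoid argument (the sample points and weights below are arbitrary choices for illustration): for any fixed x away from the jump, sigmoid(w * x) gets as close to the unit step as you like once w is large enough.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-0.5, -0.01, 0.01, 0.5])   # sample points away from the jump at 0
step = (x > 0).astype(float)             # target step function

# As the weight w grows, sigmoid(w * x) converges pointwise to the step
# (convergence is not uniform: right at x = 0 the sigmoid stays at 0.5).
for w in (1.0, 10.0, 1000.0):
    print(w, np.max(np.abs(sigmoid(w * x) - step)))
```

Running it, the worst-case error over those sample points shrinks from roughly 0.5 at w = 1 to about 5e-5 at w = 1000.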
There are discontinuous functions which are not steps - one of the more useful ones being tan(x) for any real x. Of course, since tan(x) is piecewise continuous and periodic, it is probably easy to work around in practice.
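A minimal sketch of that workaround (the helper name is made up for illustration): because tan has period pi, any input can be folded into the single continuous branch (-pi/2, pi/2), so a network would only need to approximate tan on that one open interval.

```python
import numpy as np

def reduce_to_principal_branch(x):
    # Map any real x into (-pi/2, pi/2), the single continuous branch of tan.
    # Since tan has period pi, tan(x) == tan(reduced) exactly.
    return (x + np.pi / 2) % np.pi - np.pi / 2

x = np.array([1.0, 4.0, -7.5])
assert np.allclose(np.tan(x), np.tan(reduce_to_principal_branch(x)))
```

The folding step is exact and needs no learning; the remaining approximation target is continuous (though unbounded near the ends of the interval, so in practice you'd still restrict to a compact sub-interval).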