I would agree that the examples can be somewhat misleading, but not because of computability. The universal approximation theorem for neural networks doesn't care about computability, but it does require the domain of the function to be bounded (more precisely, compact).
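For concreteness, in the classical one-hidden-layer setting (Cybenko-style), the approximating networks are finite sums of sigmoids, say

$$\hat f(x) = \sum_{i=1}^{N} c_i\,\sigma(w_i^\top x + b_i) + c_0, \qquad\text{hence}\qquad |\hat f(x)| \le |c_0| + \sum_{i=1}^{N} |c_i|,$$

since 0 < σ(z) < 1 for every z. So every such network is automatically a bounded function, which is exactly what breaks the global approximation below.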
For example, suppose that the function to be approximated is simply f(x) = x. For any real numbers a < b we can produce an approximation that works well on a ≤ x ≤ b, but no single network can approximate f well for all real x: the sigmoid in the hidden layer forces every such network to be a bounded function, and a bounded function cannot be a good global approximation of f(x) = x.
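Here is a minimal numerical sketch of that failure (plain NumPy; the single hidden unit and the value of eps are illustrative choices of mine, not part of the theorem). Near 0 the sigmoid is roughly linear, sigmoid(εx) ≈ 0.5 + εx/4, so a one-unit network can track f(x) = x on a bounded interval, yet it saturates far away:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden sigmoid unit hand-tuned to mimic f(x) = x near 0:
# sigmoid(eps * x) ≈ 0.5 + eps * x / 4 for small eps * x, so
# (4 / eps) * (sigmoid(eps * x) - 0.5) ≈ x on a bounded interval.
eps = 0.1

def net(x):
    return (4.0 / eps) * (sigmoid(eps * x) - 0.5)

for x in [0.5, 1.0, 5.0, 50.0, 500.0]:
    print(f"x = {x:7.1f}   net(x) = {net(x):8.3f}")

# The approximation is good for small x but the output can never
# exceed (4 / eps) * 0.5 = 20 in absolute value, so it inevitably
# breaks down once |x| is large -- no choice of weights avoids
# this, since the network's output is always bounded.
```

Training a wider network instead of hand-setting weights only pushes the saturation point further out; the output bound always exists, so there is always some x where the approximation fails.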
Therefore, the translation example works if we assume that the set of possible Chinese texts is finite (so the input domain is compact), but otherwise nothing is guaranteed.