Because the signal transmitted over normal phones has to be encrypted. That encrypted signal will then be digitized/compressed by the standard phone line, and any artifacts from that digitization might turn the encrypted signal into gibberish. It's like compressing a JPEG too many times. So you need an encryption method that isn't just simple digitization. You need something that is encrypted but essentially sounds like human speech, so that the digitization/compression process does not damage it.
They were; they got compressed with G.711 or G.722.
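To make the G.711 point concrete, here is a rough sketch of mu-law companding in Python. It uses the continuous mu-law formula with 8-bit quantization, not the bit-exact segmented tables from the G.711 spec, and the function names are just made up for illustration.

```python
import math

MU = 255  # mu-law companding constant

def mu_compress(x):
    """Map a sample in [-1, 1] into the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def codec_round_trip(x, bits=8):
    """Compress, quantize to a signed grid of 2**(bits-1)-1 steps, expand."""
    levels = 2 ** (bits - 1) - 1
    quantized = round(mu_compress(x) * levels) / levels
    return mu_expand(quantized)

# A 1 kHz tone at half amplitude, sampled at 8 kHz (telephone rate).
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 8000) for n in range(80)]

once = [codec_round_trip(s) for s in tone]
twice = [codec_round_trip(s) for s in once]

err_one_pass = max(abs(a - b) for a, b in zip(tone, once))
err_second_pass = max(abs(a - b) for a, b in zip(once, twice))
print(err_one_pass, err_second_pass)
```

In this idealized model the first pass adds a small quantization error, and a second pass over the already-quantized samples changes essentially nothing. A real line has analogue stages between codecs, so the samples don't land back on the same grid.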
In fact, that's why your 56kbps modem would often fall back to 33.6kbps or 28.8kbps, until the phone company installed a fancy new exchange that demodulated the 56kbps stream and didn't compress it. The 56kbps limit was also due to sampling limits and band-limiting filters; on the same copper line you could also get a fully digital ISDN line that did 64kbps. (And if they remove all the filters and band limits, you can reach DSL speeds.)
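The 64kbps and 56kbps figures fall out of simple arithmetic on the digital voice channel. A quick sketch, assuming the usual 8 kHz sampling with 8-bit samples, and assuming 56k modems effectively get about 7 usable bits per sample (the constant names are mine):

```python
SAMPLE_RATE_HZ = 8000      # telephone-network sampling rate
BITS_PER_SAMPLE = 8        # one G.711 code word per sample
USABLE_BITS_56K = 7        # assumption: ~7 bits per sample survive for modem data

# Nyquist: sampling at 8 kHz can only represent audio below 4 kHz.
max_audio_hz = SAMPLE_RATE_HZ // 2

digital_channel_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # ISDN B channel
modem_ceiling_bps = SAMPLE_RATE_HZ * USABLE_BITS_56K     # "56k" modem

print(max_audio_hz, digital_channel_bps, modem_ceiling_bps)
# prints: 4000 64000 56000
```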
There's nothing inherently special about voice-compression compared to any other kind of interference/distortion you can get on an analogue line.
But that same re-compression happens with modem traffic. Your 56k modems deal with compression artifacts just fine, though sometimes dropping down to lower speeds.
https://gdmissionsystems.com/products/encryption/secure-voic...
https://www.cryptomuseum.com/crypto/gd/viper/