Each second of listening, we're perceiving the speaker's identity, what accent they are using, how fast they are talking, and what emotions they are showing. Those should count toward the bit rate handled by the conscious brain.
Again: perception is not what we're talking about, and the paper acknowledges that perceptual input is orders of magnitude larger. I challenge you to listen to, and actually comprehend, someone talking about a topic you don't know while identifying someone in a police lineup.