That 500MB/s is a merged stream when using all cores; I got about 140MB/s on a single core (from my answer in the SO thread you linked).
And I'm not ranting, I'm just crying over a lost opportunity. Nowadays you have to spend time deciding which PRNG to use and how to implement it to meet some quality/speed trade-off; an RNG (infinite cycle by design) connected directly to the CPU (no transfer bottlenecks) that passes Diehard (read: random enough for science) would be a silver bullet.
Yes, the PRNG makes RDRAND faster than its entropy source in its current design, but it is not hitting any wall. Intel engineers could have made it much faster if they had aimed for the maximum possible throughput rather than just enough for crypto.
I'm still surprised that this poses a challenge for your applications. I thought fast non-cryptographic RNGs were a solved problem. How much random data do you generate? If you use a significant amount of CPU just for that, I doubt it would be feasible for Intel to build a cryptographically secure RNG with the same throughput without significant extra cost (think one extra core). (I'm no expert on that subject though.)
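To illustrate what "fast non-cryptographic RNG" can mean in software, here is a minimal sketch of a xorshift64* generator in C. It is not taken from this thread: the names `xs64_state`, `xs64_seed`, `xs64_next` and `xs64_fill` are made up for the example, and it makes no claim about throughput or about passing any particular test suite on your hardware, so measure and test before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; the state must never be zero. */
typedef struct { uint64_t s; } xs64_state;

/* Seed the generator; a zero seed is replaced because the all-zero state
   is a fixed point of the xorshift step. */
static void xs64_seed(xs64_state *st, uint64_t seed) {
    st->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift64* step: three shift/xor operations on the state,
   then a multiplication to scramble the returned value. */
static uint64_t xs64_next(xs64_state *st) {
    uint64_t x = st->s;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    st->s = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Fill a buffer with n 64-bit outputs, e.g. for a throughput measurement. */
static void xs64_fill(xs64_state *st, uint64_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = xs64_next(st);
}
```

Each output costs a handful of register operations and no memory traffic, which is why the per-core cost of this kind of generator tends to be small. It is not cryptographically secure and, unlike a hardware source, it is fully determined by its seed.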