I have worked in HPC (academia), where cluster storage has been measured in multiples of PB for a decade. Since latency and bandwidth are killer requirements there, InfiniBand (instead of Ethernet) is the de facto standard for connecting the storage pools to the compute nodes.
Maintaining such a (storage) cluster requires 1-2 people on site who replace a few hard disks every day.
Nevertheless, if I continuously needed massive amounts of data, I would opt to do it myself over cloud services any time. I just know how well these clusters run, and there is little to no saving in outsourcing it.
I am a researcher in academia who handles most of my system admin needs myself. It's way cheaper to do it yourself than some of the comments here make it sound (if you have good server rack space available). I ordered two 60-drive JBODs that I racked by myself (I removed all the drives first to lighten them) for ~82k. I used ZFS with 10-drive raidz2 vdevs for a total capacity of ~960TB of usable file system space. Installing the servers, testing some setups, and putting it into use took about 4-5 days. In four years I've put many PBs of reads and writes through these and had to replace 3 drives. I'd estimate I spend about 2% of my active work focus on maintaining and troubleshooting it. Scaling up to 10PB, I'd probably switch to a supported SDS solution, which would be much more expensive but still way, way cheaper than cloud.
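For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch. The 10 TB drive size is my assumption, inferred from the ~960TB figure (it is not stated above), and the function name is just illustrative. It also ignores ZFS metadata and padding overhead and the TB vs. TiB difference, so treat the result as an upper bound rather than what `zpool list` would report.

```python
def raidz2_usable_tb(total_drives, vdev_width, drive_tb, parity=2):
    """Approximate usable capacity of a pool built from equal-width raidz vdevs.

    Simplification: each vdev loses `parity` drives' worth of space;
    metadata, padding, and reserved space are ignored.
    """
    vdevs = total_drives // vdev_width           # number of complete vdevs
    data_drives = vdevs * (vdev_width - parity)  # drives' worth of data space
    return data_drives * drive_tb


total_drives = 2 * 60   # two 60-drive JBODs (from the comment)
drive_tb = 10.0         # ASSUMPTION: drive size inferred from ~960TB usable
purchase_cost = 82_000  # "~82k" from the comment; currency not stated
years_in_service = 4
drives_replaced = 3

usable_tb = raidz2_usable_tb(total_drives, vdev_width=10, drive_tb=drive_tb)
cost_per_tb = purchase_cost / usable_tb
annual_replacement_rate = drives_replaced / (total_drives * years_in_service)

print(f"usable capacity: {usable_tb:.0f} TB")
print(f"hardware cost per usable TB: {cost_per_tb:.2f}")
print(f"annual drive replacement rate: {annual_replacement_rate:.3%}")
```

Under those assumptions, 12 vdevs with 8 data drives each gives 96 data drives, which lines up with the ~960TB figure, and 3 replacements across 120 drives in four years works out to well under 1% of drives per year.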
Since he needs 1000ms response on storage, isn't Ethernet the better option? It can reach 400 Gb/s on the fastest hardware now. I thought InfiniBand was only reasonable when machines need to quickly access other machines' primary memory. I would like to know if I'm wrong about this, though.
Agreed, and at this point with RoCE there's little reason to go with InfiniBand, given that you can find fast Ethernet hardware that'll go toe to toe with InfiniBand on latency and throughput.
I've done multiple multi-petabyte-scale projects, and you only need to swap disks once a month or so. I had a project (as a solo engineer) 2 hours away, and I drove there once in six months.