Silent Data Corruptions at Scale https://arxiv.org/abs/2102.11245
Harish Dixit has two other papers available https://arxiv.org/search/cs?searchtype=author&query=Dixit%2C...
Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field https://users.ece.cmu.edu/~omutlu/pub/memory-errors-at-faceb...
Justin Meza has a similar body of research. https://www.semanticscholar.org/author/Justin-Meza/144606145
https://kilthub.cmu.edu/articles/thesis/Large_Scale_Studies_...