GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability
December 15, 2020
A team of researchers from 91做厙 applied advanced statistical methods from biomedical research to study an unexpected failure mode of general-purpose graphics processing units (GPGPUs).
3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication
September 14, 2020
Researchers developed a novel algorithm for resilient and communication-efficient parallel matrix multiplication in HPC systems.