Pattern-based Modeling of Multiresilience Solutions for High-Performance Computing. Conference Paper, April 2018
Shrink or Substitute: Handling Process Failures in HPC Systems Using In-Situ Recovery. Conference Paper, March 2018
A comparison of Amazon Web Services and Microsoft Azure cloud platforms for high performance computing. Conference Paper, January 2018
GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM... Conference Paper, December 2017
Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications. Conference Paper, November 2017
Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities. Conference Paper, November 2017