GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM... Conference Paper December, 2017
Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications Conference Paper November, 2017
Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities Conference Paper November, 2017
An evaluation of the state of time synchronization on leadership class supercomputers Journal October, 2017
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale Journal September, 2017
SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems Conference Paper September, 2017
Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale Conference Paper September, 2017
SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory Conference Paper August, 2017
Efficient Breadth First Search on Multi-GPU Systems using GPU-centric OpenSHMEM Conference Paper August, 2017