Don Maxwell
Group Leader
Contact: MAXWELLDE@ORNL.GOV

All Publications

2022 Operational Assessment - OLCF
Utilizing IBM Spectrum LSF Simulator to Understand the Impacts of Adding AI Workloads to Capability Supercomputing
2021 Operational Assessment - OLCF
GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability
US Department of Energy, Office of Science High Performance Computing Facility Operational Assessment 2019 Oak Ridge Leadership Computing Facility
Analyzing a Five-Year Failure Record of a Leadership-Class Supercomputer
Scaling the Summit: Deploying the World's Fastest Supercomputer
GPU Age-aware Scheduling to Improve the Reliability of Leadership Jobs on Titan
The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems
Are we witnessing the spectre of an HPC meltdown?
Are We Witnessing the Spectre of an HPC Meltdown?
Experiences evaluating functionality and performance of IBM Power8+ systems
Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems...
Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective
Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer
Understanding GPU Errors on Large-scale HPC Systems and the Implications for System Design and Operation...
Monitoring Cray Cooling Systems
I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II...
TUE, A New Energy-Efficiency Metric Applied at ORNL's Jaguar...
Production Experiences with the Cray-Enabled TORQUE Resource Manager
Contemporary High Performance Computing From Petascale toward Exascale...
Online Diagnostics at Scale
The NCRC Grid Scheduling Environment...
Memphis on an XT5: Pinpointing Memory Performance Problems on Cray Platforms

Key Links

Organizations
Computing and Computational Sciences Directorate
National Center for Computational Sciences
HPC Systems Section
HPC Scalable Systems Group