Abstract
The Leadership Computing Facility (LCF) at Oak Ridge National Laboratory (ORNL) has a diverse portfolio
of computational resources ranging from a petascale XT4/XT5 simulation system (Jaguar) to numerous other
systems supporting development, visualization, and data analytics. To support the vastly different I/O
needs of these systems, Spider, a Lustre-based center-wide file system, was designed and deployed to provide
over 240 GB/s of aggregate throughput with over 10 petabytes of formatted capacity. A multi-stage InfiniBand
network, dubbed the Scalable I/O Network (SION), with over 889 GB/s of bisection bandwidth was deployed
as part of Spider to provide connectivity to our simulation, development, visualization, and other platforms.
To our knowledge, at the time of writing, Spider is the largest and fastest POSIX-compliant parallel file
system in production. This paper details the overall architecture of the Spider system, the challenges in
deploying and initially testing a file system of this scale, and novel solutions to these challenges that offer
key insights into future file system design.