Abstract
Operating system (OS) noise is defined as interference generated by the OS that
prevents a compute core from performing ``useful'' work. Compute node kernel
daemons, network interfaces, and other OS-related services are major sources of
such interference. This interference on individual compute cores can vary in
duration and frequency, and can cause de-synchronization (jitter) in collective
communication tasks, and thus result in variable (degraded) overall parallel
application performance. This behavior is more observable in large-scale
applications using certain types of collective communication primitives, such
as MPI\_Allreduce.
This paper presents our effort towards reducing the overall effect of OS noise
on our large-scale parallel applications. Our tests were performed on the
quad-core Jaguar, the Cray XT5 at the Oak Ridge Leadership
Computing Facility (OLCF). At the time of these tests, Jaguar was a 1.4 PFLOPS
supercomputer with 149,504 compute cores and 8 cores per node. We aggregated
OS noise sources onto a single core for each node. The scientific application
was then run on six of the remaining cores in each node. Our results show
that we were able to improve the MPI\_Allreduce performance by two orders of
magnitude. We demonstrated up to a 30% boost in the performance of the Parallel
Ocean Program (POP) using this technique.
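
The core partitioning described above can be illustrated with a minimal sketch. It is not the mechanism used on Jaguar: the abstract does not say which core hosted the OS noise or how processes were bound, so the choice of core 0 as the noise core, the function names plan_core_layout and pin_current_process, and the use of Linux CPU affinity through Python's os module are all assumptions made for illustration.

```python
import os


def plan_core_layout(cores_per_node=8, noise_core=0, app_cores=6):
    """Split a node's core IDs into noise, application, and idle sets.

    Assumption: core 0 hosts the OS noise; the source does not specify this.
    """
    if not 0 <= noise_core < cores_per_node:
        raise ValueError("noise_core must be a valid core ID")
    if not 0 <= app_cores <= cores_per_node - 1:
        raise ValueError("app_cores must fit in the cores left after the noise core")
    # Every core except the one reserved for OS activity.
    remaining = [c for c in range(cores_per_node) if c != noise_core]
    return {
        "noise": [noise_core],
        "app": remaining[:app_cores],
        "idle": remaining[app_cores:],
    }


def pin_current_process(core_ids):
    """Best-effort pinning of the calling process to core_ids (Linux only).

    Returns True if the affinity mask was changed, False otherwise.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without CPU affinity support
    # Only request cores this process is actually allowed to use.
    target = set(core_ids) & os.sched_getaffinity(0)
    if not target:
        return False
    os.sched_setaffinity(0, target)
    return True


layout = plan_core_layout()
print(layout)
```

With the defaults, one core is reserved for OS activity, six cores are assigned to the application, and one core is left idle, matching the eight-core node and six application cores stated above. An application rank would call pin_current_process(layout["app"]) before starting its compute loop.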