Supercomputer Emulation For Evaluating Scheduling Algorithms
Abstract
Scheduling algorithms have a significant impact on the
utilization of HPC facilities, yet the vast majority of the
research in this area is done using simulations. Simulations
leave out many factors that affect a real scheduler, such as
its scheduling processing time, communication latencies, and
the intrinsic complexity of the scheduler's implementation.
As a result,
despite the theoretical improvements reported in several articles,
practically none of the proposed algorithms has been implemented in
real schedulers, and HPC facilities still use the basic
first-come-first-served (FCFS) with Backfill scheduling
policy.
A better approach, therefore, could be to evaluate new algorithms
with real schedulers in an emulation environment.
This thesis investigates two related challenges in emulation:
computational cost and the faithfulness of the results to real
scheduling environments.
It finds that the sampling, shrinking and shuffling of a trace
must be done carefully to keep the classical metrics invariant, or
linearly variant, with respect to the size and times of the original
workload. This is accomplished by carefully controlling the
submission period and accounting for drifts in the
submission period and trace duration.
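As a rough illustration of why the submission period matters, the following minimal sketch samples and shrinks a toy trace and compares the offered load before and after. It is not the thesis's implementation: the `Job` record, the function names, and the definition of offered load used here are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical trace record; field names are assumptions for this sketch.
    submit: float   # submission time, seconds from trace start
    runtime: float  # run time, seconds
    cores: int      # requested cores


def submission_period(jobs):
    """Time between the first and the last submission in the trace."""
    return max(j.submit for j in jobs) - min(j.submit for j in jobs)


def offered_load(jobs, total_cores):
    """Requested core-seconds divided by capacity over the submission period."""
    work = sum(j.runtime * j.cores for j in jobs)
    return work / (total_cores * submission_period(jobs))


def sample_trace(jobs, fraction, seed=0):
    """Keep a random fraction of the jobs, leaving their times untouched."""
    rng = random.Random(seed)
    k = max(1, round(len(jobs) * fraction))
    return sorted(rng.sample(list(jobs), k), key=lambda j: j.submit)


def shrink_trace(jobs, factor):
    """Scale submission times and run times by the same factor."""
    return [Job(j.submit * factor, j.runtime * factor, j.cores) for j in jobs]


if __name__ == "__main__":
    trace = [Job(submit=i * 100.0, runtime=50.0, cores=4) for i in range(10)]
    shrunk = shrink_trace(trace, 0.5)
    sampled = sample_trace(trace, 0.5)
    print(offered_load(trace, 8), offered_load(shrunk, 8))
    # Sampling may drop the first or last job, so the period can drift.
    print(submission_period(trace), submission_period(sampled))
```

In this sketch, shrinking scales the work and the submission period by the same factor, so the offered load stays invariant. Random sampling can remove the earliest or latest job, which changes the submission period unless it is controlled explicitly.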
This methodology can help researchers to better evaluate their
scheduling algorithms and help HPC administrators to optimize the
parameters of production schedulers.
To assess the proposed methodology, we evaluated both
the FCFS with Backfill and Suspend/Resume scheduling algorithms.
The results strongly suggest that Suspend/Resume leads to
better utilization of a supercomputer when large jobs are
given high priority.