Publication Abstract

Towards the Robustness of Dynamic Loop Scheduling on Large-scale Heterogeneous Distributed Systems

Banicescu, I., Ciorba, F. M., & Carino, R. L. (2009). Towards the Robustness of Dynamic Loop Scheduling on Large-scale Heterogeneous Distributed Systems. In TBD (Ed.), Proceedings of the 8th International Symposium on Parallel and Distributed Computing. Lisbon, Portugal: IEEE Computer Society Press.

Dynamic loop scheduling (DLS) algorithms provide application-level load balancing of loop iterates, with the goal of maximizing application performance on the underlying system. These methods use run-time information regarding the performance of the application's execution (for which irregularities change over time). Many DLS methods are based on probabilistic analyses, and therefore account for unpredictable variations of application- and system-related parameters. Scheduling scientific and engineering applications in large-scale distributed systems (possibly shared with other users) makes the problem of DLS even more challenging. Moreover, the chances of failure, such as processor or link failure, are high in such large-scale systems. In this paper, we employ the hierarchical approach for three DLS methods, and propose metrics for quantifying their robustness with respect to variations of two parameters (load and processor failures), for scheduling irregular applications in large-scale heterogeneous distributed systems.