Efficient Workload Balancing on Heterogeneous GPUs using Mixed-Integer Non-Linear Programming

Lin, Chih-Sheng; Hsieh, Chih-Wei; Chang, Hsi-Ya; Hsiung, Pao-Ann

Journal of applied research and technology

On-line version ISSN 2448-6736; Print version ISSN 1665-6423

J. appl. res. technol vol.12 n.6 Ciudad de México Dec. 2014

Chih-Sheng Lin¹, Chih-Wei Hsieh², Hsi-Ya Chang² and Pao-Ann Hsiung*¹

¹ Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan. *pahsiung@cs.ccu.edu.tw

² National Center for High-Performance Computing, Hsinchu, Taiwan.

Abstract

Heterogeneous system architectures have recently become mainstream for achieving high performance and power efficiency. In particular, many-core graphics processing units (GPUs) now play an important role in heterogeneous computing. However, application designers must still distribute computational workload across heterogeneous GPUs manually, and the resulting distribution is often inefficient. In this paper, we propose a mixed-integer non-linear programming (MINLP) based method for efficient workload distribution on heterogeneous GPUs that accounts for the asymmetric capabilities of the GPUs across different applications. Experimental results show that, compared to previous methods, our proposed method improves performance by up to 33% and load balance by up to 116%. Moreover, our method incurs only a small overhead while achieving high performance and load balancing.

Keywords: Computational workload distribution, graphics processing units (GPUs), load balancing, mixed-integer non-linear programming (MINLP).
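To make the idea of the abstract concrete, the following is a minimal Python sketch of workload distribution posed as a small integer non-linear problem: choose an integer number of work units per GPU so that the slowest GPU finishes as early as possible. It is not the authors' formulation, which the abstract does not state. The power-law cost model in `gpu_time`, its parameters, and the function names are hypothetical, and the exhaustive search stands in for a real MINLP solver on a toy-sized instance.

```python
from itertools import product


def gpu_time(units, overhead, per_unit, exponent):
    """Hypothetical cost model: fixed overhead plus a power-law term.

    An idle GPU (units == 0) is assumed to cost nothing.
    """
    if units == 0:
        return 0.0
    return overhead + per_unit * units ** exponent


def best_split(total_units, gpus):
    """Enumerate every integer split of total_units across the GPUs.

    Returns (makespan, split), where makespan is the finish time of the
    slowest GPU. Only practical for small instances; a MINLP solver
    would replace this loop at realistic sizes.
    """
    best = None
    for head in product(range(total_units + 1), repeat=len(gpus) - 1):
        last = total_units - sum(head)
        if last < 0:
            continue  # this partial assignment already exceeds the total
        split = head + (last,)
        makespan = max(gpu_time(x, *g) for x, g in zip(split, gpus))
        if best is None or makespan < best[0]:
            best = (makespan, split)
    return best


# Two assumed GPUs: (overhead, per_unit, exponent). Values are illustrative.
example_gpus = [(0.5, 0.010, 1.1), (0.2, 0.025, 1.0)]
example_makespan, example_split = best_split(100, example_gpus)
print(example_split, round(example_makespan, 3))
```

The point of the sketch is that an even split is generally not the best one when the GPUs have asymmetric capabilities: the search assigns more units to the device whose cost grows more slowly, which is the kind of imbalance a MINLP-based method is meant to find automatically.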
