Date of Completion


Embargo Period



Multicores, GPUs, Graph Analytics, Performance Prediction, Machine Learning

Major Advisor

Omer Khan

Associate Advisor

Marten van Dijk

Associate Advisor

John Chandy

Field of Study

Electrical Engineering


Doctor of Philosophy

Open Access

Open Access


With ever-increasing data volumes and input variations, portable performance is becoming harder to achieve on today’s architectures. Computational setups for graph analytics use single-chip processors, such as GPUs or large-scale multicores. Some algorithm-input combinations run more efficiently on a GPU’s higher concurrency and bandwidth, while others perform better with a multicore’s stronger data caching capabilities. Architectural choices also arise within a selected accelerator, where variables such as threading and thread placement must be chosen for optimal performance. This paper proposes a performance predictor paradigm for a heterogeneous parallel architecture in which multiple disparate accelerators are integrated in an operational high-performance computing setup. Using graph benchmark and input characteristics, the predictor aims to improve graph processing efficiency by exploiting the underlying concurrency variations within and across the integrated heterogeneous accelerators. The evaluation shows that intelligent, real-time selection of near-optimal concurrency choices provides performance benefits ranging from 5% to 3.8x, and an energy benefit averaging around 2.4x, over the traditional single-accelerator setup.