Date of Completion
12-14-2018
Embargo Period
12-14-2018
Keywords
Multicore, Synchronization, Moving Compute to Data, explicit messaging, cache coherence
Major Advisor
Omer Khan
Associate Advisor
John Chandy
Associate Advisor
Marten van Dijk
Field of Study
Electrical Engineering
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
Single-chip multicore processors are now prevalent, and processors with hundreds of cores are being proposed and explored by both academia and industry. Shared-memory cache coherence is the state-of-the-art technology for enabling synchronization and communication between cores in these processors. However, synchronizing cores on shared data through hardware cache coherence suffers from instruction retries and cache line ping-pong overheads, which prevent performance from scaling as core counts on a chip increase.
This thesis proposes a novel moving computation to data (MC) model to overcome this synchronization bottleneck in a 1000-core-scale shared-memory multicore processor. The proposed MC model pins shared data to dedicated cores called service cores. Worker cores explicitly request that critical code sections be executed at the service cores. This prevents cache lines from bouncing between cores and thereby enables data locality optimization. The MC model uses auxiliary in-hardware explicit messaging for the critical section requests, enabling efficient fine-grained blocking and non-blocking communication between cores. To show the effectiveness of the proposed model, workloads with a wide range of synchronization requirements from the graph analytics, machine learning, and database domains are implemented. The model is then prototyped and exhaustively evaluated on a 72-core machine, the Tilera Tile-Gx72 multicore platform, which incorporates in-hardware core-to-core messaging as an auxiliary capability alongside the shared-memory cache coherence paradigm. Since the Tile-Gx72 machine includes only 72 cores, it is used for evaluation at scales of 8 to 64 cores. For further analysis at higher core counts, a simulated RISC-V multicore environment is built, and the performance and dynamic energy scaling advantages of the MC model are evaluated against various baseline synchronization models at up to 1024 cores.
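The service-core idea described in the abstract can be illustrated with a minimal software sketch. This is not the dissertation's implementation: it emulates the pattern with Python threads, and a `queue.Queue` stands in for the in-hardware core-to-core messaging. The names `ServiceCore`, `add_blocking`, `add_nonblocking`, and `run_workers` are hypothetical, chosen only for illustration. One thread owns the shared counter, and worker threads send it critical-section requests instead of taking a lock.

```python
import queue
import threading


class ServiceCore:
    """Hypothetical stand-in for a service core: the only thread that touches `counter`."""

    def __init__(self):
        self.requests = queue.Queue()  # emulates the core-to-core message channel
        self.counter = 0               # shared data pinned to the service thread
        self.thread = threading.Thread(target=self._serve)
        self.thread.start()

    def _serve(self):
        # Requests are handled one at a time, so the update needs no lock.
        while True:
            msg = self.requests.get()
            if msg is None:            # shutdown sentinel
                break
            amount, reply = msg
            self.counter += amount     # the "critical section", run at the data
            if reply is not None:
                reply.put(self.counter)

    def add_blocking(self, amount):
        """Send a request and wait for the service thread's reply."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((amount, reply))
        return reply.get()

    def add_nonblocking(self, amount):
        """Send a request and continue without waiting for a reply."""
        self.requests.put((amount, None))

    def shutdown(self):
        """Drain all earlier requests, stop the service thread, return the final value."""
        self.requests.put(None)
        self.thread.join()
        return self.counter


def run_workers(num_workers=4, updates_per_worker=1000):
    service = ServiceCore()

    def worker():
        for i in range(updates_per_worker):
            if i % 2 == 0:
                service.add_nonblocking(1)
            else:
                service.add_blocking(1)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return service.shutdown()


if __name__ == "__main__":
    print(run_workers())
```

In this sketch the workers never read or write the counter directly, which mirrors how the MC model avoids bouncing a shared cache line between cores. The performance and energy behavior reported in the thesis depends on hardware messaging and cannot be inferred from a thread-based emulation like this one.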
Recommended Citation
Dogan, Halit, "Accelerating Synchronization on Futuristic 1000-cores Multicore Processor with Moving Compute to Data Model" (2018). Doctoral Dissertations. 2026.
https://digitalcommons.lib.uconn.edu/dissertations/2026