Micro-threading for multi/many-cores architectures

Date of Completion

January 2009


Computer Science




Multi-core processors are becoming ubiquitous across all kinds of computing platforms. Application developers must now write parallel applications to achieve adequate speed-up. At the other end, as microprocessor designers hit the walls of memory latency, heat dissipation, and transistor count for single-core processors, they have to place more cores on the same chip to keep making faster microprocessors and keep pace with Moore's law. As the number of cores increases, application parallelism will become more fine-grained. One application may have hundreds or even thousands of homogeneous or heterogeneous threads working simultaneously on the same microprocessor. This emerging programming model of massively parallel applications executing on multi/many-core architectures brings three main challenges to the surface: (1) exploiting hardware and software parallelization capabilities at different levels to hide memory latency, (2) utilizing on-chip communication capabilities to synchronize shared data elements effectively, and (3) the difficulty of programming and controlling many threads working concurrently.
To tackle these three challenges, we propose the micro-threading framework for multi- and many-core microprocessor architectures. The framework adds another level of parallelization that exploits architectural aspects of multi- and many-core processors, such as explicit cache management, the core interconnection network, and core heterogeneity. We use the Cell Broadband Engine, one of the leading heterogeneous multi-core processors, to implement and evaluate the micro-threading framework. Our implementation and measurements show good performance improvements on the Cell Broadband Engine architecture for both combinatorial and scientific algorithms.
We employ the micro-threading framework to implement the proposed on-chip synchronization protocol and event-driven programming model for high-performance parallel computing on multi-core processors.