Date of Completion
8-18-2017
Embargo Period
8-23-2017
Major Advisor
Omer Khan
Associate Advisor
John Chandy
Associate Advisor
Marten Van Dijk
Field of Study
Electrical Engineering
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
The ever-increasing miniaturization of semiconductors has led to important advances in mobile, cloud, and network computing. However, it has also made electronic devices less reliable and microprocessors more susceptible to transient faults induced by radiation. These faults do not cause permanent damage, but they may result in incorrect program execution by altering signal transfers or stored values. Such transient faults are also called soft errors. As technology scales, researchers and industry experts project that soft-error problems will become increasingly important. Today’s processors are multicores, featuring a diverse set of compute cores and on-board memory sub-systems connected via networks-on-chip and communication protocols. Such multicores are widely deployed in numerous environments for their computational capabilities.
To protect multicores from soft-error perturbations, resiliency schemes have been developed that offer high coverage but incur high power and performance overheads. It is observed that not all soft errors affect program correctness; some affect only program accuracy, i.e., the program completes with certain acceptable deviations from the error-free outcome. Thus, it is practical to improve processor efficiency by trading off resiliency overheads against program accuracy. This thesis explains the idea of declarative resilience, which selectively applies resiliency schemes to both crucial and non-crucial code. At the application level, code is classified as crucial or non-crucial based on its impact on the program outcome. A cross-layer architecture is developed, through which hardware collaborates with software support to enable efficient resilience with holistic soft-error coverage. Only program accuracy is compromised in the worst-case scenario of a soft-error strike during non-crucial code execution.
Recommended Citation
Shi, Qingchuan, "A Cross-Layer Resilient Multicore Architecture" (2017). Doctoral Dissertations. 1591.
https://digitalcommons.lib.uconn.edu/dissertations/1591