[llvm-commits] CVS: llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html 2008-03-ASPLOS-HardErrorPropagation.pdf pubs.js

Duncan Sands baldrick at free.fr
Sun Jun 28 12:54:26 PDT 2009


Hi Chris,

> + This paper aims to provide such a characterization, resulting in identifying low-cost detection methods and providing guidelines for implementation of the recovery and diagnosis components of such a reliability solution. We focus on hard faults because they are increasingly important and have different system implications than the much studied transients. We achieve our goals through fault injection experiments with a microarchitecture-level full system timing simulator. Our main results are: (1) we are able to detect 95% of the unmasked faults in 7 out of 8 studied microarchitectural structures with simple detectors that incur zero to little hardware overhead; (2) over 86% of these detections are within latencies that existing hardware checkpointing schemes can handle, while others require software checkpointing; and (3) a surprisingly large fraction of the detected faults corrupt OS state, but almost all of these are detected with latencies short enough to use hardware 
c!
>  heckpointing, thereby enabling OS recovery in virtually all such cases.

another mysterious line break of the same kind.

Ciao,

Duncan.



More information about the llvm-commits mailing list