[llvm-commits] CVS: llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html 2008-03-ASPLOS-HardErrorPropagation.pdf pubs.js

Chris Lattner clattner at apple.com
Sun Jun 28 13:48:18 PDT 2009


On Jun 28, 2009, at 12:54 PM, Duncan Sands wrote:

> Hi Chris,
>
>> + This paper aims to provide such a characterization, resulting in  
>> identifying low-cost detection methods and providing guidelines for  
>> implementation of the recovery and diagnosis components of such a  
>> reliability solution. We focus on hard faults because they are  
>> increasingly important and have different system implications than  
>> the much studied transients. We achieve our goals through fault  
>> injection experiments with a microarchitecture-level full system  
>> timing simulator. Our main results are: (1) we are able to detect  
>> 95% of the unmasked faults in 7 out of 8 studied microarchitectural  
>> structures with simple detectors that incur zero to little hardware  
>> overhead; (2) over 86% of these detections are within latencies  
>> that existing hardware checkpointing schemes can handle, while  
>> others require software checkpointing; and (3) a surprisingly large  
>> fraction of the detected faults corrupt OS state, but almost all of  
>> these are detected with latencies short enough to use hardware
> c!
>> heckpointing, thereby enabling OS recovery in virtually all such  
>> cases.
>
> another mysterious line break of the same kind.

Thanks, this doesn't manifest as a rendering or validation problem in  
the HTML, so I'll just leave it.

-Chris


