[llvm-commits] CVS: llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html 2008-03-ASPLOS-HardErrorPropagation.pdf pubs.js

Chris Lattner sabre at nondot.org
Sat Jun 27 13:01:00 PDT 2009



Changes in directory llvm-www/pubs:

2008-03-ASPLOS-HardErrorPropagation.html added (r1.1)
2008-03-ASPLOS-HardErrorPropagation.pdf added (r1.1)
pubs.js updated: 1.33 -> 1.34
---
Log message:

add "Understanding the propagation of hard errors to software and implications for resilient system design" from ASPLOS'08


---
Diffs of the changes:  (+78 -0)

 2008-03-ASPLOS-HardErrorPropagation.html |   70 +++++++++++++++++++++++++++++++
 2008-03-ASPLOS-HardErrorPropagation.pdf  |    0 
 pubs.js                                  |    8 +++
 3 files changed, 78 insertions(+)


Index: llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html
diff -c /dev/null llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html:1.1
*** /dev/null	Sat Jun 27 15:00:50 2009
--- llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.html	Sat Jun 27 15:00:40 2009
***************
*** 0 ****
--- 1,70 ----
+ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+ <html>
+ <head>
+   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <link rel="stylesheet" href="../llvm.css" type="text/css" media="screen">
+   <title>Understanding the Propagation of Hard Errors to Software and 
+ Implications for Resilient System Design</title>
+ </head>
+ <body>
+ 
+ <div class="pub_title">
+   Understanding the Propagation of Hard Errors to Software and 
+ Implications for Resilient System Design
+ </div>
+ <div class="pub_author">
+   Man-Lap Li, Pradeep Ramachandran, Swarup K. Sahoo, Sarita V. Adve, Vikram S. Adve, Yuanyuan Zhou
+ </div>
+ 
+ <h2>Abstract:</h2>
+ <blockquote>
+ With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field faults. To be broadly deployable, the hardware reliability solution must incur low overheads, precluding use of expensive redundancy. We explore a cooperative hardware-software solution that watches for anomalous software behavior to indicate the presence of hardware faults. Fundamental to such a solution is a characterization of how hardware faults in different microarchitectural structures of a modern processor propagate through the application and OS.<p>
+ 
+ This paper aims to provide such a characterization, resulting in identifying low-cost detection methods and providing guidelines for implementation of the recovery and diagnosis components of such a reliability solution. We focus on hard faults because they are increasingly important and have different system implications than the much studied transients. We achieve our goals through fault injection experiments with a microarchitecture-level full system timing simulator. Our main results are: (1) we are able to detect 95% of the unmasked faults in 7 out of 8 studied microarchitectural structures with simple detectors that incur zero to little hardware overhead; (2) over 86% of these detections are within latencies that existing hardware checkpointing schemes can handle, while others require software checkpointing; and (3) a surprisingly large fraction of the detected faults corrupt OS state, but almost all of these are detected with latencies short enough to use hardware checkpointing, thereby enabling OS recovery in virtually all such cases.
+ </blockquote>
+ 
+ <h2>Published:</h2>
+ <blockquote>
+   "Understanding the Propagation of Hard Errors to Software and 
+ Implications for Resilient System Design"
+   <br>
+   Man-Lap Li, Pradeep Ramachandran, Swarup K. Sahoo, Sarita V. Adve, Vikram S. Adve, Yuanyuan Zhou.
+   <br>
+ <i>
+ Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS'08)
+ </i>, Seattle, WA, March 2008.
+ </blockquote>
+ <h2>Download:</h2>
+ <h3>Paper:</h3>
+ <ul>
+   <li><a href="2008-03-ASPLOS-HardErrorPropagation.pdf">
+   Understanding the Propagation of Hard Errors to Software and 
+ Implications for Resilient System Design
+   </a> (PDF)</li>
+ </ul>
+ 
+ <h2>BibTeX Entry:</h2>
+ <pre>
+ @inproceedings{1346315,
+  author = {Li, Man-Lap and Ramachandran, Pradeep and Sahoo, Swarup Kumar and Adve, Sarita V. and Adve, Vikram S. and Zhou, Yuanyuan},
+  title = {Understanding the propagation of hard errors to software and implications for resilient system design},
+  booktitle = {ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems},
+  year = {2008},
+  isbn = {978-1-59593-958-6},
+  pages = {265--276},
+  location = {Seattle, WA, USA},
+  doi = {http://doi.acm.org/10.1145/1346281.1346315},
+  publisher = {ACM},
+  address = {New York, NY, USA},
+  }
+ </pre>
+ 
+ <!-- *********************************************************************** -->
+ <hr>
+   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
+   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
+   <a href="http://validator.w3.org/check/referer"><img
+   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
+ 
+ </body>
+ </html>


Index: llvm-www/pubs/2008-03-ASPLOS-HardErrorPropagation.pdf


Index: llvm-www/pubs/pubs.js
diff -u llvm-www/pubs/pubs.js:1.33 llvm-www/pubs/pubs.js:1.34
--- llvm-www/pubs/pubs.js:1.33	Sat Jun 27 14:54:19 2009
+++ llvm-www/pubs/pubs.js	Sat Jun 27 15:00:40 2009
@@ -281,6 +281,14 @@
    month: 5,
    year: 2008},
 
+  {url: "2008-03-ASPLOS-HardErrorPropagation.html",
+   title: "Understanding the Propagation of Hard Errors to Software and
+Implications for Resilient System Design",
+   author: "Man-Lap Li, Pradeep Ramachandran, Swarup K. Sahoo, Sarita V. Adve, Vikram S. Adve, Yuanyuan Zhou",
+   published: "Proc. of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS'08)",
+   month: 3,
+   year: 2008},
+
   {url: '2008-03-DATE-TLM_Estimation.html',
    title: 'Cycle-approximate Retargetable Performance Estimation at the Transaction Level',
    author: 'Y. Hwang, S. Abdi, and D. Gajski',
