[PATCH] D12318: [ASan] Enable optional ASan recovery

Fri Oct 30 11:40:59 PDT 2015

ygribov added a comment.

Sorry, I got distracted by other projects. I'm mostly done with all remarks with just one issue (can't commonize halt_on_error).

================
Comment at: lib/asan/asan_flags.inc:141
@@ +140,3 @@
+          "WARNING: USE AT YOUR OWN RISK!)")
+ASAN_FLAG(uptr, max_errors, 10,
+          "Maximum number of errors reported if halting-on-error is disabled")
----------------
ygribov wrote:
> kcc wrote:
> > No, just add it to sanitizer_common and write something like ("currently supported by asan") in the description. 
> > 
> > If we add things that are potentially useful to other sanitizers to one of them (not to sanitizer_common) 
> > we then end up having duplicates in all sanitizers. 
> > 
> > If we add things that are potentially useful to other sanitizers to one of them (not to sanitizer_common) 
> > we then end up having duplicates in all sanitizers.
> 
> That makes sense, we already have too many semi-duplicate flags.
Here's the problem - we need two halt_on_error's for ASan and integrated UBSan. Another problem is that msan does some local magic to alias current halt_on_error to keep_going flag.

================
Comment at: lib/asan/asan_report.cc:626
@@ +625,3 @@
+                               bool fatal = false) {
+    halt_on_error = fatal || flags()->halt_on_error;
+
----------------
kcc wrote:
> ygribov wrote:
> > kcc wrote:
> > > This is all exceptionally horrible.
> > > I've spent quite a bit of time and failed to understand how it works, 
> > > also I don't understand how to test it. 
> > > Why not just grab a lock for the entire duration of error reporting? 
> > > 
> > > The current logic was there because we never reported more than one bug, 
> > > now that we do -- simply grab a lock! 
> > > also I don't understand how to test it. 
> > 
> > Did you ever understand how to test original code :) ? The behavior for fatal errors didn't change by the way.
> > 
> > > Why not just grab a lock for the entire duration of error reporting?
> > 
> > Because we may run into second error while reporting the first one in the same thread. Grabbing a lock would cause a deadlock in this case.
> >> Did you ever understand how to test original code :) ? 
> That's not an argument. We don't want to make things worse just because they are not good at the moment. 
> 
> >> Because we may run into second error while reporting the first one in the same thread. Grabbing a lock would cause a deadlock in this case.
> Yea. Async signal safety... 
Funny enough, even current ASan isn't asynch signal stable. If you have repeated error in nodeferred signals which are being send to a thread, it may deadlock during error reporting in ScopedInErrorLock. The sequence is roughly:
* kernel delivers signal
* trigger error
* lock report file
* call internal_write
* enter kernel
* kernel delivers next nodeferred signal
* trigger the same error
* etc.

Repository:
  rL LLVM

http://reviews.llvm.org/D12318