[PATCH] D63469: hwasan: Teach the runtime to identify the local variable being accessed in UAR reports.

Evgenii Stepanov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 27 15:52:31 PDT 2019


eugenis accepted this revision.
eugenis added a comment.
This revision is now accepted and ready to land.

LGTM
We will need some kind of offline symbolization solution to go with this.



================
Comment at: compiler-rt/lib/hwasan/hwasan_report.cpp:165
+        // Calculate the offset from the object address to the faulting
+        // address. Because we only store the lower 20 bits of FP, the
+        // calculation is performed mod 2^20 and may harmlessly underflow if
----------------
> the lower 20 bits of FP
Nit: they are not the lowest 20 bits. Add "significant" or something like that?


================
Comment at: compiler-rt/lib/hwasan/hwasan_report.cpp:281
+
+      if (found_local)
+        return;
----------------
pcc wrote:
> eugenis wrote:
> > I think it might be useful to iterate until the end of the buffer to find more candidates; if anything, that would avoid losing the information that's already there.
> > 
> > This is good enough for now. Some time in the future I think we should refactor this to collect all the candidates (including ex. potential buffer overflows with larger offsets, across multiple small allocation) and sort them by likelihood.
> > 
> By the time we get here, we've already visited every frame. We don't stop when we find the first match or anything like that.
> 
> Or are you saying that we should print the `Previously allocated frames` section in all cases (e.g. for manual identification of OOB accesses)? 
I think I was confused by the amount of indentation in this function. Looks good now.


================
Comment at: compiler-rt/test/hwasan/TestCases/stack-uar.c:36
   // NOSYM: Previously allocated frames:
-  // NOSYM-NEXT: sp: 0x{{.*}} #0 0x{{.*}} ({{.*}}/stack-uar.c.tmp+0x{{.*}}){{$}}
-  // NOSYM-NEXT: 16 CCC;
+  // NOSYM-NEXT: record_addr:0x{{.*}} record:0x{{.*}} ({{.*}}/stack-uar.c.tmp+0x{{.*}}){{$}}
+  // NOSYM-NEXT: record_addr:0x{{.*}} record:0x{{.*}} ({{.*}}/stack-uar.c.tmp+0x{{.*}}){{$}}
----------------
pcc wrote:
> eugenis wrote:
> > The advantage of printing decoded records, as before, is that they can be parsed with all the old symbolization scripts. Explicitly listing the last digits of FP is also sometimes useful to match against register values in the report.
> > 
> > Could we change the format to
> > 
> > [tag], fp:0x12345 #0 PC (module+offset)  
> > 
> > #0 is also for compatibility with symbolization scripts.
> Do you think it's important to be compatible with old symbolization scripts here? I think we will probably need a new symbolization script anyway that does all the same things that the runtime is doing to identify stack objects. That's also why I decided to print raw values here, so that we'd be less likely to need to change the line format in the future if the entry format changes.
sgtm


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63469/new/

https://reviews.llvm.org/D63469





More information about the llvm-commits mailing list