[llvm-dev] Inexplicable ASAN report. Code generation bug?

Greg Stark via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 11 18:43:21 PST 2015


On Thu, Nov 12, 2015 at 2:18 AM, Greg Stark <stark at mit.edu> wrote:
> So it looks to me like -O2 is causing the optimizer to turn the 2-byte
> read into a 4-byte read and overrun the allocated object. But I
> haven't tried looking at the assembly yet.

Fwiw the assembly is obviously a 4-byte load but identifying the
mapping from the C code reading two of those bytes to the assembly is
a bit beyond my level of familiarity with x86 assembly:

#define NUMERIC_WEIGHT(n) (NUMERIC_HEADER_IS_SHORT((n)) ? \
    (((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_SIGN_MASK ? \
      ~NUMERIC_SHORT_WEIGHT_MASK : 0) \
     | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
    : ((n)->choice.n_long.n_weight))

...
   0x000000000144a6ba <+1050>: mov    %r15d,%ecx
   0x000000000144a6bd <+1053>: and    $0x7,%ecx
   0x000000000144a6c0 <+1056>: add    $0x3,%ecx
   0x000000000144a6c3 <+1059>: movsbl %al,%eax
   0x000000000144a6c6 <+1062>: cmp    %eax,%ecx
   0x000000000144a6c8 <+1064>: jl     0x144a441 <numeric_out+417>
   0x000000000144a6ce <+1070>: mov    %r15,%rdi
   0x000000000144a6d1 <+1073>: callq  0x526350 <__asan_report_load4>
==> 0x000000000144a6d6 <+1078>: mov    $0x2e02680,%edi
   0x000000000144a6db <+1083>: mov    %r14,0x10(%rbx)
   0x000000000144a6df <+1087>: mov    %rsi,%r14
   0x000000000144a6e2 <+1090>: callq  0x53b1b0 <__sanitizer_cov()>
   0x000000000144a6e7 <+1095>: mov    %r14,%rsi
   0x000000000144a6ea <+1098>: mov    0x10(%rbx),%r14
   0x000000000144a6ee <+1102>: jmpq   0x144a4a1 <numeric_out+513>
   0x000000000144a6f3 <+1107>: mov    %edi,%ecx
   0x000000000144a6f5 <+1109>: and    $0x7,%ecx
   0x000000000144a6f8 <+1112>: add    $0x3,%ecx
   0x000000000144a6fb <+1115>: movsbl %al,%eax
...
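
For what it's worth, the and/add/movsbl/cmp/jl sequence before the
call looks like ASan's usual check for an access that does not cover a
whole 8-byte shadow granule. Below is a minimal C sketch of that check
as I understand it, assuming the standard shadow encoding (0 means all
8 bytes are addressable, k in 1..7 means only the first k bytes are,
negative means poisoned). The function name and the addresses in the
usage note are made up for illustration; this is not the ASan runtime's
actual code.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the instrumented check:
 *   and $0x7,%ecx ; add $0x3,%ecx ; movsbl %al,%eax ; cmp %eax,%ecx ; jl ok
 * For a 4-byte load the constant added is access_size - 1 == 3.
 * Returns 1 if the access would be reported, 0 otherwise.
 */
int demo_access_is_bad(uintptr_t addr, int access_size, int8_t shadow)
{
    if (shadow == 0)
        return 0;                       /* whole granule is addressable */

    /* Offset of the last byte touched within the 8-byte granule. */
    int last_byte = (int)(addr & 7) + access_size - 1;

    /* The jl above skips the report when last_byte < shadow. */
    return last_byte >= shadow;
}
```

With a shadow value of 6 (only the first six bytes of the granule are
addressable), demo_access_is_bad(0x1004, 2, 6) is 0, while
demo_access_is_bad(0x1004, 4, 6) is 1. That is the shape of the problem
here: the 2-byte read fits, the widened 4-byte read does not.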

So I'm guessing the logic is that the struct has that VLA, so the
compiler sees that the struct size is at least two more bytes, assumes
memory will always be allocated for the whole object, and feels free
to reference those extra bytes? In practice this is a false positive:
the object will always be aligned, so those two extra bytes will
always be on the same page. We already have another code site with a
similar false positive, but that is a much narrower circumstance. If
this can happen anywhere there's a non-4-byte access, then it will
remove a lot of the value of ASan (fwiw MSan doesn't complain about
this).
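
To make the guess concrete, here is a small sketch of the byte ranges
involved. The types are simplified, hypothetical stand-ins for the
short and long header variants that the macro reads (the demo_ names
are mine, not the real definitions), and the helpers only compute
offsets; nothing is read out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header variants (digit arrays omitted). */
struct demo_short { uint16_t n_header; };
struct demo_long  { uint16_t n_sign_dscale; int16_t n_weight; };

union demo_choice {
    uint16_t          n_header;
    struct demo_short n_short;
    struct demo_long  n_long;
};

static_assert(sizeof(struct demo_long) == 4, "demo layout assumes no padding");

/* End offset of the source-level 2-byte read of the short header. */
size_t demo_short_read_end(void)
{
    return offsetof(union demo_choice, n_short.n_header) + sizeof(uint16_t);
}

/* End offset if both variants are fetched with a single 4-byte load. */
size_t demo_merged_read_end(void)
{
    return offsetof(union demo_choice, n_long.n_weight) + sizeof(int16_t);
}

/* 1 if a read ending at read_end runs past an allocation of that size. */
int demo_read_overruns(size_t allocated, size_t read_end)
{
    return read_end > allocated;
}
```

If only the two header bytes of the union are allocated, then
demo_read_overruns(2, demo_short_read_end()) is 0 but
demo_read_overruns(2, demo_merged_read_end()) is 1, which is the kind
of access ASan is flagging.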

-- 
greg

