[LLVMdev] load widening conflicts with AddressSanitizer

Tue Jan 24 14:00:16 PST 2012

On 1/24/12 3:36 PM, Kostya Serebryany wrote:
>
>
> On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu 
> <mailto:criswell at illinois.edu>> wrote:
>
>     On 1/24/12 2:31 PM, Duncan Sands wrote:
>
>         Hi Kostya,
>
>                 As far as I can see the C and C++ standards are not
>             relevant.  ASAN works on
>                 LLVM IR, not on C or C++.  Lots of different languages
>             have LLVM frontends.  I
>                 personally turn Ada and Fortran into LLVM IR all the
>             time for example.  Clearly
>                 the C standard is not relevant to LLVM IR coming from
>             such languages.  What
>                 matters is how LLVM IR is defined.  As far as I know
>             this construct is perfectly
>                 valid in LLVM IR.
>
>
>
>     The issue here is that a load that reads data past the end of an
>     alloca can occur at the LLVM IR level in one of three ways:
>
>     1) Because the program at the original source-code level does it
>     and is incorrect.
>     2) Because the program at the original source-code level does it
>     and is correct (although that must be a pretty wacky language).
>     3) Load-widening introduces it when processing loads from allocas
>     that are properly aligned.
>
>     As it is today, an analysis cannot look at the LLVM IR and know
>     which condition is causing the load to read data past the end of
>     the memory object.  As such, tools like SAFECode and ASAN don't
>     know when to relax their run-time checks to permit such
>     out-of-bounds reading; they either have to relax it for all such
>     loads (in which case a bug in the C source code might slip
>     through), or they have to report it all the time (and report false
>     positives for correct C programs).
>
>     I assume Kostya's new attribute is a way to permit the LLVM IR to
>     specify whether such an out-of-bounds read is intentional or not.
>
>     In my opinion, I don't think we should bother with an attribute.
>      Load-widening's behavior does not introduce exploitable code into
>     the program on commonly-used machines and operating systems(*),
>     and incorrect source code at the C source level that exhibits
>     identical behavior isn't exploitable, either. 
>
>
>     SAFECode can be enhanced so that the run-time checks for loads
>     relax their guarantees for aligned allocas that are subject to
>     load-widening; I imagine ASAN can be similarly modified.
>
> ASAN *can* be modified this way (it will actually make instrumentation 
> ~10% cheaper).
> But this mode will miss some bugs that the current mode finds.
> I've seen at least a couple of such *real* bugs.

Yes, I understand.  My question is how many such bugs have you seen that 
involve loads *and* allocas aligned in such a way that the load-widening 
optimization triggers.

>
> And these bugs are not only about exploitability, but also about 
> correctness.
> If a program reads garbage, there is no simple way to statically prove 
> that this garbage does not affect the program's behavior.

Hrm.  Actually, by relaxing the safety guarantees, SAFECode and ASAN may 
fail to detect exploitable behavior in the original program, so I take 
back my original comment.  That said, it's a pretty obscure attack, so 
it's pretty low on my list of things to worry about.

For me, the right way to go (barring a change in opinion from Chris) is 
to either disable the load-widening transform, transform the allocas to 
be larger, or to relax the safety guarantees.  The problem with 
attributes is that they are brittle; you have to make sure they get 
added to the right instructions, then you have to make sure they don't 
get removed by optimizations.

For SAFECode, I'm alright with transforms that "force" a program to have 
memory safe behavior even if they do not report a bug (such as boosting 
the allocation size of allocas subject to load-widening).  ASAN may not 
be willing to do that (and understandably so).  I'm not sure what to 
suggest.

-- John T.

>
> --kcc
>
>
>     We won't catch some bugs in C/C++ code, but that's a natural
>     consequence of deciding to permit certain out-of-bounds loads at
>     the LLVM IR level, IMHO.
>
>     My two cents.
>
>     -- John T.
>
>     (*) All bets are off for unconventional systems, though.
>
>
>
>
>             Asan will not work for Fortran and Ada anyway (at least,
>             out of the box).
>             I am not even sure that anything like asan is needed for
>             Ada (it has bounds
>             checking built-in, the dynamic memory allocation is much
>             more restrictive).
>             The tool is rather specific to C/C++ (and ObjectiveC
>             probably, although we have
>             almost no tests for ObjectiveC, nor much knowledge in it).
>             Yes, the IR transformations are done on the LLVM level,
>             but the asan run-time
>             library heavily depends on the C/C++ semantics and even
>             implementation,
>             and you can't really separate the asan instrumentation
>             pass from the run-time.
>
>         it's pretty disappointing to hear that asan is basically just
>         for C.  But since
>         it is, I won't bother you anymore about this attribute (though
>         I still don't
>         like it much).
>
>         Ciao, Duncan.
>         _______________________________________________
>         LLVM Developers mailing list
>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>         http://llvm.cs.uiuc.edu
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120124/4176158a/attachment.html>