[LLVMdev] load widening conflicts with AddressSanitizer

Tue Jan 24 14:19:26 PST 2012

On Tue, Jan 24, 2012 at 2:00 PM, John Criswell <criswell at illinois.edu> wrote:

>  On 1/24/12 3:36 PM, Kostya Serebryany wrote:
>
>
>
> On Tue, Jan 24, 2012 at 1:08 PM, John Criswell <criswell at illinois.edu> wrote:
>
>> On 1/24/12 2:31 PM, Duncan Sands wrote:
>>
>>> Hi Kostya,
>>>
>>>>     As far as I can see the C and C++ standards are not relevant.  ASAN works on
>>>>     LLVM IR, not on C or C++.  Lots of different languages have LLVM frontends.  I
>>>>     personally turn Ada and Fortran into LLVM IR all the time for example.  Clearly
>>>>     the C standard is not relevant to LLVM IR coming from such languages.  What
>>>>     matters is how LLVM IR is defined.  As far as I know this construct is perfectly
>>>>     valid in LLVM IR.
>>>>
>>>
>>
>>  The issue here is that a load that reads data past the end of an alloca
>> can occur at the LLVM IR level in one of three ways:
>>
>> 1) Because the program at the original source-code level does it and is
>> incorrect.
>> 2) Because the program at the original source-code level does it and is
>> correct (although that must be a pretty wacky language).
>> 3) Load-widening introduces it when processing loads from allocas that
>> are properly aligned.
>>
>> As it is today, an analysis cannot look at the LLVM IR and know which
>> condition is causing the load to read data past the end of the memory
>> object.  As such, tools like SAFECode and ASAN don't know when to relax
>> their run-time checks to permit such out-of-bounds reading; they either
>> have to relax it for all such loads (in which case a bug in the C source
>> code might slip through), or they have to report it all the time (and
>> report false positives for correct C programs).
>>
>> I assume Kostya's new attribute is a way to permit the LLVM IR to specify
>> whether such an out-of-bounds read is intentional or not.
>>
>> In my opinion, I don't think we should bother with an attribute.
>>  Load-widening's behavior does not introduce exploitable code into the
>> program on commonly-used machines and operating systems(*), and incorrect
>> source code at the C source level that exhibits identical behavior isn't
>> exploitable, either.
>>
>> SAFECode can be enhanced so that the run-time checks for loads relax
>> their guarantees for aligned allocas that are subject to load-widening; I
>> imagine ASAN can be similarly modified.
>>
>
> ASAN *can* be modified this way (it will actually make instrumentation
> ~10% cheaper).
> But this mode will miss some bugs that the current mode finds.
> I've seen at least a couple of such *real* bugs.
>
>
> Yes, I understand.  My question is how many such bugs have you seen that
> involve loads *and* allocas aligned in such a way that the load-widening
> optimization triggers.
>

So far I've seen two cases where the patch in MemoryDependenceAnalysis.cpp
(above) will help.
First, a bug reported by the Mozilla folks:
http://code.google.com/p/address-sanitizer/issues/detail?id=20#c1
Second, a false warning during the asan/clang bootstrap. There is a
struct LVFlags in clang which contains 3 bools. They got lowered to a
32-bit load.
[Or were you asking something different?]
We build a lot of our code with asan/O1 and are not moving to asan/O2
because of this problem.
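
To make the LVFlags case concrete, here is a rough sketch of the arithmetic.
ThreeFlags is a hypothetical stand-in for a struct holding three bools (it is
not clang's real definition), and bytesPastEnd and strictCheckerWouldReport
are illustrative helpers, not anything in LLVM or asan. It assumes the struct
occupies 3 bytes and that the three 1-byte loads are merged into one 4-byte
load starting at the beginning of the object.

```cpp
#include <cstddef>

// Hypothetical stand-in for a struct of three bools (not clang's real LVFlags).
struct ThreeFlags {
    bool a;
    bool b;
    bool c;
};

// Number of bytes a single load of `loadWidth` bytes, starting at offset 0,
// reads beyond an object that is `objectSize` bytes long.
constexpr std::size_t bytesPastEnd(std::size_t objectSize, std::size_t loadWidth) {
    return loadWidth > objectSize ? loadWidth - objectSize : 0;
}

// A checker that treats every byte outside the object as off-limits reports
// the access as soon as at least one byte falls past the end.
constexpr bool strictCheckerWouldReport(std::size_t objectSize, std::size_t loadWidth) {
    return bytesPastEnd(objectSize, loadWidth) != 0;
}
```

With a 3-byte object and a 4-byte load, bytesPastEnd(3, 4) is 1, so a strict
checker reports the widened load even though the source only ever touched the
three bools.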

BTW, I have a clang/asan 3-stage bootstrap working locally.
Once this bug is fixed, we plan to set up a continuous clang/asan bootstrap
bot.

>
>
>
>  And these bugs are not only about exploitability, but also about
> correctness.
> If a program reads garbage, there is no simple way to statically prove
> that this garbage does not affect the program's behavior.
>
>
> Hrm.  Actually, by relaxing the safety guarantees, SAFECode and ASAN may
> fail to detect exploitable behavior in the original program, so I take back
> my original comment.  That said, it's a pretty obscure attack, so it's
> pretty low on my list of things to worry about.
>
> For me, the right way to go (barring a change in opinion from Chris) is to
> either disable the load-widening transform,
>

This is essentially what the patch does. It disables a subset
of load-widening when asan is on.
We can disable all cases of load-widening, but that may cost a bit of
performance (under asan).
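
As a rough model of that decision (not the actual patch), the sketch below
assumes the disabled "subset" is exactly those widenings that would read bytes
beyond what the original program accessed. FunctionInfo, WidenCandidate,
hasAddressSafetyAttr and mayWidenLoad are all hypothetical names made up for
illustration; none of them is LLVM API.

```cpp
#include <cstdint>

// Hypothetical summary of the enclosing function: is the asan function
// attribute present?
struct FunctionInfo {
    bool hasAddressSafetyAttr;
};

// Hypothetical description of one widening opportunity, in byte offsets
// relative to the start of the memory location the program accesses.
struct WidenCandidate {
    std::uint64_t loadOffset;    // where the widened load would start
    std::uint64_t accessedEnd;   // one past the last byte the program accessed
    std::uint64_t widenedWidth;  // width of the widened load in bytes
};

// True if the widened load touches only bytes the program already accessed.
inline bool staysWithinAccessedBytes(const WidenCandidate& c) {
    return c.loadOffset + c.widenedWidth <= c.accessedEnd;
}

// Without the attribute every candidate is allowed; with it, only candidates
// that do not read past the originally accessed bytes are allowed.
inline bool mayWidenLoad(const FunctionInfo& f, const WidenCandidate& c) {
    return !f.hasAddressSafetyAttr || staysWithinAccessedBytes(c);
}
```

Under this model, widening a 3-byte access into a 4-byte load is still allowed
in a regular build but rejected in a function carrying the attribute, while
widenings that stay inside the accessed bytes are kept in both cases.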

> transform the allocas to be larger, or to relax the safety guarantees.
> The problem with attributes is that they are brittle; you have to make sure
> they get added to the right instructions, then you have to make sure they
> don't get removed by optimizations.
>
This is a function attribute, which is much more stable.

--kcc

>
> For SAFECode, I'm alright with transforms that "force" a program to have
> memory safe behavior even if they do not report a bug (such as boosting the
> allocation size of allocas subject to load-widening).  ASAN may not be
> willing to do that (and understandably so).  I'm not sure what to suggest.
>
> -- John T.
>
>
>
>  --kcc
>
>
>
>>
>> We won't catch some bugs in C/C++ code, but that's a natural consequence
>> of deciding to permit certain out-of-bounds loads at the LLVM IR level,
>> IMHO.
>>
>> My two cents.
>>
>> -- John T.
>>
>> (*) All bets are off for unconventional systems, though.
>>
>>
>>
>>>>
>>>> Asan will not work for Fortran and Ada anyway (at least, out of the box).
>>>> I am not even sure that anything like asan is needed for Ada (it has bounds
>>>> checking built-in, the dynamic memory allocation is much more restrictive).
>>>> The tool is rather specific to C/C++ (and ObjectiveC probably, although we have
>>>> almost no tests for ObjectiveC, nor much knowledge in it).
>>>> Yes, the IR transformations are done on the LLVM level, but the asan run-time
>>>> library heavily depends on the C/C++ semantics and even implementation,
>>>> and you can't really separate the asan instrumentation pass from the run-time.
>>>>
>>> it's pretty disappointing to hear that asan is basically just for C.  But since
>>> it is, I won't bother you anymore about this attribute (though I still don't
>>> like it much).
>>>
>>> Ciao, Duncan.
>>>  _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>>
>
>