[LLVMdev] load widening conflicts with AddressSanitizer

Mon Dec 19 20:59:44 PST 2011

On 12/19/11 6:31 PM, Chris Lattner wrote:
> On Dec 19, 2011, at 3:04 PM, John Criswell wrote:
>>>> The alloca in question allocates 22 bytes.  The 64-bit load in Kostya's original email is accessing two additional bytes past the end of the alloca (i.e., it is accessing array "elements" a[22] and a[23]).  Accessing that memory with a read or write is undefined behavior.  The program could fault, read zeros, read arbitrary bit patterns, etc.
>>> John, I think that you are missing that these operations are fully defined by LLVM IR.  I'm not sure what languages rules you are drawing these rules from, but they are not the rules of IR.
>> I apologize for mixing C and LLVM notation.
>>
>> If you want to distinguish between C and LLVM semantics, then I think load-widening in this particular case has two problems:
>>
>> 1) The load-widening transform is not guaranteed to preserve the semantics of the original C program unless the OS and hardware fulfill certain assumptions.
> Sure, but these assumptions are true of all current hardware and OS's supported by LLVM.

I realize that my arguments appear pedantic.  However, I think there are 
some good reasons for my pedantic-ness:

1) Novel systems can break the assumptions that make this transform 
work.  I believe avoiding these assumptions so that LLVM is suitable for 
use with future systems that may want to break these assumptions is a 
good thing so long as it does not cause a *significant* negative impact 
when LLVM used on current systems.

2) These assumptions should (in an ideal world) be stated up front if we 
keep them.

3) I'd like memory safety tools and static analyzers to work with 
load-widening, and more generally, I'd like them to work with any LLVM 
IR optimization that claims to preserve LLVM semantics.

4) It looks like you're depending on both the LLVM implementation and 
underlying details of the OS/architecture to make the LLVM semantics 
work the way you want them to.  I don't think that's a good idea 
because, again, future systems may break that assumption.

>
>> 2) The load-widening transform introduces behavior that, as far as I know, is undefined at the LLVM IR level.
> This is incorrect, it is fully defined for LLVM IR.

I haven't checked the LangRef in awhile, but I'll trust Dan when he says 
that it's not defined there. :)

I would also question whether you really want to add this as a 
requirement for the LLVM IR.  It seems like a dangerous option for 
future-proofing the IR against hardware changes.

>
>> Am I making sense now?  Is there something I'm misunderstanding here?
> Yes, it is that this is fully defined by LLVM IR. :)  This is not defined by C.  This is another case where LLVM IR is more general than C is.
>
>>> Doing this inside a compiler (the way we do) also is not invalid according to the C notions of undefined behavior, as it has the "as if" rule.  I agree that doing this at the source level would be invalid.
>>>
>>> Again, I'm not opposed to having a way to disable these transformations, we just need a clean way to express it.
>> Having a list of which optimizations are safe to run and which ones are not can become tedious.  I'd rather fix the optimization so that it always preserves the semantics of the program unless there's a very compelling reason not to do so (e.g., significant performance loss).
> This is one instance of a class of related optimizations, and you're assuming that they were added for no good reason.  The only reason that someone bothered to add them to the compiler is that they added a worthwhile performance win.

No, Chris.  I am fully aware that optimizations are added for speed.  
The question is, "How much speed would you lose on real programs if the 
optimization were changed to make SAFECode/ASAN work, and would the loss 
of that speed be acceptable?"

I asked you this question earlier to see if it would be worth my time to 
implement the change, measure the performance on programs, and, if the 
results were good, submit a patch.  If you don't want to lose one ounce 
of performance, then I certainly won't spend my time running such an 
experiment.  On the other hand, if a certain performance sacrifice is 
acceptable to make load-widening work with SAFECode/ASAN (and to free 
LLVM from the requirement of having to define this load/store behavior), 
then I might be willing to do it.

I'll assume for now that you want every ounce of performance.  Let me 
know if I assume wrong.

>
> If you're willing to move this discussion from "whether this is correct to do" to "what should we change in the compiler to support asan and safecode" then I think we'll have a more productive discussion.  This doesn't seem very hard to solve to everyone's satisfaction.

Actually, I think this discussion has been productive.  You have now 
explicitly stated that a load or store that "falls off" the end of a 
memory object in LLVM IR is defined behavior (at least when done when 
the memory object is properly aligned), and Dan Gohman found a bug in 
the LangRef because no one documented this behavior.  This was certainly 
surprising to me.

As far as "fixing" the issue for SAFECode, I actually care not only 
about getting the problem fixed but also whether the fix is the right 
fix.  I think letting load-widening do this adds an unnecessary 
requirement about how the underlying system upon which LLVM runs must 
act, and it creates a situation in which SAFECode cannot permit 
arbitrary LLVM optimizations for C code to be run.

That latter issue may be a real thorn if SAFECode ever becomes a Clang 
plugin because SAFECode will need to go into the PassManger and tell it 
to disable "bad" optimizations.

-- John T.

>
> -Chris
>