[LLVMdev] load widening conflicts with AddressSanitizer

Mon Dec 19 16:31:22 PST 2011

On Dec 19, 2011, at 3:04 PM, John Criswell wrote:
>>> The alloca in question allocates 22 bytes.  The 64-bit load in Kostya's original email is accessing two additional bytes past the end of the alloca (i.e., it is accessing array "elements" a[22] and a[23]).  Accessing that memory with a read or write is undefined behavior.  The program could fault, read zeros, read arbitrary bit patterns, etc.
>> 
>> John, I think you are missing that these operations are fully defined by LLVM IR.  I'm not sure which language's rules you are drawing on, but they are not the rules of the IR.
> 
> I apologize for mixing C and LLVM notation.
> 
> If you want to distinguish between C and LLVM semantics, then I think load-widening in this particular case has two problems:
> 
> 1) The load-widening transform is not guaranteed to preserve the semantics of the original C program unless the OS and hardware fulfill certain assumptions.

Sure, but these assumptions hold on all current hardware and OSes supported by LLVM.
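
As a rough illustration of the kind of assumption involved (a sketch, not a description of what LLVM actually checks): if memory protection is enforced per page, a naturally aligned 8-byte load cannot straddle a page boundary, so a widened load that starts inside the alloca cannot fault on the extra bytes.  The 4096-byte page size and the helper names below are assumptions made up for this example; the offset of 16 follows from a 64-bit load that ends at a[23].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only; real targets may differ. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper: true if the bytes [addr, addr + width) all lie
 * in the same page, i.e. the load cannot touch a second page. */
bool load_stays_in_one_page(uintptr_t addr, size_t width) {
    assert(width > 0);
    return (addr / ASSUMED_PAGE_SIZE) ==
           ((addr + width - 1) / ASSUMED_PAGE_SIZE);
}

/* Hypothetical helper: how many bytes a load of `width` bytes at byte
 * offset `offset` reads past the end of an object of `obj_size` bytes. */
size_t bytes_past_end(size_t obj_size, size_t offset, size_t width) {
    size_t end = offset + width;
    return end > obj_size ? end - obj_size : 0;
}
```

For the 22-byte alloca in question, bytes_past_end(22, 16, 8) is 2, which corresponds to a[22] and a[23].  For any address that is a multiple of 8, load_stays_in_one_page(addr, 8) holds under the assumed page size; it can fail for an unaligned address such as ASSUMED_PAGE_SIZE - 3.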

> 2) The load-widening transform introduces behavior that, as far as I know, is undefined at the LLVM IR level.

This is incorrect; it is fully defined in LLVM IR.

> Am I making sense now?  Is there something I'm misunderstanding here?

Yes: the misunderstanding is that this is fully defined by LLVM IR. :)  It is not defined by C.  This is another case where LLVM IR is more general than C.

>> Doing this inside a compiler (the way we do) is also not invalid under C's notion of undefined behavior, because of the "as if" rule.  I agree that doing this at the source level would be invalid.
>> 
>> Again, I'm not opposed to having a way to disable these transformations, we just need a clean way to express it.
> 
> Having a list of which optimizations are safe to run and which ones are not can become tedious.  I'd rather fix the optimization so that it always preserves the semantics of the program unless there's a very compelling reason not to do so (e.g., significant performance loss).

This is one instance of a class of related optimizations, and you're assuming that they were added for no good reason.  The only reason that someone bothered to add them to the compiler is that they provided a worthwhile performance win.

If you're willing to move this discussion from "whether this is correct to do" to "what should we change in the compiler to support asan and safecode" then I think we'll have a more productive discussion.  This doesn't seem very hard to solve to everyone's satisfaction.

-Chris