[LLVMdev] load widening conflicts with AddressSanitizer
John Criswell
criswell at illinois.edu
Mon Dec 19 20:59:44 PST 2011
On 12/19/11 6:31 PM, Chris Lattner wrote:
> On Dec 19, 2011, at 3:04 PM, John Criswell wrote:
>>>> The alloca in question allocates 22 bytes. The 64-bit load in Kostya's original email is accessing two additional bytes past the end of the alloca (i.e., it is accessing array "elements" a[22] and a[23]). Accessing that memory with a read or write is undefined behavior. The program could fault, read zeros, read arbitrary bit patterns, etc.
>>> John, I think that you are missing that these operations are fully defined by LLVM IR. I'm not sure what languages rules you are drawing these rules from, but they are not the rules of IR.
>> I apologize for mixing C and LLVM notation.
>>
>> If you want to distinguish between C and LLVM semantics, then I think load-widening in this particular case has two problems:
>>
>> 1) The load-widening transform is not guaranteed to preserve the semantics of the original C program unless the OS and hardware fulfill certain assumptions.
> Sure, but these assumptions are true of all current hardware and OS's supported by LLVM.
I realize that my arguments appear pedantic. However, I think there are
some good reasons for my pedantic-ness:
1) Novel systems can break the assumptions that make this transform
work. I believe avoiding these assumptions so that LLVM is suitable for
use with future systems that may want to break these assumptions is a
good thing so long as it does not cause a *significant* negative impact
when LLVM used on current systems.
2) These assumptions should (in an ideal world) be stated up front if we
keep them.
3) I'd like memory safety tools and static analyzers to work with
load-widening, and more generally, I'd like them to work with any LLVM
IR optimization that claims to preserve LLVM semantics.
4) It looks like you're depending on both the LLVM implementation and
underlying details of the OS/architecture to make the LLVM semantics
work the way you want them to. I don't think that's a good idea
because, again, future systems may break that assumption.
>
>> 2) The load-widening transform introduces behavior that, as far as I know, is undefined at the LLVM IR level.
> This is incorrect, it is fully defined for LLVM IR.
I haven't checked the LangRef in awhile, but I'll trust Dan when he says
that it's not defined there. :)
I would also question whether you really want to add this as a
requirement for the LLVM IR. It seems like a dangerous option for
future-proofing the IR against hardware changes.
>
>> Am I making sense now? Is there something I'm misunderstanding here?
> Yes, it is that this is fully defined by LLVM IR. :) This is not defined by C. This is another case where LLVM IR is more general than C is.
>
>>> Doing this inside a compiler (the way we do) also is not invalid according to the C notions of undefined behavior, as it has the "as if" rule. I agree that doing this at the source level would be invalid.
>>>
>>> Again, I'm not opposed to having a way to disable these transformations, we just need a clean way to express it.
>> Having a list of which optimizations are safe to run and which ones are not can become tedious. I'd rather fix the optimization so that it always preserves the semantics of the program unless there's a very compelling reason not to do so (e.g., significant performance loss).
> This is one instance of a class of related optimizations, and you're assuming that they were added for no good reason. The only reason that someone bothered to add them to the compiler is that they added a worthwhile performance win.
No, Chris. I am fully aware that optimizations are added for speed.
The question is, "How much speed would you lose on real programs if the
optimization were changed to make SAFECode/ASAN work, and would the loss
of that speed be acceptable?"
I asked you this question earlier to see if it would be worth my time to
implement the change, measure the performance on programs, and, if the
results were good, submit a patch. If you don't want to lose one ounce
of performance, then I certainly won't spend my time running such an
experiment. On the other hand, if a certain performance sacrifice is
acceptable to make load-widening work with SAFECode/ASAN (and to free
LLVM from the requirement of having to define this load/store behavior),
then I might be willing to do it.
I'll assume for now that you want every ounce of performance. Let me
know if I assume wrong.
>
> If you're willing to move this discussion from "whether this is correct to do" to "what should we change in the compiler to support asan and safecode" then I think we'll have a more productive discussion. This doesn't seem very hard to solve to everyone's satisfaction.
Actually, I think this discussion has been productive. You have now
explicitly stated that a load or store that "falls off" the end of a
memory object in LLVM IR is defined behavior (at least when done when
the memory object is properly aligned), and Dan Gohman found a bug in
the LangRef because no one documented this behavior. This was certainly
surprising to me.
As far as "fixing" the issue for SAFECode, I actually care not only
about getting the problem fixed but also whether the fix is the right
fix. I think letting load-widening do this adds an unnecessary
requirement about how the underlying system upon which LLVM runs must
act, and it creates a situation in which SAFECode cannot permit
arbitrary LLVM optimizations for C code to be run.
That latter issue may be a real thorn if SAFECode ever becomes a Clang
plugin because SAFECode will need to go into the PassManger and tell it
to disable "bad" optimizations.
-- John T.
>
> -Chris
>
More information about the llvm-dev
mailing list