<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 12/16/11 7:56 PM, Chris Lattner wrote:
<blockquote
cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<div><span class="Apple-style-span"
style="-webkit-tap-highlight-color: rgba(26, 26, 26,
0.296875); -webkit-composition-fill-color: rgba(175, 192, 227,
0.230469); -webkit-composition-frame-color: rgba(77, 128, 180,
0.230469); ">On Dec 16, 2011, at 3:29 PM, John Criswell <<a
moz-do-not-send="true" href="mailto:criswell@illinois.edu">criswell@illinois.edu</a>>
wrote:</span></div>
<blockquote type="cite">
<div> On 12/16/11 4:45 PM, Chris Lattner wrote:
<blockquote
cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"
type="cite"> <br>
<div>
<div>On Dec 16, 2011, at 2:41 PM, John Criswell wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> On 12/16/11 4:14
PM, Chris Lattner wrote:
<blockquote
cite="mid:19D8F13C-C5A8-44D3-861A-2CFF72BC2D94@apple.com"
type="cite">
<div>
<div>On Dec 16, 2011, at 12:39 PM, Kostya
Serebryany wrote:</div>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin-top: 0px; margin-right: 0px;
margin-bottom: 0px; margin-left: 0.8ex;
border-left-width: 1px; border-left-color:
rgb(204, 204, 204); border-left-style:
solid; padding-left: 1ex; position: static;
z-index: auto; ">
<div class="HOEnZb">
<div class="h5">> Do we consider the
above transformation legal?<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
<div><br>
</div>
<div>Yes, the transformation is perfectly legal
for the normal compiler.</div>
</div>
</blockquote>
<br>
So how do you guarantee that the behavior is
predictable regardless of hardware platform if you
don't define what the behavior should be?<br>
</div>
</blockquote>
<div><br>
</div>
<div>I'm not sure what you mean. What isn't defined?</div>
</div>
</blockquote>
<br>
The alloca in question allocates 22 bytes. The 64-bit load in
Kostya's original email is accessing two additional bytes past
the end of the alloca (i.e., it is accessing array "elements"
a[22] and a[23]). Accessing that memory with a read or write
is undefined behavior. The program could fault, read zeros,
read arbitrary bit patterns, etc.<br>
</div>
</blockquote>
<div><br>
</div>
<div>John, I think that you are missing that these operations are
fully defined by LLVM IR. I'm not sure what languages rules you
are drawing these rules from, but they are not the rules of IR.</div>
</blockquote>
<br>
I apologize for mixing C and LLVM notation.<br>
<br>
If you want to distinguish between C and LLVM semantics, then I
think load-widening in this particular case has two problems:<br>
<br>
1) The load-widening transform is not guaranteed to preserve the
semantics of the original C program unless the OS and hardware
fulfill certain assumptions. The original C program performs
in-bounds memory accesses at a[16] and a[21]. The transformed
equivalent does the same thing *except* that it also reads two bytes
outside the bounds of the array a. The transformed program is only
equivalent *if* the OS and hardware fulfill certain assumptions
(which, fortunately, they do on common systems currently in use).<br>
<br>
2) The load-widening transform introduces behavior that, as far as I
know, is undefined at the LLVM IR level. It takes an LLVM program
that loads two values from the alloca'ed memory and changes it to be
a single load that accesses the same memory locations plus two
out-of-bounds locations. Again, if the OS and hardware fulfill
certain assumptions, then the fault behavior and computation
behavior of the LLVM IR before and after the optimization is the
same, but if those assumptions don't hold, then the transformed
program may act differently than the original.<br>
<br>
At the LLVM IR level, performing a load or store in which part of
the accessed memory is within bounds and part of it is out of bounds
is undefined, correct? At the very least, I would think that there
would not be a defined behavior for such a load or store.<br>
<br>
Am I making sense now? Is there something I'm misunderstanding
here?<br>
<br>
<blockquote
cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"
type="cite">
<div><br>
</div>
<div>Doing this inside a compiler (the way we do) also is not
invalid according to the C notions of undefined behavior, as it
has the "as if" rule. I agree that doing this at the source
level would be invalid.</div>
<div><br>
</div>
<div>Again, I'm not opposed to having a way to disable these
transformations, we just need a clean way to express it.</div>
</blockquote>
<br>
Having a list of which optimizations are safe to run and which ones
are not can become tedious. I'd rather fix the optimization so that
it always preserves the semantics of the program unless there's a
very compelling reason not to do so (e.g., significant performance
loss).<br>
<br>
In this case, it seems easy: instead of requiring that "align 16" is
sufficient for doing load widening, require that the memory must be
marked "align 16" and the allocation size is a multiple of 8. That
will force load-widening to only occur when it is safe. There's
probably other ways to fix it as well (e.g., have load-widening
widen to the largest size that meets the above two conditions).<br>
<br>
-- John T.<br>
<br>
<blockquote
cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"
type="cite">
<div><br>
</div>
<div>-Chris</div>
<br>
<blockquote type="cite">
<div> In other words, the compiler is transforming this:<br>
<br>
return a[16] + a[21];<br>
<br>
into something like this:<br>
<br>
unsigned long * p = &(a[16]);<br>
unsigned long v = *p; // This accesses memory locations a[22]
and a[23]; doing so is undefined behavior<br>
(do some bit fiddling to extra a[16] and a[17] from v)<br>
<br>
The original code is memory safe and exhibits defined
behavior. You can do whatever crazy, semantics-preserving
optimization you want, run it on any crazy architecture you
want, and it'll always exhibit the same behavior.<br>
<br>
The optimized code exhibits undefined behavior. On most
systems, it just reads garbage data that is ignored by the
compiler, but that's really just a side-effect of how most
OS's and architectures do things. If you do some crazy
transforms or run on some obscure architecture, the optimized
code may break.<br>
<br>
<blockquote
cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"
type="cite">
<div><br>
[snip]<br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">What if you have a
funky architecture that someone is porting LLVM to, or
someone is using x86-32 segments in an interesting
way?<br>
</div>
</blockquote>
<div><br>
</div>
<div>We'll burn that bridge when we get to it ;-)</div>
</div>
</blockquote>
<br>
ASAN got burnt; SAFECode probably got burnt, too. If we work
around it, some poor researcher or developer may get burnt by
it, too, and spend some time figuring out why his
correct-looking program is not acting properly. In other
words, you're burning someone else's bridge.<br>
<br>
Granted, perhaps the benefits of an incorrect optimization may
outweigh the loss of using LLVM on novel systems, but are you
sure that making the optimization work properly is going to be
so detrimental?<br>
<br>
<blockquote
cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"
type="cite">
<div><br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">Moreover, I don't
really understand the rationale for allowing a
transform to introduce undefined behavior into
programs that exhibit no undefined behavior.</div>
</blockquote>
<div><br>
</div>
<div>There is no undefined behavior here. This is exactly
analogous to the code you get for bitfield accesses. If
you have an uninitialized struct and start storing into
its fields (to initialize it) you get a series of "load
+ mask + or + store" operations. These are loading and
touching "undefined" bits in a completely defined way.</div>
</div>
</blockquote>
<br>
I'll agree that they're both undefined behavior, but I don't
think they fall within the same category. The bit-mask
initializing issue is a compromise you made because there
either isn't an alternative way to do it that has defined
behavior, or such an alternative is too expensive and
difficult to implement and/or use.<br>
<br>
This appears to be a different case. Fixing the optimization
looks simple enough to me (am I missing something?), and I'm
not convinced that fixing it would hurt performance (although
since I haven't run an experiment, that is conjecture).<br>
<br>
So, perhaps I should ask this: if someone took the time to fix
the transform so that it checks both the alignment *and* the
allocation size and measures the resulting performance change,
how much would performance need to suffer before the cure was
deemed worse than the disease?<br>
<br>
-- John T.<br>
<br>
<br>
<br>
<br>
<blockquote
cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"
type="cite">
<div>
<div><br>
</div>
<div>-Chris</div>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</blockquote>
<br>
</body>
</html>