<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 12/16/11 7:56 PM, Chris Lattner wrote:

    <blockquote

      cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      <div><span class="Apple-style-span"

          style="-webkit-tap-highlight-color: rgba(26, 26, 26,

          0.296875); -webkit-composition-fill-color: rgba(175, 192, 227,

          0.230469); -webkit-composition-frame-color: rgba(77, 128, 180,

          0.230469); ">On Dec 16, 2011, at 3:29 PM, John Criswell <<a

            moz-do-not-send="true" href="mailto:criswell@illinois.edu">criswell@illinois.edu</a>>

          wrote:</span></div>

      <blockquote type="cite">

        <div> On 12/16/11 4:45 PM, Chris Lattner wrote:

          <blockquote

            cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"

            type="cite"> <br>

            <div>

              <div>On Dec 16, 2011, at 2:41 PM, John Criswell wrote:</div>

              <br class="Apple-interchange-newline">

              <blockquote type="cite">

                <div bgcolor="#FFFFFF" text="#000000"> On 12/16/11 4:14

                  PM, Chris Lattner wrote:

                  <blockquote

                    cite="mid:19D8F13C-C5A8-44D3-861A-2CFF72BC2D94@apple.com"

                    type="cite">

                    <div>

                      <div>On Dec 16, 2011, at 12:39 PM, Kostya

                        Serebryany wrote:</div>

                      <blockquote type="cite">

                        <div class="gmail_quote">

                          <blockquote class="gmail_quote"

                            style="margin-top: 0px; margin-right: 0px;

                            margin-bottom: 0px; margin-left: 0.8ex;

                            border-left-width: 1px; border-left-color:

                            rgb(204, 204, 204); border-left-style:

                            solid; padding-left: 1ex; position: static;

                            z-index: auto; ">

                            <div class="HOEnZb">

                              <div class="h5">> Do we consider the

                                above transformation legal?<br>

                              </div>

                            </div>

                          </blockquote>

                        </div>

                      </blockquote>

                      <div><br>

                      </div>

                      <div>Yes, the transformation is perfectly legal

                        for the normal compiler.</div>

                    </div>

                  </blockquote>

                  <br>

                  So how do you guarantee that the behavior is

                  predictable regardless of hardware platform if you

                  don't define what the behavior should be?<br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div>I'm not sure what you mean.  What isn't defined?</div>

            </div>

          </blockquote>

          <br>

          The alloca in question allocates 22 bytes.  The 64-bit load in

          Kostya's original email is accessing two additional bytes past

          the end of the alloca (i.e., it is accessing array "elements"

          a[22] and a[23]).  Accessing that memory with a read or write

          is undefined behavior.  The program could fault, read zeros,

          read arbitrary bit patterns, etc.<br>

        </div>

      </blockquote>

      <div><br>

      </div>

      <div>John, I think that you are missing that these operations are

        fully defined by LLVM IR.  I'm not sure what languages rules you

        are drawing these rules from, but they are not the rules of IR.</div>

    </blockquote>

    <br>

    I apologize for mixing C and LLVM notation.<br>

    <br>

    If you want to distinguish between C and LLVM semantics, then I

    think load-widening in this particular case has two problems:<br>

    <br>

    1) The load-widening transform is not guaranteed to preserve the

    semantics of the original C program unless the OS and hardware

    fulfill certain assumptions.  The original C program performs

    in-bounds memory accesses at a[16] and a[21].  The transformed

    equivalent does the same thing *except* that it also reads two bytes

    outside the bounds of the array a.  The transformed program is only

    equivalent *if* the OS and hardware fulfill certain assumptions

    (which, fortunately, they do on common systems currently in use).<br>

    <br>

    2) The load-widening transform introduces behavior that, as far as I

    know, is undefined at the LLVM IR level.  It takes an LLVM program

    that loads two values from the alloca'ed memory and changes it to be

    a single load that accesses the same memory locations plus two

    out-of-bounds locations.  Again, if the OS and hardware fulfill

    certain assumptions, then the fault behavior and computation

    behavior of the LLVM IR before and after the optimization is the

    same, but if those assumptions don't hold, then the transformed

    program may act differently than the original.<br>

    <br>

    At the LLVM IR level, performing a load or store in which part of

    the accessed memory is within bounds and part of it is out of bounds

    is undefined, correct?  At the very least, I would think that there

    would not be a defined behavior for such a load or store.<br>

    <br>

    Am I making sense now?  Is there something I'm misunderstanding

    here?<br>

    <br>

    <blockquote

      cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"

      type="cite">

      <div><br>

      </div>

      <div>Doing this inside a compiler (the way we do) also is not

        invalid according to the C notions of undefined behavior, as it

        has the "as if" rule.  I agree that doing this at the source

        level would be invalid.</div>

      <div><br>

      </div>

      <div>Again, I'm not opposed to having a way to disable these

        transformations, we just need a clean way to express it.</div>

    </blockquote>

    <br>

    Having a list of which optimizations are safe to run and which ones

    are not can become tedious.  I'd rather fix the optimization so that

    it always preserves the semantics of the program unless there's a

    very compelling reason not to do so (e.g., significant performance

    loss).<br>

    <br>

    In this case, it seems easy: instead of requiring that "align 16" is

    sufficient for doing load widening, require that the memory must be

    marked "align 16" and the allocation size is a multiple of 8.  That

    will force load-widening to only occur when it is safe.  There's

    probably other ways to fix it as well (e.g., have load-widening

    widen to the largest size that meets the above two conditions).<br>

    <br>

    -- John T.<br>

    <br>

    <blockquote

      cite="mid:FC8F9439-9A9B-43DB-AD11-67D15A8DD164@apple.com"

      type="cite">

      <div><br>

      </div>

      <div>-Chris</div>

      <br>

      <blockquote type="cite">

        <div> In other words, the compiler is transforming this:<br>

          <br>

          return a[16] + a[21];<br>

          <br>

          into something like this:<br>

          <br>

          unsigned long * p = &(a[16]);<br>

          unsigned long v = *p; // This accesses memory locations a[22]

          and a[23]; doing so is undefined behavior<br>

          (do some bit fiddling to extra a[16] and a[17] from v)<br>

          <br>

          The original code is memory safe and exhibits defined

          behavior.  You can do whatever crazy, semantics-preserving

          optimization you want, run it on any crazy architecture you

          want, and it'll always exhibit the same behavior.<br>

          <br>

          The optimized code exhibits undefined behavior. On most

          systems, it just reads garbage data that is ignored by the

          compiler, but that's really just a side-effect of how most

          OS's and architectures do things.  If you do some crazy

          transforms or run on some obscure architecture, the optimized

          code may break.<br>

          <br>

          <blockquote

            cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"

            type="cite">

            <div><br>

              [snip]<br>

              <blockquote type="cite">

                <div bgcolor="#FFFFFF" text="#000000">What if you have a

                  funky architecture that someone is porting LLVM to, or

                  someone is using x86-32 segments in an interesting

                  way?<br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div>We'll burn that bridge when we get to it ;-)</div>

            </div>

          </blockquote>

          <br>

          ASAN got burnt; SAFECode probably got burnt, too.  If we work

          around it, some poor researcher or developer may get burnt by

          it, too, and spend some time figuring out why his

          correct-looking program is not acting properly.  In other

          words, you're burning someone else's bridge.<br>

          <br>

          Granted, perhaps the benefits of an incorrect optimization may

          outweigh the loss of using LLVM on novel systems, but are you

          sure that making the optimization work properly is going to be

          so detrimental?<br>

          <br>

          <blockquote

            cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"

            type="cite">

            <div><br>

              <blockquote type="cite">

                <div bgcolor="#FFFFFF" text="#000000">Moreover, I don't

                  really understand the rationale for allowing a

                  transform to introduce undefined behavior into

                  programs that exhibit no undefined behavior.</div>

              </blockquote>

              <div><br>

              </div>

              <div>There is no undefined behavior here.  This is exactly

                analogous to the code you get for bitfield accesses.  If

                you have an uninitialized struct and start storing into

                its fields (to initialize it)  you get a series of "load

                + mask + or + store" operations.  These are loading and

                touching "undefined" bits in a completely defined way.</div>

            </div>

          </blockquote>

          <br>

          I'll agree that they're both undefined behavior, but I don't

          think they fall within the same category.  The bit-mask

          initializing issue is a compromise you made because there

          either isn't an alternative way to do it that has defined

          behavior, or such an alternative is too expensive and

          difficult to implement and/or use.<br>

          <br>

          This appears to be a different case.  Fixing the optimization

          looks simple enough to me (am I missing something?), and I'm

          not convinced that fixing it would hurt performance (although

          since I haven't run an experiment, that is conjecture).<br>

          <br>

          So, perhaps I should ask this: if someone took the time to fix

          the transform so that it checks both the alignment *and* the

          allocation size and measures the resulting performance change,

          how much would performance need to suffer before the cure was

          deemed worse than the disease?<br>

          <br>

          -- John T.<br>

          <br>

          <br>

          <br>

            <br>

          <blockquote

            cite="mid:ECA41BF4-D1D5-446A-BD6D-87ED2CEC5D73@apple.com"

            type="cite">

            <div>

              <div><br>

              </div>

              <div>-Chris</div>

            </div>

            <br>

          </blockquote>

          <br>

        </div>

      </blockquote>

    </blockquote>

    <br>

  </body>

</html>