<div dir="ltr">We are careful not to try this optimization where it would extend the range of loaded memory; this is purely for what I call a "load doughnut". :)<br>Reading past either specified edge would be very bad because it could cause a memory fault / exception where there was none in the original program. That's definitely not legal.<br><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 12:20 PM, Craig, Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    I'm having a hard time finding any problems here, at least as long
    as the value is in the middle.  I wouldn't expect the contents of
    x[2] to affect the timing or power usage of anything.  I guess there
    would be a minor "bad" side effect in that a memory read watchpoint
    would trigger with the 128-bit load that wouldn't trigger with the
    32-bit loads.  I think it is semantically very similar to this
    situation as well...<br>
    <blockquote>v4i32 first_call(int *x) { //use all of the array<br>
         int f0 = x[0];<br>
         int f1 = x[1];<br>
         int f2 = x[2];<br>
         int f3 = x[3];<br>
         return (v4i32) { f0, f1, f2, f3 };<br>
      }<br>
      v4i32 second_call(int *x) { //use some of the array<br>
         int s0 = x[0];<br>
         int s1 = x[1];<br>
         int s2 = 0;<br>
         int s3 = x[3];<br>
         return (v4i32) { s0, s1, s2, s3 };<br>
      }<br>
      first_call(x);<br>
      second_call(x);<br>
    </blockquote>
    The implementation isn't going to zero out the stack in between
    those calls, so for a short period of time, the memory location of
    s2 will contain x[2].<br>
    <br>
    I'm less sure if the gaps are on the edges.  I'm worried that you
    might end up crossing some important address boundary if you look
    at something earlier or later than what the user requested.<div><div class="h5"><br>
    <br>
    <div>On 3/16/2016 11:38 AM, Sanjay Patel
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>Hi Ben -<br>
          </div>
          <br>
          Thanks for your response. For the sake of argument, let's
          narrow the scope of the problem to eliminate some of the
          variables you have rightfully cited. <br>
          <br>
          Let's assume we're not dealing with volatiles, atomics, or FP
          operands. We'll even guarantee that the extra loaded value is
          never used. This is, in fact, the scenario that <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a>
          is concerned with.<br>
          <br>
        </div>
        Related C example:<br>
        <br>
        typedef int v4i32 __attribute__((__vector_size__(16)));<br>
        <br>
        // Load some almost-consecutive ints as a vector.<br>
        v4i32 foo(int *x) {<br>
           int x0 = x[0];<br>
           int x1 = x[1];<br>
        // int x2 = x[2];   // U can't touch this? <br>
           int x3 = x[3];<br>
           return (v4i32) { x0, x1, 0, x3 };<br>
        }<br>
        <br>
        <div>
          <div>For x86, we notice that we have nearly a v4i32 vector's
            worth of loads, so we just turn that into a vector load and
            mask out the element that's getting set to zero:<br>
                movups    (%rdi), %xmm0            ; load 128 bits instead of three 32-bit elements<br>
                andps    LCPI0_0(%rip), %xmm0 ; put zero bits into the 3rd element of the vector<br>
            <br>
          </div>
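A minimal C sketch of the transform described above, for readers who want to check the equivalence themselves. It is a scalar model only, not what the x86 backend emits: the names vec4, foo_scalar, foo_wide and check_same are hypothetical, memcpy stands in for the movups, and the keep table stands in for the LCPI0_0 constant. It assumes all 16 bytes at x are readable, which is the case under discussion because x[0] and x[3] are both read by the original program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the v4i32 vector type used in the thread. */
typedef struct { int32_t lane[4]; } vec4;

/* Scalar form: reads x[0], x[1], and x[3]; lane 2 is a constant zero. */
static vec4 foo_scalar(const int32_t *x) {
    vec4 r = { { x[0], x[1], 0, x[3] } };
    return r;
}

/* Wide form: one 16-byte read (the movups), then clear lane 2 (the andps).
   Note that this also reads x[2], which the scalar form never touches. */
static vec4 foo_wide(const int32_t *x) {
    static const uint32_t keep[4] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu };
    uint32_t bits[4];
    vec4 r;
    memcpy(bits, x, sizeof bits);
    for (int i = 0; i < 4; ++i)
        bits[i] &= keep[i];
    memcpy(r.lane, bits, sizeof bits);
    return r;
}

/* Asserts that both forms produce the same four lanes for the given input. */
static void check_same(const int32_t *x) {
    vec4 a = foo_scalar(x);
    vec4 b = foo_wide(x);
    assert(memcmp(a.lane, b.lane, sizeof a.lane) == 0);
}
```

The returned values are identical whatever x[2] holds; the only observable difference is that the wide form reads x[2] at all, which is what the security question is about.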
          <div>Should that optimization be disabled by a hypothetical
            -fextra-secure flag?<br>
          </div>
          <div><br>
            <br>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Mar 16, 2016 at 7:59 AM, Craig,
          Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Regarding accessing
              extra data, there are at least some limits as to what can
              be accessed.  You can't generate extra loads or stores to
              volatiles.  You can't generate extra stores to atomics,
              even if the extra stores appear to be the same value as
              the old value.<br>
              <br>
              As for determining where the perf vs. security line should
              be drawn, I would argue that most compilers have gone too
              far on the perf side while optimizing undefined behavior. 
              Examples include dead store elimination leaving passwords in
              memory, integer overflow checks getting optimized out, and NULL
              checks being optimized away.  Linus Torvalds was complaining
              about those just recently on this list, and while I don't
              share his tone, I agree with him regarding the harm these
              optimizations can cause.<br>
              <br>
              If I'm understanding correctly, for your specific cases,
              you are wondering if it is fine to load and operate on a
              floating point value that the user did not specifically
              request you to operate on.  This could cause (at least)
              two different problems.  First, it could cause a floating
              point exception.  I think the danger of the floating point
              exception should rule out loading values the user didn't
              request.  Second, loading values the user didn't specify
              could enable a timing attack.  The timing attack is scary,
              but I don't think it is something we can really fix in the
              general case.  As long as individual assembly instructions
              have impractical-to-predict execution times, we will be at
              the mercy of the current hardware state.  There are timing
              attacks that can determine TLS keys in a different VM
              instance based off of how quickly loads in the current
              process execute.  If our worst timing attack problems are
              floating point denormalization issues, then I think we are
              in a pretty good state.
              <div>
                <div><br>
                  <br>
                  <div>On 3/15/2016 10:46 AM, Sanjay Patel via llvm-dev
                    wrote:<br>
                  </div>
                </div>
              </div>
              <blockquote type="cite">
                <div>
                  <div>
                    <div dir="ltr">
                      <div>
                        <div>
                          <div>[cc'ing cfe-dev because this may require
                            some interpretation of language law]<br>
                            <br>
                            My understanding is that the compiler has
                            the freedom to access extra data in C/C++
                            (not sure about other languages); AFAIK, the
                            LLVM LangRef is silent about this. In C/C++,
                            this is based on the "as-if rule":<br>
                            <a href="http://en.cppreference.com/w/cpp/language/as_if" target="_blank">http://en.cppreference.com/w/cpp/language/as_if</a><br>
                          </div>
                        </div>
                        <br>
                      </div>
                      So the question is: where should the optimizer
                      draw the line with respect to perf vs. security if
                      it involves operating on unknown data? Are there
                      guidelines that we can use to decide this?<br>
                      <br>
                      <div>
                        <div>The masked load transform referenced below
                          is not unique in accessing / operating on
                          unknown data. In addition to the related
                          scalar loads -> vector load transform that
                          I've mentioned earlier in this thread, see for
                          example:<br>
                          <a href="https://llvm.org/bugs/show_bug.cgi?id=20358" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br>
                          <div>(and the security paper and patch review
                            linked there)<br>
                          </div>
                          <br>
                        </div>
                      </div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Mon, Mar 14, 2016 at
                          10:26 PM, Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div link="blue" vlink="purple" lang="EN-US">
                              <div>
                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi
                                    Sanjay,</span></p>
                                <span>
                                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The
                                      real question I have is whether it
                                      is legal to read the extra memory,
                                      regardless of whether this is a
                                      masked load or </span></p>
                                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>something
                                      else.</span></p>
                                </span>
                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No,
                                    it is not legal AFAIK, because doing
                                    so exposes memory contents that the
                                    programmer</span></p>
                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">did
                                    not intend to expose. This may be
                                    open to exploitation.</span></p>
                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid</span></p>
                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                <div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">
                                  <div>
                                    <div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">
                                      <p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">
                                          llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
                                          <b>On Behalf Of </b>Sanjay
                                          Patel via llvm-dev<br>
                                          <b>Sent:</b> Monday, March 14,
                                          2016 10:37 PM<br>
                                          <b>To:</b> Nema, Ashutosh<br>
                                          <b>Cc:</b> llvm-dev<br>
                                          <b>Subject:</b> Re: [llvm-dev]
                                          masked-load endpoints
                                          optimization</span></p>
                                    </div>
                                  </div>
                                  <div>
                                    <div>
                                      <p class="MsoNormal"> </p>
                                      <div>
                                        <p class="MsoNormal">I checked
                                          in a patch to do this
                                          transform for x86-only for
                                          now:<br>
                                          <a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a>
                                          / <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a></p>
                                      </div>
                                      <div>
                                        <p class="MsoNormal"> </p>
                                        <div>
                                          <p class="MsoNormal">On Fri,
                                            Mar 11, 2016 at 9:57 AM,
                                            Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>>
                                            wrote:</p>
                                          <div>
                                            <div>
                                              <div>
                                                <div>
                                                  <p class="MsoNormal" style="margin-bottom:12pt">Thanks,
                                                    Ashutosh.</p>
                                                </div>
                                                <p class="MsoNormal" style="margin-bottom:12pt">Yes,
                                                  either TTI or TLI
                                                  could be used to limit
                                                  the transform if we do
                                                  it in CGP rather than
                                                  the DAG.</p>
                                              </div>
                                              <p class="MsoNormal" style="margin-bottom:12pt">The
                                                real question I have is
                                                whether it is legal to
                                                read the extra memory,
                                                regardless of whether
                                                this is a masked load or
                                                something else.</p>
                                            </div>
                                            <p class="MsoNormal">Note
                                              that the x86 backend
                                              already does this, so
                                              either my proposal is ok
                                              for x86, or we're already
                                              doing an illegal
                                              optimization:</p>
                                            <div>
                                              <p class="MsoNormal" style="margin-bottom:12pt"><br>
                                                define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {<br>
                                                  %ld1 = load i32, i32* %addr1<br>
                                                  %addr2 = getelementptr i32, i32* %addr1, i64 3<br>
                                                  %ld2 = load i32, i32* %addr2<br>
                                                  %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0<br>
                                                  %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3<br>
                                                  ret <4 x i32> %vec2<br>
                                                }<br>
                                                <br>
                                                $ ./llc -o - loadcombine.ll<br>
                                                ...<br>
                                                    movups    (%rdi), %xmm0<br>
                                                    retq<br>
                                                <br>
                                                <br>
                                              </p>
                                            </div>
                                          </div>
                                          <div>
                                            <div>
                                              <div>
                                                <p class="MsoNormal"> </p>
                                                <div>
                                                  <p class="MsoNormal">On
                                                    Thu, Mar 10, 2016 at
                                                    10:22 PM, Nema,
                                                    Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>>
                                                    wrote:</p>
                                                  <div>
                                                    <div>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This
                                                          looks
                                                          interesting.
                                                          The main
                                                          motivation
                                                          appears to be
                                                          replacing a
                                                          masked vector
                                                          load with a
                                                          general vector
                                                          load followed
                                                          by a select.</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">I
                                                          have observed
                                                          that masked
                                                          vector loads
                                                          are generally
                                                          expensive
                                                          compared with a
                                                          plain vector
                                                          load.</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But
                                                          if the first and
                                                          last elements
                                                          of a masked
                                                          vector load
                                                          are guaranteed
                                                          to be accessed,
                                                          then it can be
                                                          transformed into
                                                          a vector load.</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In
                                                          opt this can
                                                          be driven by
                                                          TTI, where the
                                                          benefit of
                                                          this
                                                          transformation
                                                          should be
                                                          checked.</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span></p>
                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                                      <p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">
                                                          llvm-dev
                                                          [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
                                                          <b>On Behalf
                                                          Of </b>Sanjay
                                                          Patel via
                                                          llvm-dev<br>
                                                          <b>Sent:</b>
                                                          Friday, March
                                                          11, 2016 3:37
                                                          AM<br>
                                                          <b>To:</b>
                                                          llvm-dev<br>
                                                          <b>Subject:</b>
                                                          [llvm-dev]
                                                          masked-load
                                                          endpoints
                                                          optimization</span></p>
                                                      <div>
                                                        <div>
                                                          <p class="MsoNormal"> </p>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <p class="MsoNormal" style="margin-bottom:12pt">If we're loading the first and last elements
                                                          of a vector
                                                          using a masked
                                                          load [1], can
                                                          we replace the
                                                          masked load
                                                          with a full
                                                          vector load?<br>
                                                          <br>
                                                          "The result of
                                                          this operation
                                                          is equivalent
                                                          to a regular
                                                          vector load
                                                          instruction
                                                          followed by a
                                                          ‘select’
                                                          between the
                                                          loaded and the
                                                          passthru
                                                          values,
                                                          predicated on
                                                          the same mask.
                                                          However, using
                                                          this intrinsic
                                                          prevents
                                                          exceptions on
                                                          memory access
                                                          to masked-off
                                                          lanes."<br>
                                                          <br>
                                                          I think the
                                                          fact that
                                                          we're loading
                                                          the endpoints
                                                          of the vector
                                                          guarantees
                                                          that a full
                                                          vector load
                                                          can't have any
                                                          different
                                                          faulting/exception
                                                          behavior on
                                                          x86 and most
                                                          (?) other
                                                          targets. We
                                                          would,
                                                          however, be
                                                          reading memory
                                                          that the
                                                          program has
                                                          not explicitly
                                                          requested.</p>
                                                          </div>
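The endpoints argument above can be modelled in plain C. This is only a sketch under stated assumptions, not the LLVM implementation: masked_load, endpoints_active and full_load_select are hypothetical names, the vector is fixed at four 32-bit lanes, and memory is assumed to be flat, so that every byte between two readable elements is also readable (the assumption made above for x86 and "most (?) other targets").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };

/* Reference semantics: memory is touched only for lanes whose mask bit is set. */
static void masked_load(const int32_t *addr, const bool mask[LANES],
                        const int32_t passthru[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* The endpoints condition: the first and last lanes are both active, so a
   full load reads no address below lane 0 or above the last lane. */
static bool endpoints_active(const bool mask[LANES]) {
    return mask[0] && mask[LANES - 1];
}

/* Full vector load followed by a select on the same mask.
   Only meant to be used when endpoints_active(mask) holds. */
static void full_load_select(const int32_t *addr, const bool mask[LANES],
                             const int32_t passthru[LANES], int32_t out[LANES]) {
    int32_t wide[LANES];
    assert(endpoints_active(mask));
    memcpy(wide, addr, sizeof wide);   /* also reads the masked-off middle lanes */
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? wide[i] : passthru[i];
}
```

With the mask from the IR example (lanes 0 and 3 active), both functions produce the same result; the difference is that full_load_select also reads lanes 1 and 2 from memory before discarding them.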
                                                          <p class="MsoNormal">IR
                                                          example:<br>
                                                          <br>
                                                          define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {</p>
                                                          </div>
                                                          <p class="MsoNormal"> 
                                                          ; load the first and last elements pointed to by %addr and shuffle those into %v</p>
                                                          <div>
                                                          <p class="MsoNormal" style="margin-bottom:12pt">  %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)<br>
                                                            ret <4 x i32> %res<br>
                                                          }</p>
                                                          </div>
                                                          <div>
                                                          <p class="MsoNormal">would become something like:</p>
                                                          </div>
                                                          <div>
                                                          <p class="MsoNormal"><br>
                                                          define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {</p>
                                                          </div>
                                                          <div>
                                                          <p class="MsoNormal"> 
                                                          %vecload =
                                                          load <4 x
                                                          i32>, <4
                                                          x i32>*
                                                          %addr, align 4</p>
                                                          </div>
                                                          <div>
                                                          <p class="MsoNormal">  %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v</p>
                                                          </div>
                                                          <div>
                                                          <p class="MsoNormal" style="margin-bottom:12pt">  ret <4 x i32> %sel<br>
                                                          }</p>
                                                          </div>
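                                                          <div>
                                                          <p class="MsoNormal">To make the rewrite above concrete, here is a minimal C sketch that models both forms and checks that they agree for the endpoints mask. It is not from the thread: the function names (masked_load, load_then_select, endpoints_loaded, check_endpoints_example) and the sample values are illustrative assumptions, and the scalar loops only model the lane semantics, not the actual code generation.</p>
<pre>
```c
#include <assert.h>
#include <string.h>

#define LANES 4

/* Model of llvm.masked.load: only lanes whose mask bit is set read memory;
   the other lanes take the pass-through value. */
static void masked_load(int out[LANES], const int *addr,
                        const int mask[LANES], const int passthru[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* Model of the proposed rewrite: one full-width load, then a per-lane select.
   Unlike masked_load, this reads every lane, including the masked-off ones. */
static void load_then_select(int out[LANES], const int *addr,
                             const int mask[LANES], const int passthru[LANES]) {
    int vecload[LANES];
    memcpy(vecload, addr, sizeof vecload);
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? vecload[i] : passthru[i];
}

/* Hypothetical legality check (illustrative name): the first and last lanes
   are both loaded, so the full load does not extend the range of memory
   that the masked load already touches. */
static int endpoints_loaded(const int mask[LANES]) {
    return mask[0] != 0 && mask[LANES - 1] != 0;
}

/* Checks that the two models agree for the mask used in the IR example. */
static void check_endpoints_example(void) {
    const int mem[LANES]  = {10, 20, 30, 40};
    const int v[LANES]    = {-1, -2, -3, -4};
    const int mask[LANES] = {1, 0, 0, 1};
    int a[LANES], b[LANES];

    assert(endpoints_loaded(mask));
    masked_load(a, mem, mask, v);
    load_then_select(b, mem, mask, v);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(a[0] == 10 && a[1] == -2 && a[2] == -3 && a[3] == 40);
}
```
</pre>
                                                          <p class="MsoNormal">Calling check_endpoints_example() from a test driver exercises the equivalence. A mask such as {0, 1, 1, 0} fails endpoints_loaded, which corresponds to the case where a full-width load would read past what the program requested at the edges.</p>
                                                          </div>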
                                                          <div>
                                                          <p class="MsoNormal">If this isn't valid as an IR optimization, would it be acceptable as a DAG combine with a target hook to opt in?</p>
                                                          </div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <p class="MsoNormal"><br>
                                                          [1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a></p>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </div>
                    <br>
                    <br>
                  </div>
                </div>
                <pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><span><font color="#888888">
</font></span></pre>
                <span><font color="#888888"> </font></span></blockquote>
              <span><font color="#888888"> <br>
                  <pre cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
</pre>
                </font></span></div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>
</blockquote></div><br></div></div></div></div>