<div dir="ltr">We are careful not to try this optimization where it would extend the range of loaded memory; this is purely for what I call a "load doughnut". :)<br>Reading past either specified edge would be very bad because it could cause a memory fault / exception where there was none in the original program. That's definitely not legal.<br><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 12:20 PM, Craig, Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    I'm having a hard time finding any problems here, at least as long

    as the value is in the middle.  I wouldn't expect the contents of

    x[2] to affect the timing or power usage of anything.  I guess there

    would be a minor "bad" side effect in that a memory read watchpoint

    would trigger with the 128-bit load that wouldn't be there with the

    32-bit loads.  I think it is semantically very similar to this

    situation as well...<br>

    <blockquote>v4i32 first_call(int *x) { //use all of the array<br>

         int f0 = x[0];<br>

         int f1 = x[1];<br>

         int f2 = x[2];<br>

         int f3 = x[3];<br>

         return (v4i32) { f0, f1, f2, f3 };<br>

      }<br>

      v4i32 second_call(int *x) { //use some of the array<br>

         int s0 = x[0];<br>

         int s1 = x[1];<br>

         int s2 = 0;<br>

         int s3 = x[3];<br>

         return (v4i32) { s0, s1, s2, s3 };<br>

      }<br>

      first_call(x);<br>

      second_call(x);<br>

    </blockquote>

    The implementation isn't going to zero out the stack in between

    those calls, so for a short period of time, the memory location of

    s2 will contain x[2].<br>

    <br>

    I'm less sure if the gaps are on the edges.  I'm worried that you

    might end up crossing some important address boundary if you look

    at something earlier or later than what the user requested.<div><div class="h5"><br>
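    To make the edge concern concrete, here is a rough C sketch of a check
    for whether a widened load would touch a page that the requested
    accesses do not. It assumes 4096-byte pages and page-granular
    protection, and the names ASSUMED_PAGE_SIZE, page_of, and
    widening_touches_new_page are made up for this illustration; none of
    this comes from the patch under discussion.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only: 4096-byte pages. */
enum { ASSUMED_PAGE_SIZE = 4096 };

/* Index of the page that contains address a. */
static uintptr_t page_of(uintptr_t a) {
    return a / ASSUMED_PAGE_SIZE;
}

/* Ranges are half-open and non-empty: [req_lo, req_hi) spans the first
   through last byte the program asked for, and [wide_lo, wide_hi) is the
   widened load, which is assumed to contain the requested range.
   Returns true if the widened load reaches a page below or above the
   pages holding the requested endpoints. */
static bool widening_touches_new_page(uintptr_t req_lo, uintptr_t req_hi,
                                      uintptr_t wide_lo, uintptr_t wide_hi) {
    return page_of(wide_lo) < page_of(req_lo) ||
           page_of(wide_hi - 1) > page_of(req_hi - 1);
}
```

    When the first and last elements are both requested, the two ranges
    coincide and the check is false by construction; it can only become
    true when the gap is at an edge.<br>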

    <br>

    <div>On 3/16/2016 11:38 AM, Sanjay Patel

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>

          <div>Hi Ben -<br>

          </div>

          <br>

          Thanks for your response. For the sake of argument, let's

          narrow the scope of the problem to eliminate some of the

          variables you have rightfully cited. <br>

          <br>

          Let's assume we're not dealing with volatiles, atomics, or FP

          operands. We'll even guarantee that the extra loaded value is

          never used. This is, in fact, the scenario that <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a>

          is concerned with.<br>

          <br>

        </div>

        Related C example:<br>

        <br>

        typedef int v4i32 __attribute__((__vector_size__(16)));<br>

        <br>

        // Load some almost-consecutive ints as a vector.<br>

        v4i32 foo(int *x) {<br>

           int x0 = x[0];<br>

           int x1 = x[1];<br>

        // int x2 = x[2];   // U can't touch this? <br>

           int x3 = x[3];<br>

           return (v4i32) { x0, x1, 0, x3 };<br>

        }<br>

        <br>

        <div>

          <div>For x86, we notice that we have nearly a v4i32 vector's

            worth of loads, so we just turn that into a vector load and

            mask out the element that's getting set to zero:<br>

                movups    (%rdi), %xmm0            ; load 128-bits

            instead of three 32-bit elements<br>

                andps    LCPI0_0(%rip), %xmm0 ; put zero bits into the

            3rd element of the vector<br>

            <br>

          </div>
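          <div>For illustration, here is a rough C sketch of what that
            load-and-mask pair computes, written with plain arrays instead
            of vector types. The names foo_scalar, foo_widened, and
            lane_mask are made up for this example, and the loop stands in
            for the single andps; it shows only that the results match,
            not how the backend implements the transform.<br>
          </div>

```c
#include <stdint.h>
#include <string.h>

/* Scalar reference: reads x[0], x[1], and x[3] only; lane 2 is zero. */
static void foo_scalar(const int32_t *x, int32_t out[4]) {
    out[0] = x[0];
    out[1] = x[1];
    out[2] = 0;
    out[3] = x[3];
}

/* Widened form: one 16-byte read, then AND with a per-lane mask.
   lane_mask plays the role of the LCPI0_0 constant in the assembly. */
static void foo_widened(const int32_t *x, int32_t out[4]) {
    static const uint32_t lane_mask[4] = {
        0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu
    };
    uint32_t v[4];
    memcpy(v, x, sizeof v);        /* this read includes x[2] */
    for (int i = 0; i < 4; ++i)
        v[i] &= lane_mask[i];      /* lane 2 is cleared, others kept */
    memcpy(out, v, sizeof v);
}
```

          <div>Both functions produce the same four lanes for any input;
            the only observable difference is that the widened form reads
            the bytes of x[2].<br>
          </div>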

          <div>Should that optimization be disabled by a hypothetical

            -fextra-secure flag?<br>

          </div>

          <div><br>

            <br>

          </div>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, Mar 16, 2016 at 7:59 AM, Craig,

          Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000"> Regarding accessing

              extra data, there are at least some limits as to what can

              be accessed.  You can't generate extra loads or stores to

              volatiles.  You can't generate extra stores to atomics,

              even if the extra stores appear to be the same value as

              the old value.<br>

              <br>

              As for determining where the perf vs. security line should

              be drawn, I would argue that most compilers have gone too

              far on the perf side while optimizing undefined behavior. 

              Examples include dead store elimination leaving passwords in memory,

              integer overflow checks getting optimized out, and NULL

              checks being optimized away.  Linus Torvalds was complaining

              about those just recently on this list, and while I don't

              share his tone, I agree with him regarding the harm these

              optimizations can cause.<br>

              <br>

              If I'm understanding correctly, for your specific cases,

              you are wondering if it is fine to load and operate on a

              floating point value that the user did not specifically

              request you to operate on.  This could cause (at least)

              two different problems.  First, it could cause a floating

              point exception.  I think the danger of the floating point

              exception should rule out loading values the user didn't

              request.  Second, loading values the user didn't specify

              could enable a timing attack.  The timing attack is scary,

              but I don't think it is something we can really fix in the

              general case.  As long as individual assembly instructions

              have impractical-to-predict execution times, we will be at

              the mercy of the current hardware state.  There are timing

              attacks that can determine TLS keys in a different VM

              instance based off of how quickly loads in the current

              process execute.  If our worst timing attack problems are

              floating point denormalization issues, then I think we are

              in a pretty good state.

              <div>

                <div><br>

                  <br>

                  <div>On 3/15/2016 10:46 AM, Sanjay Patel via llvm-dev

                    wrote:<br>

                  </div>

                </div>

              </div>

              <blockquote type="cite">

                <div>

                  <div>

                    <div dir="ltr">

                      <div>

                        <div>

                          <div>[cc'ing cfe-dev because this may require

                            some interpretation of language law]<br>

                            <br>

                            My understanding is that the compiler has

                            the freedom to access extra data in C/C++

                            (not sure about other languages); AFAIK, the

                            LLVM LangRef is silent about this. In C/C++,

                            this is based on the "as-if rule":<br>

                            <a href="http://en.cppreference.com/w/cpp/language/as_if" target="_blank">http://en.cppreference.com/w/cpp/language/as_if</a><br>

                          </div>

                        </div>

                        <br>

                      </div>

                      So the question is: where should the optimizer

                      draw the line with respect to perf vs. security if

                      it involves operating on unknown data? Are there

                      guidelines that we can use to decide this?<br>

                      <br>

                      <div>

                        <div>The masked load transform referenced below

                          is not unique in accessing / operating on

                          unknown data. In addition to the related

                          scalar loads -> vector load transform that

                          I've mentioned earlier in this thread, see for

                          example:<br>

                          <a href="https://llvm.org/bugs/show_bug.cgi?id=20358" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br>

                          <div>(and the security paper and patch review

                            linked there)<br>

                          </div>

                          <br>

                        </div>

                      </div>

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On Mon, Mar 14, 2016 at

                          10:26 PM, Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span>

                          wrote:<br>

                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                            <div link="blue" vlink="purple" lang="EN-US">

                              <div>

                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi

                                    Sanjay,</span></p>

                                <span>

                                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The

                                      real question I have is whether it

                                      is legal to read the extra memory,

                                      regardless of whether this is a

                                      masked load or </span></p>

                                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>something

                                      else.</span></p>

                                </span>

                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No,

                                    it is not legal AFAIK, because by

                                    doing that we are exposing memory

                                    contents that the programmer does

                                    not intend to expose. This may be

                                    vulnerable to exploitation.</span></p>

                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>

                                <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid</span></p>

                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                <div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">

                                  <div>

                                    <div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">

                                      <p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">

                                          llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]

                                          <b>On Behalf Of </b>Sanjay

                                          Patel via llvm-dev<br>

                                          <b>Sent:</b> Monday, March 14,

                                          2016 10:37 PM<br>

                                          <b>To:</b> Nema, Ashutosh<br>

                                          <b>Cc:</b> llvm-dev<br>

                                          <b>Subject:</b> Re: [llvm-dev]

                                          masked-load endpoints

                                          optimization</span></p>

                                    </div>

                                  </div>

                                  <div>

                                    <div>

                                      <p class="MsoNormal"> </p>

                                      <div>

                                        <p class="MsoNormal">I checked

                                          in a patch to do this

                                          transform for x86-only for

                                          now:<br>

                                          <a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a>

                                          / <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a></p>

                                      </div>

                                      <div>

                                        <p class="MsoNormal"> </p>

                                        <div>

                                          <p class="MsoNormal">On Fri,

                                            Mar 11, 2016 at 9:57 AM,

                                            Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>>

                                            wrote:</p>

                                          <div>

                                            <div>

                                              <div>

                                                <div>

                                                  <p class="MsoNormal" style="margin-bottom:12pt">Thanks,

                                                    Ashutosh.</p>

                                                </div>

                                                <p class="MsoNormal" style="margin-bottom:12pt">Yes,

                                                  either TTI or TLI

                                                  could be used to limit

                                                  the transform if we do

                                                  it in CGP rather than

                                                  the DAG.</p>

                                              </div>

                                              <p class="MsoNormal" style="margin-bottom:12pt">The

                                                real question I have is

                                                whether it is legal to

                                                read the extra memory,

                                                regardless of whether

                                                this is a masked load or

                                                something else.</p>

                                            </div>

                                            <p class="MsoNormal">Note

                                              that the x86 backend

                                              already does this, so

                                              either my proposal is ok

                                              for x86, or we're already

                                              doing an illegal

                                              optimization:</p>

                                            <div>

                                              <p class="MsoNormal" style="margin-bottom:12pt"><br>

                                                define <4 x i32>

                                                @load_bonus_bytes(i32*

                                                %addr1, <4 x i32>

                                                %v) {<br>

                                                  %ld1 = load i32, i32*

                                                %addr1<br>

                                                  %addr2 = getelementptr

                                                i32, i32* %addr1, i64 3<br>

                                                  %ld2 = load i32, i32*

                                                %addr2<br>

                                                  %vec1 = insertelement

                                                <4 x i32> undef,

                                                i32 %ld1, i32 0<br>

                                                  %vec2 = insertelement

                                                <4 x i32> %vec1,

                                                i32 %ld2, i32 3<br>

                                                  ret <4 x i32>

                                                %vec2<br>

                                                }<br>

                                                <br>

                                                $ ./llc -o -

                                                loadcombine.ll <br>

                                                ...<br>

                                                    movups    (%rdi),

                                                %xmm0<br>

                                                    retq<br>

                                                <br>

                                                <br>

                                              </p>

                                            </div>

                                          </div>

                                          <div>

                                            <div>

                                              <div>

                                                <p class="MsoNormal"> </p>

                                                <div>

                                                  <p class="MsoNormal">On

                                                    Thu, Mar 10, 2016 at

                                                    10:22 PM, Nema,

                                                    Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>>

                                                    wrote:</p>

                                                  <div>

                                                    <div>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This
                                                          looks interesting; the main motivation appears to be
                                                          replacing a masked vector load with a general vector
                                                          load followed by a select.</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Masked
                                                          vector loads have been observed to be generally more
                                                          expensive than a plain vector load.</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But
                                                          if the first and last elements of a masked vector load
                                                          are guaranteed to be accessed, then it can be
                                                          transformed into a vector load.</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In
                                                          opt, this can be driven by TTI, which should check
                                                          whether the transformation is beneficial.</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span></p>

                                                      <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                                      <p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">

                                                          llvm-dev

                                                          [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]

                                                          <b>On Behalf

                                                          Of </b>Sanjay

                                                          Patel via

                                                          llvm-dev<br>

                                                          <b>Sent:</b>

                                                          Friday, March

                                                          11, 2016 3:37

                                                          AM<br>

                                                          <b>To:</b>

                                                          llvm-dev<br>

                                                          <b>Subject:</b>

                                                          [llvm-dev]

                                                          masked-load

                                                          endpoints

                                                          optimization</span></p>

                                                      <div>

                                                        <div>

                                                          <p class="MsoNormal"> </p>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <p class="MsoNormal" style="margin-bottom:12pt">If we're loading the first and last elements

                                                          of a vector

                                                          using a masked

                                                          load [1], can

                                                          we replace the

                                                          masked load

                                                          with a full

                                                          vector load?<br>

                                                          <br>

                                                          "The result of

                                                          this operation

                                                          is equivalent

                                                          to a regular

                                                          vector load

                                                          instruction

                                                          followed by a

                                                          ‘select’

                                                          between the

                                                          loaded and the

                                                          passthru

                                                          values,

                                                          predicated on

                                                          the same mask.

                                                          However, using

                                                          this intrinsic

                                                          prevents

                                                          exceptions on

                                                          memory access

                                                          to masked-off

                                                          lanes."<br>

                                                          <br>

                                                          I think the

                                                          fact that

                                                          we're loading

                                                          the endpoints

                                                          of the vector

                                                          guarantees

                                                          that a full

                                                          vector load

                                                          can't have any

                                                          different

                                                          faulting/exception

                                                          behavior on

                                                          x86 and most

                                                          (?) other

                                                          targets. We

                                                          would,

                                                          however, be

                                                          reading memory

                                                          that the

                                                          program has

                                                          not explicitly

                                                          requested.</p>
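                                                          <p class="MsoNormal" style="margin-bottom:12pt">As a rough model of the two forms in plain C, assuming 4-lane i32 vectors: masked_load4 touches memory only for enabled lanes, while full_load_select4 reads all 16 bytes and then selects per lane, as in the LangRef wording quoted above. The function names, including the endpoints_set check, are hypothetical and exist only for this sketch; C cannot show the faulting behavior, only that the resulting values agree.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reference semantics: read memory only for lanes whose mask bit is set;
   masked-off lanes take the passthru value. */
static void masked_load4(const int32_t *addr, const bool mask[4],
                         const int32_t passthru[4], int32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* Hypothetical precondition for widening: the first and last lanes are
   loaded anyway, so a full load stays between the requested endpoints. */
static bool endpoints_set(const bool mask[4]) {
    return mask[0] && mask[3];
}

/* Widened form: one full 16-byte load followed by a per-lane select. */
static void full_load_select4(const int32_t *addr, const bool mask[4],
                              const int32_t passthru[4], int32_t out[4]) {
    int32_t v[4];
    memcpy(v, addr, sizeof v);     /* also reads the masked-off lanes */
    for (int i = 0; i < 4; ++i)
        out[i] = mask[i] ? v[i] : passthru[i];
}
```

                                                          <p class="MsoNormal" style="margin-bottom:12pt">With the mask from the IR example (lanes 0 and 3 enabled), endpoints_set returns true and both functions yield the same vector; the open question is only whether reading lanes 1 and 2 is acceptable.</p>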

                                                          </div>

                                                          <p class="MsoNormal">IR

                                                          example:<br>

                                                          <br>

                                                          define <4 x

                                                          i32>

                                                          @maskedload_endpoints(<4

                                                          x i32>*

                                                          %addr, <4 x

                                                          i32> %v) {</p>

                                                          </div>

                                                          <p class="MsoNormal">  ; load the first and last elements pointed to by %addr and shuffle those into %v</p>

                                                          <div>

                                                          <p class="MsoNormal" style="margin-bottom:12pt">  %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)<br>
                                                            ret <4 x i32> %res<br>
                                                          }</p>

                                                          </div>

                                                          <div>

                                                          <p class="MsoNormal">would become something like:</p>

                                                          </div>

                                                          <div>

                                                          <p class="MsoNormal"><br>
                                                          define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {</p>

                                                          </div>

                                                          <div>

                                                          <p class="MsoNormal"> 

                                                          %vecload =

                                                          load <4 x

                                                          i32>, <4

                                                          x i32>*

                                                          %addr, align 4</p>

                                                          </div>

                                                          <div>

                                                          <p class="MsoNormal">  %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v</p>

                                                          </div>

                                                          <div>

                                                          <p class="MsoNormal" style="margin-bottom:12pt">  ret <4 x i32> %sel<br>

                                                          }</p>

                                                          </div>
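                                                          <div>
                                                          <p class="MsoNormal">For anyone who prefers C to IR, the rewrite can be modeled roughly as below. This is only a scalar sketch, not what the backend would emit: the helper names masked_load, load_then_select and endpoints_agree are made up for illustration, and it assumes the full 16-byte range at addr is readable, which is the case here because both the first and the last lanes are loaded anyway.</p>
                                                          </div>

```c
#include <string.h>

#define LANES 4

/* Reference semantics of the masked load: only lanes whose mask
 * entry is nonzero are read from memory; the rest come from passthru. */
static void masked_load(const int *addr, const int mask[LANES],
                        const int passthru[LANES], int out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* Proposed rewrite: read every lane with one full-width load, then
 * pick between the loaded value and passthru per lane (the select).
 * Assumption: all LANES elements at addr are dereferenceable. */
static void load_then_select(const int *addr, const int mask[LANES],
                             const int passthru[LANES], int out[LANES]) {
    int vecload[LANES];
    memcpy(vecload, addr, sizeof vecload); /* stands in for the vector load */
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? vecload[i] : passthru[i];
}

/* Returns 1 if both forms produce the same vector for the
 * <1, 0, 0, 1> mask used in the IR example, 0 otherwise. */
static int endpoints_agree(void) {
    const int mem[LANES]  = {10, 20, 30, 40};
    const int v[LANES]    = {-1, -2, -3, -4};
    const int mask[LANES] = {1, 0, 0, 1};
    int a[LANES], b[LANES];

    masked_load(mem, mask, v, a);
    load_then_select(mem, mask, v, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

                                                          <div>
                                                          <p class="MsoNormal">The results match lane for lane; the only observable difference is that the second form also reads the two middle elements, whose values are then discarded by the select.</p>
                                                          </div>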

                                                          <div>

                                                          <p class="MsoNormal">If this isn't valid as an IR optimization, would it be acceptable as a DAG combine with a target hook to opt in?</p>

                                                          </div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <p class="MsoNormal"><br>

                                                          [1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a></p>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                        </div>

                                                      </div>

                                                    </div>

                                                  </div>

                                                </div>


                                              </div>

                                            </div>

                                          </div>

                                        </div>


                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </div>

                            </div>

                          </blockquote>

                        </div>

                        <br>

                      </div>

                    </div>

                    <br>


                    <br>

                  </div>

                </div>

                <pre>_______________________________________________

LLVM Developers mailing list

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><span><font color="#888888">

</font></span></pre>

                <span><font color="#888888"> </font></span></blockquote>

              <span><font color="#888888"> <br>

                  <pre cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

</pre>

                </font></span></div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>


  </div></div></div>

</blockquote></div><br></div></div></div></div>