<div dir="ltr"><div><div>Hi Ben -<br></div><br>Thanks for your response. For the sake of argument, let's narrow the scope of the problem to eliminate some of the variables you have rightfully cited. <br><br>Let's assume we're not dealing with volatiles, atomics, or FP operands. We'll even guarantee that the extra loaded value is never used. This is, in fact, the scenario that <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a> is concerned with.<br><br></div>Related C example:<br><br>typedef int v4i32 __attribute__((__vector_size__(16)));<br><br>// Load some almost-consecutive ints as a vector.<br>v4i32 foo(int *x) {<br>   int x0 = x[0];<br>   int x1 = x[1];<br>// int x2 = x[2];   // U can't touch this? <br>   int x3 = x[3];<br>   return (v4i32) { x0, x1, 0, x3 };<br>}<br><br><div><div>For x86, we notice that we have nearly a v4i32 vector's worth of loads, so we just turn that into a vector load and mask out the element that's getting set to zero:<br>    movups    (%rdi), %xmm0            # load 128 bits instead of three 32-bit elements<br>    andps    LCPI0_0(%rip), %xmm0 # put zero bits into the 3rd element of the vector<br><br></div><div>Should that optimization be disabled by a hypothetical -fextra-secure flag?<br></div><div><br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 7:59 AM, Craig, Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Regarding accessing extra data, there are at least some limits as to

    what can be accessed.  You can't generate extra loads or stores to

    volatiles.  You can't generate extra stores to atomics, even if the

    extra stores appear to be the same value as the old value.<br>

    <br>

    As for determining where the perf vs. security line should be drawn,

    I would argue that most compilers have gone too far on the perf side

    while optimizing around undefined behavior: dead store elimination
    leaving passwords in memory, integer overflow checks getting
    optimized out, and NULL checks optimized away.  Linus Torvalds was complaining

    about those just recently on this list, and while I don't share his

    tone, I agree with him regarding the harm these optimizations can

    cause.<br>
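A minimal C sketch of two of the patterns above, assuming a GCC- or Clang-style compiler: an overflow check written so it never relies on signed wraparound (which is undefined, so the optimizer may delete a check that depends on it), and a zeroing helper whose stores go through a volatile lvalue so dead store elimination does not drop them in practice. The function names are hypothetical and only for illustration; where available, C11 Annex K memset_s or explicit_bzero (glibc, OpenBSD) serve the same purpose as the helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* UB-prone check (hypothetical name): if a + b overflows, behavior is
   undefined, so an optimizer may assume it never happens and fold
   (b > 0 && a + b < a) to false. Only safe to call when a + b fits. */
static bool add_overflows_ub(int a, int b) {
  return b > 0 && a + b < a;
}

/* Well-defined check: compare against the limits before adding, so no
   signed overflow can occur and the test cannot be legally removed. */
static bool add_overflows(int a, int b) {
  if (b > 0)
    return a > INT_MAX - b; /* INT_MAX - b cannot overflow when b > 0 */
  return a < INT_MIN - b;   /* INT_MIN - b cannot overflow when b <= 0 */
}

/* Zero a buffer through a volatile pointer. GCC and Clang treat each
   volatile store as a side effect they must keep, whereas a plain memset()
   right before the buffer dies may be removed as a dead store. */
static void secure_zero(void *buf, size_t len) {
  volatile unsigned char *p = (volatile unsigned char *)buf;
  while (len--)
    *p++ = 0;
}
```

The limit comparison costs the same as the wraparound test on most targets, so there is little reason to prefer the UB-prone form.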

    <br>

    If I'm understanding correctly, in your specific cases you are
    wondering whether it is fine to load and operate on a floating-point
    value that the user did not specifically ask you to operate on. 
    This could cause at least two different problems.  First, it could
    raise a floating-point exception.  I think the danger of a
    floating-point exception should rule out loading values the user
    didn't request.  Second, loading values the user didn't request
    could enable a timing attack.  The timing attack is scary, but I
    don't think it is something we can really fix in the general case. 
    As long as individual assembly instructions have
    impractical-to-predict execution times, we will be at the mercy of
    the current hardware state.  There are timing attacks that can
    recover TLS keys used in a different VM instance based on how
    quickly loads in the current process execute.  If our worst timing
    attack problems are floating-point denormal issues, then I

    think we are in a pretty good state.<div><div class="h5"><br>

    <br>

    <div>On 3/15/2016 10:46 AM, Sanjay Patel via

      llvm-dev wrote:<br>

    </div>

    </div></div><blockquote type="cite"><div><div class="h5">

      <div dir="ltr">

        <div>

          <div>

            <div>[cc'ing cfe-dev because this may require some

              interpretation of language law]<br>

              <br>

              My understanding is that the compiler has the freedom to

              access extra data in C/C++ (not sure about other

              languages); AFAIK, the LLVM LangRef is silent about this.

              In C/C++, this is based on the "as-if rule":<br>

              <a href="http://en.cppreference.com/w/cpp/language/as_if" target="_blank">http://en.cppreference.com/w/cpp/language/as_if</a><br>

            </div>

          </div>

          <br>

        </div>

        So the question is: where should the optimizer draw the line

        with respect to perf vs. security if it involves operating on

        unknown data? Are there guidelines that we can use to decide

        this?<br>

        <br>

        <div>

          <div>The masked load transform referenced below is not unique

            in accessing / operating on unknown data. In addition to the

            related scalar loads -> vector load transform that I've

            mentioned earlier in this thread, see for example:<br>

            <a href="https://llvm.org/bugs/show_bug.cgi?id=20358" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br>

            <div>(and the security paper and patch review linked there)<br>

            </div>

            <br>

          </div>

        </div>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Mon, Mar 14, 2016 at 10:26 PM,

            Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

              <div link="blue" vlink="purple" lang="EN-US">

                <div>

                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi

                      Sanjay,</span></p>

                  <span>

                    <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                    <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The
                        real question I have is whether it is legal to
                        read the extra memory, regardless of whether
                        this is a masked load or something else.</span></p>

                  </span>

                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No,
                      it is not legal AFAIK, because by doing that we are
                      exposing memory contents that the programmer did
                      not intend to access. This may be vulnerable to
                      exploitation.</span></p>

                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>

                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid</span></p>

                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                  <div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">

                    <div>

                      <div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">

                        <p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">

                            llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]

                            <b>On Behalf Of </b>Sanjay Patel via

                            llvm-dev<br>

                            <b>Sent:</b> Monday, March 14, 2016 10:37 PM<br>

                            <b>To:</b> Nema, Ashutosh<br>

                            <b>Cc:</b> llvm-dev<br>

                            <b>Subject:</b> Re: [llvm-dev] masked-load

                            endpoints optimization</span></p>

                      </div>

                    </div>

                    <div>

                      <div>

                        <p class="MsoNormal"> </p>

                        <div>

                          <p class="MsoNormal">I checked in a patch to

                            do this transform for x86-only for now:<br>

                            <a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a>

                            /

                            <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a></p>

                        </div>

                        <div>

                          <p class="MsoNormal"> </p>

                          <div>

                            <p class="MsoNormal">On Fri, Mar 11, 2016 at

                              9:57 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>>

                              wrote:</p>

                            <div>

                              <div>

                                <div>

                                  <div>

                                    <p class="MsoNormal" style="margin-bottom:12pt">Thanks,

                                      Ashutosh.</p>

                                  </div>

                                  <p class="MsoNormal" style="margin-bottom:12pt">Yes,

                                    either TTI or TLI could be used to

                                    limit the transform if we do it in

                                    CGP rather than the DAG.</p>

                                </div>

                                <p class="MsoNormal" style="margin-bottom:12pt">The real

                                  question I have is whether it is legal

                                  to read the extra memory, regardless

                                  of whether this is a masked load or

                                  something else.</p>

                              </div>

                              <p class="MsoNormal">Note that the x86

                                backend already does this, so either my

                                proposal is ok for x86, or we're already

                                doing an illegal optimization:</p>

                              <div>

                                <p class="MsoNormal" style="margin-bottom:12pt"><br>

                                  define <4 x i32>

                                  @load_bonus_bytes(i32* %addr1, <4 x

                                  i32> %v) {<br>

                                    %ld1 = load i32, i32* %addr1<br>

                                    %addr2 = getelementptr i32, i32*

                                  %addr1, i64 3<br>

                                    %ld2 = load i32, i32* %addr2<br>

                                    %vec1 = insertelement <4 x

                                  i32> undef, i32 %ld1, i32 0<br>

                                    %vec2 = insertelement <4 x

                                  i32> %vec1, i32 %ld2, i32 3<br>

                                    ret <4 x i32> %vec2<br>

                                  }<br>

                                  <br>

                                  $ ./llc -o - loadcombine.ll <br>

                                  ...<br>

                                      movups    (%rdi), %xmm0<br>

                                      retq<br>

                                  <br>

                                  <br>

                                </p>

                              </div>
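A rough C model of why this combine is arguably already legal at the IR level, using the same GCC/Clang vector extension as the foo() example: lanes 1 and 2 of the IR result are undef, so zero, or whatever happens to be in memory, is an acceptable value for them. The function names are hypothetical; the combined version mirrors the single movups and reads addr1[1] and addr1[2], which the IR never asked for.

```c
#include <assert.h>
#include <string.h>

typedef int v4i32 __attribute__((__vector_size__(16)));

/* Two scalar loads, as written in the IR. Lanes 1 and 2 are undef there;
   zero is just one of the values the compiler may pick for them. */
static v4i32 load_bonus_bytes_scalar(const int *addr1) {
  return (v4i32){addr1[0], 0, 0, addr1[3]};
}

/* What the x86 backend emits: one 16-byte load (movups). Lanes 1 and 2 now
   hold addr1[1] and addr1[2]. memcpy is used because addr1 is only assumed
   to be 4-byte aligned, like the unaligned movups. */
static v4i32 load_bonus_bytes_combined(const int *addr1) {
  v4i32 v;
  memcpy(&v, addr1, sizeof v);
  return v;
}
```

Both versions agree on every lane the program can legitimately observe; they differ only in which memory is read.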

                            </div>

                            <div>

                              <div>

                                <div>

                                  <p class="MsoNormal"> </p>

                                  <div>

                                    <p class="MsoNormal">On Thu, Mar 10,

                                      2016 at 10:22 PM, Nema, Ashutosh

                                      <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>>

                                      wrote:</p>

                                    <div>

                                      <div>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This
                                            looks interesting. The main
                                            motivation appears to be
                                            replacing a masked vector
                                            load with a regular vector
                                            load followed by a select.</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">We
                                            have observed that masked
                                            vector loads are, in general,
                                            expensive compared with a
                                            regular vector load.</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But
                                            if the first and last elements
                                            of a masked vector load are
                                            guaranteed to be accessed,
                                            then it can be transformed
                                            into a regular vector load.</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In
                                            opt, this can be driven by
                                            TTI, which should check
                                            whether the transformation
                                            is actually profitable.</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span></p>

                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>

                                        <p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">

                                            llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]

                                            <b>On Behalf Of </b>Sanjay

                                            Patel via llvm-dev<br>

                                            <b>Sent:</b> Friday, March

                                            11, 2016 3:37 AM<br>

                                            <b>To:</b> llvm-dev<br>

                                            <b>Subject:</b> [llvm-dev]

                                            masked-load endpoints

                                            optimization</span></p>

                                        <div>

                                          <div>

                                            <p class="MsoNormal"> </p>

                                            <div>

                                              <div>

                                                <div>

                                                  <p class="MsoNormal" style="margin-bottom:12pt">If

                                                    we're loading the

                                                    first and last

                                                    elements of a vector

                                                    using a masked load

                                                    [1], can we replace

                                                    the masked load with

                                                    a full vector load?<br>

                                                    <br>

                                                    "The result of this

                                                    operation is

                                                    equivalent to a

                                                    regular vector load

                                                    instruction followed

                                                    by a ‘select’

                                                    between the loaded

                                                    and the passthru

                                                    values, predicated

                                                    on the same mask.

                                                    However, using this

                                                    intrinsic prevents

                                                    exceptions on memory

                                                    access to masked-off

                                                    lanes."<br>

                                                    <br>

                                                    I think the fact

                                                    that we're loading

                                                    the endpoints of the

                                                    vector guarantees

                                                    that a full vector

                                                    load can't have any

                                                    different

                                                    faulting/exception

                                                    behavior on x86 and

                                                    most (?) other

                                                    targets. We would,

                                                    however, be reading

                                                    memory that the

                                                    program has not

                                                    explicitly

                                                    requested.</p>
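The endpoint argument above can be checked with a small C model. It is a sketch under stated assumptions: memory protection is tracked per 4 KiB page, and the vector is no larger than a page; the helper names are hypothetical. Under those assumptions, any page that a full 16-byte load touches is also touched by the load of the first or the last 4-byte element, so the full load cannot fault where the masked load would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: page-granular protection with 4 KiB pages (the smallest x86
   page size). */
#define PAGE_BYTES ((uintptr_t)4096)

/* Does the byte range [lo, hi] touch page number `page`? */
static bool range_touches_page(uintptr_t lo, uintptr_t hi, uintptr_t page) {
  return lo / PAGE_BYTES <= page && page <= hi / PAGE_BYTES;
}

/* True if every page touched by a full vec_bytes load at addr is also
   touched by loading only its first and last elem_bytes-wide elements,
   i.e. the full load cannot fault where the endpoint loads would not. */
static bool endpoints_cover_full_load(uintptr_t addr, size_t vec_bytes,
                                      size_t elem_bytes) {
  uintptr_t first_lo = addr, first_hi = addr + elem_bytes - 1;
  uintptr_t last_lo = addr + vec_bytes - elem_bytes;
  uintptr_t last_hi = addr + vec_bytes - 1;
  for (uintptr_t p = first_lo / PAGE_BYTES; p <= last_hi / PAGE_BYTES; p++) {
    if (!range_touches_page(first_lo, first_hi, p) &&
        !range_touches_page(last_lo, last_hi, p))
      return false; /* a middle page that only the full load would touch */
  }
  return true;
}
```

For a 16-byte vector the check holds at every address, including ones that straddle a page boundary. It fails for a hypothetical three-page "vector", which is why the argument needs the vector to fit within a page. Note that the model addresses faulting only, not the separate question of reading memory the program never requested.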

                                                </div>

                                                <p class="MsoNormal">IR

                                                  example:<br>

                                                  <br>

                                                  define <4 x i32>

                                                  @maskedload_endpoints(<4

                                                  x i32>* %addr,

                                                  <4 x i32> %v) {</p>

                                              </div>

                                              <p class="MsoNormal">  ;

                                                load the first and last

                                                elements pointed to by

                                                %addr and shuffle those

                                                into %v</p>

                                              <div>

                                                <p class="MsoNormal" style="margin-bottom:12pt"> 

                                                  %res = call <4 x

                                                  i32>

                                                  @llvm.masked.load.v4i32(<4

                                                  x i32>* %addr, i32

                                                  4, <4 x i1>

                                                  <i1 1, i1 0, i1 0,

                                                  i1 1>, <4 x

                                                  i32> %v)<br>

                                                    ret <4 x i32>

                                                  %res<br>

                                                  }</p>

                                              </div>

                                              <div>

                                                <p class="MsoNormal">would

                                                  become something like:</p>

                                              </div>

                                              <div>

                                                <p class="MsoNormal"><br>

                                                  define <4 x i32>

                                                  @maskedload_endpoints(<4

                                                  x i32>* %addr,

                                                  <4 x i32> %v) {</p>

                                              </div>

                                              <div>

                                                <p class="MsoNormal"> 

                                                  %vecload = load <4

                                                  x i32>, <4 x

                                                  i32>* %addr, align

                                                  4</p>

                                              </div>

                                              <div>

                                                <p class="MsoNormal"> 

                                                  %sel = select <4 x

                                                  i1> <i1 1, i1 0,

                                                  i1 0, i1 1>, <4

                                                  x i32> %vecload,

                                                  <4 x i32> %v</p>

                                              </div>

                                              <div>

                                                <p class="MsoNormal" style="margin-bottom:12pt"> 

                                                  ret <4 x i32>

                                                  %sel<br>

                                                  }</p>

                                              </div>
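To make the equivalence in the LangRef quote concrete, here is a C sketch (GCC/Clang vector extension; the function names are hypothetical) modeling the original masked load and the proposed load-plus-select for the mask {1, 0, 0, 1}. The results are identical. The only difference is that the second version also reads addr[1] and addr[2], which is exactly the extra memory access under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef int v4i32 __attribute__((__vector_size__(16)));

/* Reference model of llvm.masked.load semantics: only enabled lanes read
   memory; disabled lanes take the passthru value. */
static v4i32 masked_load_model(const int *addr, const bool mask[4],
                               v4i32 passthru) {
  v4i32 r = passthru;
  for (int i = 0; i < 4; i++)
    if (mask[i])
      r[i] = addr[i];
  return r;
}

/* The proposed rewrite: one full-width load (reading all 16 bytes), then a
   per-lane select between the loaded and passthru values. */
static v4i32 load_then_select(const int *addr, const bool mask[4],
                              v4i32 passthru) {
  v4i32 loaded;
  v4i32 r = {0, 0, 0, 0};
  memcpy(&loaded, addr, sizeof loaded);
  for (int i = 0; i < 4; i++)
    r[i] = mask[i] ? loaded[i] : passthru[i];
  return r;
}
```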

                                              <div>

                                                <p class="MsoNormal">If
                                                  this isn't valid as an
                                                  IR optimization, would
                                                  it be acceptable as a
                                                  DAG combine with a
                                                  target hook to opt in?</p>

                                              </div>

                                              <div>

                                                <div>

                                                  <div>

                                                    <p class="MsoNormal"><br>
                                                      [1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a></p>

                                                  </div>

                                                </div>

                                              </div>

                                            </div>

                                          </div>

                                        </div>

                                      </div>

                                    </div>

                                  </div>

                                  <p class="MsoNormal"> </p>

                                </div>

                              </div>

                            </div>

                          </div>

                          <p class="MsoNormal"> </p>

                        </div>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

      <br>


      <br>

      </div></div><pre>_______________________________________________

LLVM Developers mailing list

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><span class="HOEnZb"><font color="#888888">

</font></span></pre><span class="HOEnZb"><font color="#888888">

    </font></span></blockquote><span class="HOEnZb"><font color="#888888">

    <br>

    <pre cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

</pre>

  </font></span></div>

</blockquote></div><br></div>