<div dir="ltr"><div><div>Hi Ben -<br></div><br>Thanks for your response. For the sake of argument, let's narrow the scope of the problem to eliminate some of the variables you have rightfully cited. <br><br>Let's assume we're not dealing with volatiles, atomics, or FP operands. We'll even guarantee that the extra loaded value is never used. This is, in fact, the scenario that <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a> is concerned with.<br><br></div>Related C example:<br><br>typedef int v4i32 __attribute__((__vector_size__(16)));<br><br>// Load some almost-consecutive ints as a vector.<br>v4i32 foo(int *x) {<br>   int x0 = x[0];<br>   int x1 = x[1];<br>// int x2 = x[2];   // U can't touch this? <br>   int x3 = x[3];<br>   return (v4i32) { x0, x1, 0, x3 };<br>}<br><br><div><div>For x86, we notice that we have nearly a v4i32 vector's worth of loads, so we just turn that into a vector load and mask out the element that's getting set to zero:<br>    movups    (%rdi), %xmm0          # load 128 bits instead of three 32-bit elements<br>    andps     LCPI0_0(%rip), %xmm0   # zero the 3rd element (x[2]) of the vector<br><br></div><div>Should that optimization be disabled by a hypothetical -fextra-secure flag?<br></div><div><br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 7:59 AM, Craig, Ben <span dir="ltr"><<a href="mailto:ben.craig@codeaurora.org" target="_blank">ben.craig@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Regarding accessing extra data, there are at least some limits on
    what can be accessed.  You can't generate extra loads or stores to
    volatiles.  You can't generate extra stores to atomics, even if the
    extra stores would write the same value as the old value.<br>
    <br>
    As for determining where the perf vs. security line should be drawn,
    I would argue that most compilers have gone too far on the perf side
    while optimizing undefined behavior.  Examples include dead store
    elimination leaving passwords in memory, integer overflow checks
    getting optimized out, and NULL checks getting optimized away.  Linus
    Torvalds was complaining about those just recently on this list, and
    while I don't share his tone, I agree with him regarding the harm
    these optimizations can cause.<br>
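For illustration only (this sketch is not from the thread, and the helper names are hypothetical): two of the patterns above written so the optimizer cannot legally remove them. The zeroing loop stores through a volatile pointer, and volatile stores must be kept by dead store elimination; on real systems explicit_bzero() (glibc, BSDs) or C11's optional memset_s() are the usual tools. The overflow check compares against INT_MAX/INT_MIN instead of testing a + b < a, which depends on signed overflow and may be folded away.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: zero a buffer through a volatile pointer. Every store
 * is a volatile access, so dead store elimination may not delete it even if
 * the buffer is never read again (unlike a memset() right before free()). */
static void secure_zero(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--)
        *p++ = 0;
}

/* Hypothetical helper: report whether a + b would overflow without ever
 * computing a + b. The tempting form `a + b < a` relies on signed overflow,
 * which is undefined behavior, so the compiler may fold it to false. */
static bool add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```

Because add_would_overflow() never evaluates the overflowing sum, there is no undefined behavior for the optimizer to reason from.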
    <br>
    If I'm understanding correctly, for your specific cases, you are
    wondering if it is fine to load and operate on a floating-point
    value that the user did not specifically request you to operate on.
    This could cause (at least) two different problems.  First, it could
    cause a floating-point exception.  I think the danger of a
    floating-point exception should rule out loading values the user
    didn't request.  Second, loading values the user didn't specify
    could enable a timing attack.  The timing attack is scary, but I
    don't think it is something we can really fix in the general case.
    As long as individual assembly instructions have
    impractical-to-predict execution times, we will be at the mercy of
    the current hardware state.  There are timing attacks that can
    recover TLS keys in a different VM instance based on how
    quickly loads in the current process execute.  If our worst timing
    attack problems are floating-point denormal issues, then I
    think we are in a pretty good state.<div><div class="h5"><br>
    <br>
    <div>On 3/15/2016 10:46 AM, Sanjay Patel via
      llvm-dev wrote:<br>
    </div>
    </div></div><blockquote type="cite"><div><div class="h5">
      <div dir="ltr">
        <div>
          <div>
            <div>[cc'ing cfe-dev because this may require some
              interpretation of language law]<br>
              <br>
              My understanding is that the compiler has the freedom to
              access extra data in C/C++ (not sure about other
              languages); AFAIK, the LLVM LangRef is silent about this.
              In C/C++, this is based on the "as-if rule":<br>
              <a href="http://en.cppreference.com/w/cpp/language/as_if" target="_blank">http://en.cppreference.com/w/cpp/language/as_if</a><br>
            </div>
          </div>
          <br>
        </div>
        So the question is: where should the optimizer draw the line
        with respect to perf vs. security if it involves operating on
        unknown data? Are there guidelines that we can use to decide
        this?<br>
        <br>
        <div>
          <div>The masked load transform referenced below is not unique
            in accessing / operating on unknown data. In addition to the
            related scalar loads -> vector load transform that I've
            mentioned earlier in this thread, see for example:<br>
            <a href="https://llvm.org/bugs/show_bug.cgi?id=20358" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br>
            <div>(and the security paper and patch review linked there)<br>
            </div>
            <br>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Mon, Mar 14, 2016 at 10:26 PM,
            Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div link="blue" vlink="purple" lang="EN-US">
                <div>
                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi
                      Sanjay,</span></p>
                  <span>
                    <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                    <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The
                        real question I have is whether it is legal to
                        read the extra memory, regardless of whether
                        this is a masked load or something else.</span></p>
                  </span>
                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No,
                      AFAIK it is not legal, because by doing that we are
                      exposing memory contents that the programmer did
                      not intend to expose. This may be vulnerable to
                      exploitation.</span></p>
                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
                  <p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid</span></p>
                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                  <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                  <div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">
                    <div>
                      <div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">
                        <p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">
                            llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
                            <b>On Behalf Of </b>Sanjay Patel via
                            llvm-dev<br>
                            <b>Sent:</b> Monday, March 14, 2016 10:37 PM<br>
                            <b>To:</b> Nema, Ashutosh<br>
                            <b>Cc:</b> llvm-dev<br>
                            <b>Subject:</b> Re: [llvm-dev] masked-load
                            endpoints optimization</span></p>
                      </div>
                    </div>
                    <div>
                      <div>
                        <p class="MsoNormal"> </p>
                        <div>
                          <p class="MsoNormal">I checked in a patch to
                            do this transform for x86 only, for now:<br>
                            <a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a>
                            /
                            <a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a></p>
                        </div>
                        <div>
                          <p class="MsoNormal"> </p>
                          <div>
                            <p class="MsoNormal">On Fri, Mar 11, 2016 at
                              9:57 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>>
                              wrote:</p>
                            <div>
                              <div>
                                <div>
                                  <div>
                                    <p class="MsoNormal" style="margin-bottom:12pt">Thanks,
                                      Ashutosh.</p>
                                  </div>
                                  <p class="MsoNormal" style="margin-bottom:12pt">Yes,
                                    either TTI (TargetTransformInfo) or TLI
                                    (TargetLowering) could be used to limit
                                    the transform if we do it in CGP
                                    (CodeGenPrepare) rather than in the DAG.</p>
                                </div>
                                <p class="MsoNormal" style="margin-bottom:12pt">The real
                                  question I have is whether it is legal
                                  to read the extra memory, regardless
                                  of whether this is a masked load or
                                  something else.</p>
                              </div>
                              <p class="MsoNormal">Note that the x86
                                backend already does this, so either my
                                proposal is ok for x86, or we're already
                                doing an illegal optimization:</p>
                              <div>
                                <p class="MsoNormal" style="margin-bottom:12pt"><br>
                                  define <4 x i32>
                                  @load_bonus_bytes(i32* %addr1, <4 x
                                  i32> %v) {<br>
                                    %ld1 = load i32, i32* %addr1<br>
                                    %addr2 = getelementptr i32, i32*
                                  %addr1, i64 3<br>
                                    %ld2 = load i32, i32* %addr2<br>
                                    %vec1 = insertelement <4 x
                                  i32> undef, i32 %ld1, i32 0<br>
                                    %vec2 = insertelement <4 x
                                  i32> %vec1, i32 %ld2, i32 3<br>
                                    ret <4 x i32> %vec2<br>
                                  }<br>
                                  <br>
                                  $ ./llc -o - loadcombine.ll <br>
                                  ...<br>
                                      movups    (%rdi), %xmm0<br>
                                      retq<br>
                                  <br>
                                  <br>
                                </p>
                              </div>
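A minimal C model of what the IR above asks for versus what movups does, assuming lanes 1 and 2 are don't-care values (they are undef in the IR). The struct stands in for a <4 x i32> value, and all names are hypothetical. In C, the widened read is only well-defined here because the buffer really holds four ints; the open question in this thread is the case where those bonus bytes belong to memory the program never asked to read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain-C stand-in for a <4 x i32> value (hypothetical). */
typedef struct { int32_t lane[4]; } v4i32_model;

/* What the IR requests: scalar loads of elements 0 and 3 only. Lanes 1 and 2
 * are undef in the IR, so any value is allowed; this model uses 0. */
static v4i32_model load_requested(const int32_t *addr1) {
    assert(addr1 != NULL);
    v4i32_model v = {{ addr1[0], 0, 0, addr1[3] }};
    return v;
}

/* What `movups (%rdi), %xmm0` does: one 16-byte load that also reads
 * elements 1 and 2, the "bonus bytes" the program never asked for. */
static v4i32_model load_widened(const int32_t *addr1) {
    v4i32_model v;
    memcpy(v.lane, addr1, sizeof v.lane);
    return v;
}

/* The widened load is a valid refinement only if it agrees with the
 * requested loads on every lane the IR actually defines (0 and 3). */
static int agree_on_defined_lanes(v4i32_model a, v4i32_model b) {
    return a.lane[0] == b.lane[0] && a.lane[3] == b.lane[3];
}
```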
                            </div>
                            <div>
                              <div>
                                <div>
                                  <p class="MsoNormal"> </p>
                                  <div>
                                    <p class="MsoNormal">On Thu, Mar 10,
                                      2016 at 10:22 PM, Nema, Ashutosh
                                      <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>>
                                      wrote:</p>
                                    <div>
                                      <div>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This
                                            looks interesting; the main
                                            motivation appears to be
                                            replacing a masked vector load
                                            with a regular vector load
                                            followed by a select.</span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">We
                                            have observed that masked vector
                                            loads are generally expensive
                                            compared with regular vector
                                            loads.</span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But
                                            if the first and last elements
                                            of a masked vector load are
                                            guaranteed to be accessed,
                                            then it can be transformed
                                            into a regular vector load.</span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In
                                            opt, this can be driven by
                                            TTI, which should be used to
                                            check whether the transformation
                                            is profitable.</span></p>
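A sketch of the legality/profitability split described above. endpoints_enabled() and should_widen() are hypothetical names, and the costs are plain numbers supplied by the caller; this does not model LLVM's actual TTI interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical legality check: a masked load of n contiguous lanes may be
 * widened to a full vector load plus select only if its first and last lanes
 * are enabled, so the full load covers no bytes outside
 * [start of first lane, end of last lane]. */
static bool endpoints_enabled(const bool *mask, size_t n) {
    assert(n > 0);
    return mask[0] && mask[n - 1];
}

/* Hypothetical profitability check standing in for a TTI query: widen only
 * when the plain load + select is reported cheaper than the masked load.
 * The costs are caller-supplied, not values from any real cost model. */
static bool should_widen(const bool *mask, size_t n,
                         unsigned masked_load_cost,
                         unsigned load_plus_select_cost) {
    return endpoints_enabled(mask, n) &&
           load_plus_select_cost < masked_load_cost;
}
```

Keeping legality separate from cost lets a target opt out purely on profitability without revisiting whether the wider access is allowed.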
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span></p>
                                        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span></p>
                                        <p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">
                                            llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
                                            <b>On Behalf Of </b>Sanjay
                                            Patel via llvm-dev<br>
                                            <b>Sent:</b> Friday, March
                                            11, 2016 3:37 AM<br>
                                            <b>To:</b> llvm-dev<br>
                                            <b>Subject:</b> [llvm-dev]
                                            masked-load endpoints
                                            optimization</span></p>
                                        <div>
                                          <div>
                                            <p class="MsoNormal"> </p>
                                            <div>
                                              <div>
                                                <div>
                                                  <p class="MsoNormal" style="margin-bottom:12pt">If
                                                    we're loading the
                                                    first and last
                                                    elements of a vector
                                                    using a masked load
                                                    [1], can we replace
                                                    the masked load with
                                                    a full vector load?<br>
                                                    <br>
                                                    "The result of this
                                                    operation is
                                                    equivalent to a
                                                    regular vector load
                                                    instruction followed
                                                    by a ‘select’
                                                    between the loaded
                                                    and the passthru
                                                    values, predicated
                                                    on the same mask.
                                                    However, using this
                                                    intrinsic prevents
                                                    exceptions on memory
                                                    access to masked-off
                                                    lanes."<br>
                                                    <br>
                                                    I think the fact
                                                    that we're loading
                                                    the endpoints of the
                                                    vector guarantees
                                                    that a full vector
                                                    load can't have any
                                                    different
                                                    faulting/exception
                                                    behavior on x86 and
                                                    most (?) other
                                                    targets. We would,
                                                    however, be reading
                                                    memory that the
                                                    program has not
                                                    explicitly
                                                    requested.</p>
                                                </div>
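A minimal sketch of the endpoint argument, assuming memory protection is page-granular and the vector is no larger than a page: every byte of the full vector then lies on the page of either its first or its last byte, and those are exactly the pages the enabled endpoint lanes already touch. Sub-page checking (for example AddressSanitizer) is not modeled, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address of the page containing byte address a (power-of-two pages). */
static uintptr_t page_of(uintptr_t a, uintptr_t page_size) {
    return a & ~(page_size - 1);
}

/* Returns true if every byte of [addr, addr + vec_bytes) lies on the page of
 * the first byte or the page of the last byte. With vec_bytes <= page_size
 * this always holds, so the full load touches no page that the endpoint
 * lanes do not; it fails only when the vector spans more than two pages. */
static bool full_load_stays_on_endpoint_pages(uintptr_t addr,
                                              uintptr_t vec_bytes,
                                              uintptr_t page_size) {
    assert(vec_bytes > 0);
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t first = page_of(addr, page_size);
    uintptr_t last = page_of(addr + vec_bytes - 1, page_size);
    for (uintptr_t b = addr; b < addr + vec_bytes; b++) {
        uintptr_t p = page_of(b, page_size);
        if (p != first && p != last)
            return false;
    }
    return true;
}
```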
                                                <p class="MsoNormal">IR
                                                  example:<br>
                                                  <br>
                                                  define <4 x i32>
                                                  @maskedload_endpoints(<4
                                                  x i32>* %addr,
                                                  <4 x i32> %v) {</p>
                                              </div>
                                              <p class="MsoNormal">  ;
                                                load the first and last
                                                elements pointed to by
                                                %addr and shuffle those
                                                into %v</p>
                                              <div>
                                                <p class="MsoNormal" style="margin-bottom:12pt"> 
                                                  %res = call &lt;4 x i32&gt; @llvm.masked.load.v4i32(&lt;4 x i32&gt;* %addr, i32 4, &lt;4 x i1&gt; &lt;i1 1, i1 0, i1 0, i1 1&gt;, &lt;4 x i32&gt; %v)<br>
                                                    ret &lt;4 x i32&gt; %res<br>
                                                  }</p>
                                              </div>
                                              <div>
                                                <p class="MsoNormal">would
                                                  become something like:</p>
                                              </div>
                                              <div>
                                                <p class="MsoNormal"><br>
                                                  define <4 x i32>
                                                  @maskedload_endpoints(<4
                                                  x i32>* %addr,
                                                  <4 x i32> %v) {</p>
                                              </div>
                                              <div>
                                                <p class="MsoNormal"> 
                                                  %vecload = load <4
                                                  x i32>, <4 x
                                                  i32>* %addr, align
                                                  4</p>
                                              </div>
                                              <div>
                                                <p class="MsoNormal"> 
                                                  %sel = select &lt;4 x i1&gt; &lt;i1 1, i1 0, i1 0, i1 1&gt;, &lt;4 x i32&gt; %vecload, &lt;4 x i32&gt; %v</p>
                                              </div>
                                              <div>
                                                <p class="MsoNormal" style="margin-bottom:12pt"> 
                                                  ret <4 x i32>
                                                  %sel<br>
                                                  }</p>
                                              </div>
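A scalar C model of the equivalence the LangRef text describes, assuming the whole vector is readable: masked_load4() follows the masked-load definition lane by lane, and load_then_select4() is the proposed load + select rewrite. Both names are hypothetical. The results agree on every lane; the only difference is that the second version also reads addr[1] and addr[2].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics of a 4 x i32 masked load: enabled lanes read memory,
 * disabled lanes take the passthru value and never touch memory. */
static void masked_load4(const int32_t *addr, const bool mask[4],
                         const int32_t passthru[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* The proposed rewrite: load all four lanes, then select per lane with the
 * same mask. This reads addr[1] and addr[2] even though they are masked off. */
static void load_then_select4(const int32_t *addr, const bool mask[4],
                              const int32_t passthru[4], int32_t out[4]) {
    int32_t vecload[4];
    for (int i = 0; i < 4; i++)
        vecload[i] = addr[i];                          /* full vector load */
    for (int i = 0; i < 4; i++)
        out[i] = mask[i] ? vecload[i] : passthru[i];   /* select */
}
```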
                                              <div>
                                                <p class="MsoNormal">If
                                                  this isn't valid as an
                                                  IR optimization, would
                                                  it be acceptable as a
                                                  DAG combine with a
                                                  target hook to opt in?</p>
                                              </div>
                                              <div>
                                                <div>
                                                  <div>
                                                    <p class="MsoNormal"><br>
                                                      [1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a></p>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                  <p class="MsoNormal"> </p>
                                </div>
                              </div>
                            </div>
                          </div>
                          <p class="MsoNormal"> </p>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
      <fieldset></fieldset>
      <br>
      </div></div><pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><span class="HOEnZb"><font color="#888888">
</font></span></pre><span class="HOEnZb"><font color="#888888">
    </font></span></blockquote><span class="HOEnZb"><font color="#888888">
    <br>
    <pre cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
</pre>
  </font></span></div>

</blockquote></div><br></div>
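A portable C sketch of what the movups/andps sequence for the foo() example at the top of this message computes, written with scalar code instead of SSE so it can be compared directly with the source. The function names are hypothetical, and int32_t stands in for int (32 bits on x86-64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Source-level behavior of foo(): x[0], x[1], x[3] are loaded; lane 2 is 0. */
static void foo_as_written(const int32_t *x, int32_t out[4]) {
    assert(x != NULL);
    out[0] = x[0];
    out[1] = x[1];
    out[2] = 0;
    out[3] = x[3];
}

/* What the x86 code does: one 16-byte load (movups), then an AND with a
 * constant mask (andps) that clears lane 2. x[2] is read, but its bits are
 * discarded, so the result matches foo_as_written(). */
static void foo_as_compiled(const int32_t *x, int32_t out[4]) {
    static const uint32_t keep[4] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu };
    uint32_t bits[4];
    memcpy(bits, x, sizeof bits);        /* movups (%rdi), %xmm0 */
    for (int i = 0; i < 4; i++)
        bits[i] &= keep[i];              /* andps LCPI0_0(%rip), %xmm0 */
    memcpy(out, bits, sizeof bits);
}
```

The two functions return the same vector; the only observable difference is the extra read of x[2], which is exactly what the proposed -fextra-secure flag would be about.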