<div dir="ltr"><div><div><div>[cc'ing cfe-dev because this may require some interpretation of language law]<br><br>My understanding is that the compiler has the freedom to access extra data in C/C++ (not sure about other languages); AFAIK, the LLVM LangRef is silent about this. In C/C++, this is based on the "as-if rule":<br><a href="http://en.cppreference.com/w/cpp/language/as_if">http://en.cppreference.com/w/cpp/language/as_if</a><br></div></div><br></div>So the question is: where should the optimizer draw the line with respect to perf vs. security if it involves operating on unknown data? Are there guidelines that we can use to decide this?<br><br><div><div>The masked load transform referenced below is not unique in
accessing / operating on unknown data. In addition to the related scalar loads -> vector load transform that I've mentioned earlier in
this thread, see for example:<br><a href="https://llvm.org/bugs/show_bug.cgi?id=20358">https://llvm.org/bugs/show_bug.cgi?id=20358</a><br><div>(and the security paper and patch review linked there)<br></div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 14, 2016 at 10:26 PM, Shahid, Asghar-ahmad <span dir="ltr"><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank">Asghar-ahmad.Shahid@amd.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Hi Sanjay,<u></u><u></u></span></p><span class="">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">>The real question I have is whether it is legal to read the extra memory, regardless of whether this is a masked load or something else.<u></u><u></u></span></p>
</span><p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">No, it is not legal AFAIK, because doing so exposes the contents of memory that the programmer did not intend to expose. This may be open to exploitation.<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Shahid<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">
<div>
<div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"Tahoma","sans-serif""> llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Sanjay Patel via llvm-dev<br>
<b>Sent:</b> Monday, March 14, 2016 10:37 PM<br>
<b>To:</b> Nema, Ashutosh<br>
<b>Cc:</b> llvm-dev<br>
<b>Subject:</b> Re: [llvm-dev] masked-load endpoints optimization<u></u><u></u></span></p>
</div>
</div><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">I checked in a patch to do this transform for x86-only for now:<br>
<a href="http://reviews.llvm.org/D18094" target="_blank">http://reviews.llvm.org/D18094</a> /
<a href="http://reviews.llvm.org/rL263446" target="_blank">http://reviews.llvm.org/rL263446</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Fri, Mar 11, 2016 at 9:57 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>> wrote:<u></u><u></u></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">Thanks, Ashutosh.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">Yes, either TTI or TLI could be used to limit the transform if we do it in CGP rather than the DAG.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">The real question I have is whether it is legal to read the extra memory, regardless of whether this is a masked load or something else.<u></u><u></u></p>
</div>
<p class="MsoNormal">Note that the x86 backend already does this, so either my proposal is ok for x86, or we're already doing an illegal optimization:<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"><br>
define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {<br>
%ld1 = load i32, i32* %addr1<br>
%addr2 = getelementptr i32, i32* %addr1, i64 3<br>
%ld2 = load i32, i32* %addr2<br>
%vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0<br>
%vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3<br>
ret <4 x i32> %vec2<br>
}<br>
<br>
$ ./llc -o - loadcombine.ll <br>
...<br>
movups (%rdi), %xmm0<br>
retq<br>
<br>
<br>
<u></u><u></u></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>> wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">This looks interesting. The main motivation appears to be replacing a masked vector load with a regular vector load followed by a select.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Masked vector loads have been observed to be generally more expensive than regular vector loads.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">But if the first and last elements of a masked vector load are guaranteed to be accessed, then it can be transformed into a regular vector load.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">In opt, this can be driven by TTI, which should be used to check whether the transformation is beneficial.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Regards,</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)">Ashutosh</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri","sans-serif";color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11pt;font-family:"Calibri","sans-serif""> llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Sanjay Patel via llvm-dev<br>
<b>Sent:</b> Friday, March 11, 2016 3:37 AM<br>
<b>To:</b> llvm-dev<br>
<b>Subject:</b> [llvm-dev] masked-load endpoints optimization</span><u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load?<br>
<br>
"The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off
lanes."<br>
<br>
I think the fact that we're loading the endpoints of the vector guarantees that a full vector load can't have any different faulting/exception behavior on x86 and most (?) other targets. We would, however, be reading memory that the program has not explicitly
requested.<u></u><u></u></p>
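<p class="MsoNormal">A minimal sketch of the page-granularity argument behind this claim, assuming faults are decided per 4 KiB page (the page size and all helper names here are hypothetical and not taken from LLVM). It models a 16-byte vector of four i32 lanes and checks that the full vector never touches a page that the first and last lanes do not already touch:</p>
<pre>
```python
PAGE_SIZE = 4096   # assumed page size in bytes
VECTOR_BYTES = 16  # a vector of four i32 lanes
ELEMENT_BYTES = 4  # one i32 lane


def page_of(address):
    """Return the index of the page that contains this byte address."""
    return address // PAGE_SIZE


def pages_touched(address, length):
    """Return the set of pages covered by `length` bytes starting at `address`."""
    return {page_of(address + offset) for offset in range(length)}


def full_load_adds_no_new_pages(address):
    """Compare the pages touched by the two endpoint lanes with the full vector."""
    first_lane = pages_touched(address, ELEMENT_BYTES)
    last_lane = pages_touched(address + VECTOR_BYTES - ELEMENT_BYTES, ELEMENT_BYTES)
    whole_vector = pages_touched(address, VECTOR_BYTES)
    return whole_vector == (first_lane | last_lane)
```
</pre>
<p class="MsoNormal">Under that assumption, full_load_adds_no_new_pages(address) holds for every address, including vectors that straddle a page boundary. The sketch says nothing about protection mechanisms finer than a page, nor about whether reading the extra bytes is desirable.</p>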
</div>
<p class="MsoNormal">IR example:<br>
<br>
define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {<u></u><u></u></p>
</div>
<p class="MsoNormal"> ; load the first and last elements pointed to by %addr and shuffle those into %v<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">  %res = call &lt;4 x i32&gt; @llvm.masked.load.v4i32(&lt;4 x i32&gt;* %addr, i32 4, &lt;4 x i1&gt; &lt;i1 1, i1 0, i1 0, i1 1&gt;, &lt;4 x i32&gt; %v)<br>
ret <4 x i32> %res<br>
}<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">would become something like:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> %vecload = load <4 x i32>, <4 x i32>* %addr, align 4<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">  %sel = select &lt;4 x i1&gt; &lt;i1 1, i1 0, i1 0, i1 1&gt;, &lt;4 x i32&gt; %vecload, &lt;4 x i32&gt; %v<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"> ret <4 x i32> %sel<br>
}<u></u><u></u></p>
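<p class="MsoNormal">To make the value-level equivalence concrete, here is a small model of the two forms, assuming memory is a plain list of lane values (the function names are hypothetical, for illustration only). Both return the same vector; they differ only in which lanes of memory are read:</p>
<pre>
```python
def masked_load_model(memory, mask, passthru):
    """Model of a masked load: only lanes whose mask bit is set read memory."""
    return [memory[i] if mask[i] else passthru[i] for i in range(len(mask))]


def full_load_then_select_model(memory, mask, passthru):
    """Model of the rewrite: read every lane, then select per lane by the mask."""
    loaded = [memory[i] for i in range(len(mask))]  # also reads masked-off lanes
    return [loaded[i] if mask[i] else passthru[i] for i in range(len(mask))]
```
</pre>
<p class="MsoNormal">With the mask 1, 0, 0, 1 from the example, both models take lanes 0 and 3 from memory and lanes 1 and 2 from the passthru vector. The model does not capture faulting behavior, which is the open question here.</p>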
</div>
<div>
<p class="MsoNormal">If this isn't valid as an IR optimization, would it be acceptable as a DAG combine with a target hook to opt in?<u></u><u></u></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><br>
[1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">
http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a><u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div></div></div>
</div>
</div>
</blockquote></div><br></div></div>