<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle18
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span dir="RTL"></span><span lang="HE" dir="RTL" style="font-size:11.0pt;font-family:"Arial",sans-serif;color:#1F497D"><span dir="RTL"></span><
</span><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">The real question I have is whether it is legal to read the extra memory, regardless of whether this is a masked load or something else.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If one is allowed to read from a given address, a reasonable(?) assumption is that the aligned cache-line containing this address can be read. This should help
answer the question.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Such an assumption may be interesting to consider also in isSafeToLoadUnconditionally btw.</span><span lang="HE" dir="RTL" style="font-size:11.0pt;font-family:"Arial",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Wonder in what situations one may know (at compile time) that both the first and the last bit of a mask are on?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Thanks for rL263446,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Ayal.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> llvm-dev [<a href="mailto:llvm-dev-bounces@lists.llvm.org">mailto:llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Sanjay Patel via llvm-dev<br>
<b>Sent:</b> Friday, March 11, 2016 18:57<br>
<b>To:</b> Nema, Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com">Ashutosh.Nema@amd.com</a>><br>
<b>Cc:</b> llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
<b>Subject:</b> Re: [llvm-dev] masked-load endpoints optimization<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Thanks, Ashutosh.<o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Yes, either TTI or TLI could be used to limit the transform if we do it in CGP rather than the DAG.<o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">The real question I have is whether it is legal to read the extra memory, regardless of whether this is a masked load or something else.<o:p></o:p></p>
</div>
<p class="MsoNormal">Note that the x86 backend already does this, so either my proposal is ok for x86, or we're already doing an illegal optimization:<o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {<br>
%ld1 = load i32, i32* %addr1<br>
%addr2 = getelementptr i32, i32* %addr1, i64 3<br>
%ld2 = load i32, i32* %addr2<br>
%vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0<br>
%vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3<br>
ret <4 x i32> %vec2<br>
}<br>
<br>
$ ./llc -o - loadcombine.ll <br>
...<br>
movups (%rdi), %xmm0<br>
retq<br>
<br>
<br>
<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <<a href="mailto:Ashutosh.Nema@amd.com" target="_blank">Ashutosh.Nema@amd.com</a>> wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">This looks interesting, the main motivation appears to be replacing masked vector load with a general
vector load followed by a select.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Observed masked vector loads are in general expensive in comparison with a vector load.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">But if first & last element of a masked vector load are guaranteed to be accessed then it can be transformed
to a vector load.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">In opt this can be driven by TTI, where the benefit of this transformation should be checked.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Regards,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Ashutosh</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> llvm-dev [mailto:</span><a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">llvm-dev-bounces@lists.llvm.org</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">]
<b>On Behalf Of </b>Sanjay Patel via llvm-dev<br>
<b>Sent:</b> Friday, March 11, 2016 3:37 AM<br>
<b>To:</b> llvm-dev<br>
<b>Subject:</b> [llvm-dev] masked-load endpoints optimization</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load?<br>
<br>
"The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off
lanes."<br>
<br>
I think the fact that we're loading the endpoints of the vector guarantees that a full vector load can't have any different faulting/exception behavior on x86 and most (?) other targets. We would, however, be reading memory that the program has not explicitly
requested.<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">IR example:<br>
<br>
define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> ; load the first and last elements pointed to by %addr and shuffle those into %v<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"> %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)<br>
ret <4 x i32> %res<br>
}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">would become something like:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br>
define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> %vecload = load <4 x i32>, <4 x i32>* %addr, align 4<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"> ret <4 x i32> %sel<br>
}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">If this isn't valid as an IR optimization, would it be acceptable as a DAG combine with target hook to opt in?<o:p></o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br>
[1] <a href="http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics" target="_blank">
http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited</p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.</p></body>
</html>