<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">I didn’t have any specific plans. What happens next depends on whether people find this code useful. It’s kind of complicated: since the code was written, the “canonical form” of LLVM IR has changed multiple times, and I kept adding more and more stuff to “recanonicalize” the IR back into the form the idiom recognition code is used to seeing.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I was thinking about inventing a different way of recognizing the idiom, but I haven’t had time to spend on it. Whatever it is, it should be immune to ongoing changes in instcombine.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:Consolas">-- </span>
<span style="font-size:9.0pt;font-family:Consolas"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:Consolas">Krzysztof Parzyszek
<a href="mailto:kparzysz@quicinc.com"><span style="color:#0563C1">kparzysz@quicinc.com</span></a> AI tools development<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Craig Topper &lt;craig.topper@gmail.com&gt; <br>
<b>Sent:</b> Thursday, July 9, 2020 2:16 PM<br>
<b>To:</b> Krzysztof Parzyszek &lt;kparzysz@quicinc.com&gt;<br>
<b>Cc:</b> llvm-dev@lists.llvm.org<br>
<b>Subject:</b> [EXT] Re: [llvm-dev] [RFC] carry-less multiplication instruction<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">I think I'd prefer to have the output type match the input type. That makes it more similar to other intrinsics and binary operators. We should be able to zero extend the inputs if you want something like 64x64->128.<o:p></o:p></p>
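<p class="MsoNormal">To make the width question concrete, here is a rough Python model of a same-width clmul, plus the widening form obtained by zero-extending the inputs first. It is only a sketch: the names clmul and clmul_widening and the explicit width parameter are made up for illustration and are not an existing LLVM or clang interface.<o:p></o:p></p>
<pre style="font-size:9.0pt;font-family:Consolas">
```python
def clmul(a, b, width):
    """Carry-less multiply of two `width`-bit values, truncated to `width` bits.

    Hypothetical reference model of a same-width clmul; not LLVM code.
    """
    modulus = 2 ** width
    a %= modulus                  # treat inputs as unsigned `width`-bit values
    b %= modulus
    acc = 0
    for i in range(width):
        if (b >> i) % 2:          # bit i of b is set
            acc ^= a * 2 ** i     # XOR in `a` shifted left by i (no carries)
    return acc % modulus          # output type matches the input type


def clmul_widening(a, b, width):
    """Full product of two `width`-bit values via zero-extension to 2*width bits."""
    modulus = 2 ** width
    # Zero-extend: keep the low `width` bits, then run the op at double width.
    return clmul(a % modulus, b % modulus, 2 * width)
```
</pre>
<p class="MsoNormal">For example, clmul(0xFF, 0xFF, 8) keeps only the low byte 0x55, while clmul_widening(0xFF, 0xFF, 8) gives the full 0x5555, so under these assumptions nothing is lost by having the result type match the operand type.<o:p></o:p></p>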
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Are you planning to expose this to C through clang? What types would we expose?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">~Craig<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Thu, Jul 9, 2020 at 10:28 AM Krzysztof Parzyszek via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">FWIW, Hexagon has a pass to recognize polynomial multiplication:<br>
llvm/lib/Target/Hexagon/HexagonLoopIdiomRecognition.cpp<br>
See "PolynomialMultiplyRecognize"<br>
<br>
--<br>
Krzysztof Parzyszek <a href="mailto:kparzysz@quicinc.com" target="_blank">kparzysz@quicinc.com</a> AI tools development<br>
<br>
> -----Original Message-----<br>
> From: llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> On Behalf Of Shawn Landden<br>
> via llvm-dev<br>
> Sent: Thursday, July 9, 2020 11:40 AM<br>
> To: Roman Lebedev <<a href="mailto:lebedev.ri@gmail.com" target="_blank">lebedev.ri@gmail.com</a>><br>
> Cc: <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> Subject: [EXT] Re: [llvm-dev] [RFC] carry-less multiplication instruction<br>
><br>
><br>
><br>
> 05.07.2020, 05:22, "Roman Lebedev" <<a href="mailto:lebedev.ri@gmail.com" target="_blank">lebedev.ri@gmail.com</a>>:<br>
> > On Sun, Jul 5, 2020 at 12:18 PM Shawn Landden via llvm-dev<br>
> > <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
> >> Carry-less multiplication[1] instructions exist (at least optionally) on<br>
> many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly<br>
> more.<br>
> >><br>
> >> This proposal is to add a llvm.clmul instruction. Or if that is<br>
> >> contentious, llvm.experimental.bitmanip.clmul instruction. It takes<br>
> >> two integer operands of the same width, and returns an integer with<br>
> >> twice the width of the operands. (Is there a good reason to make<br>
> >> these the same width, as all the other operations do even when it<br>
> >> doesn’t really make sense for the mathematical operation–like<br>
> >> multiplication or ctpop/ctlz/cttz?)<br>
> >><br>
> >> If the CPU does not have a dedicated clmul operation, it can be lowered<br>
> to regular multiplication, by using holes to avoid carries.<br>
> >><br>
> >> ==Where is clmul used?==<br>
> >><br>
> >> While somewhat specialized, the RISC-V manual documents many uses:<br>
> >> [2]<br>
> >><br>
> >> The classic applications for clmul are Cyclic Redundancy Check (CRC)<br>
> >> [11, 26]<br>
> >><br>
> >> and Galois/Counter Mode (GCM), but more applications exist, including the<br>
> following examples. There are obvious applications in hashing and pseudo<br>
> random number generations. For example, it has been reported that hashes<br>
> based on carry-less multiplications can outperform Google’s CityHash [17].<br>
> >><br>
> >> clmul of a number with itself inserts zeroes between each input bit. This<br>
> can be useful for generating Morton code [23].<br>
> >><br>
> >> clmul of a number with -1 calculates the prefix XOR operation. This<br>
> >> can be useful for decoding gray codes. Another application of XOR<br>
> >> prefix sums calculated with clmul is branchless tracking of<br>
> >> quoted strings in high-performance parsers. [16]<br>
> >><br>
> >> Carry-less multiply can also be used to implement Erasure code<br>
> >> efficiently. [14]<br>
> >><br>
> >> ==clmul lowering without hardware support==<br>
> >> An 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when<br>
> there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if<br>
> bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3].<br>
> >><br>
> >> [1] <a href="https://en.wikipedia.org/wiki/Carry-less_product" target="_blank">
https://en.wikipedia.org/wiki/Carry-less_product</a><br>
> >> [2] (page 30)<br>
> >> <a href="https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf" target="_blank">
https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf</a><br>
> >> [3] <a href="https://www.bearssl.org/constanttime.html" target="_blank">https://www.bearssl.org/constanttime.html</a><br>
> ><br>
> > What benefit would this intrinsic bring to the middle-end IR,<br>
> > over its current naive expanded form?<br>
> ><br>
> > Note that teaching backends to produce it, or even adding it to<br>
> > backend (ISD opcodes) and matching it in DAGCombiner has a much lower<br>
> > barrier of entry; I would suggest starting there.<br>
> It cannot be matched.<br>
> ><br>
> >> (First posted to discord<br>
> >><br>
> >> --<br>
> >> Shawn Landden<br>
> ><br>
> > Roman<br>
> ><br>
> >> _______________________________________________<br>
> >> LLVM Developers mailing list<br>
> >> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> >> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
><br>
> --<br>
> Shawn Landden<br>
><br>
<o:p></o:p></p>
</blockquote>
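<p class="MsoNormal">The "holes" lowering mentioned in the quoted proposal (an 8x8=>16 clmul done with one 32x32=>64 integer multiply) can be sketched roughly as below. This is an illustrative Python model under my own assumptions, not the BearSSL code and not anything in LLVM; spread4, clmul8_via_mul and clmul_ref are hypothetical names. Each input bit is moved to every fourth bit position, so each 4-bit group of the ordinary product holds a count of at most 8 partial products and never carries into the next group; the low bit of each group is the XOR the carry-less product needs.<o:p></o:p></p>
<pre style="font-size:9.0pt;font-family:Consolas">
```python
def spread4(x, nbits):
    """Place bit i of x at bit 4*i, leaving three zero 'holes' after each bit."""
    out = 0
    for i in range(nbits):
        if (x >> i) % 2:
            out += 16 ** i        # 16**i == 2**(4*i)
    return out


def clmul8_via_mul(a, b):
    """8x8=>16 carry-less product using one ordinary integer multiply."""
    # Both spread operands fit in 32 bits, so the product fits in 64 bits.
    prod = spread4(a % 256, 8) * spread4(b % 256, 8)
    result = 0
    for k in range(15):           # result bits 0..14
        # Nibble k holds the COUNT of partial products for bit k (at most 8,
        # so it never carries into the next nibble); its low bit is the XOR.
        if (prod >> (4 * k)) % 2:
            result += 2 ** k
    return result


def clmul_ref(a, b):
    """Bit-by-bit reference, used only to check the sketch above."""
    acc = 0
    i = 0
    while b >> i:
        if (b >> i) % 2:
            acc ^= a * 2 ** i     # XOR in `a` shifted left by i
        i += 1
    return acc
```
</pre>
<p class="MsoNormal">Checking clmul8_via_mul against clmul_ref for all 256x256 input pairs is cheap and is a reasonable sanity test. A real lowering would do the spreading and gathering with masks and shifts rather than loops.<o:p></o:p></p>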
</div>
</div>
</div>
</body>
</html>