<div dir="ltr">Shawn,<div><br></div><div>Are you able to summarize the different instructions from the various targets. It looks like there different implementation choices made for each target. For example, X86 takes two v2i64 inputs and picks either an even or odd element from each to multiply to produce a v1i128 result. It looks like RISC-V has instructions to produce either the high half of the result or the low half of the result. Those are the only two I checked.</div><div><br></div><div>Will a common intrinsic need custom handling for each target or is there a common version that multiple targets use that we should choose for the intrinsic?</div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jul 5, 2020 at 7:55 AM James Y Knight via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div>It'd be useful in your proposal to to note which of the existing llvm target specific intrinsics this generic intrinsic can effectively supersede. (E.g. llvm.x86.pclmulqdq for x86.)</div><div dir="auto"><br></div><div dir="auto">When we are already supporting a given function via target specific intrinsics for a number of different targets, that seems a pretty good argument for making it available as a more generic target independent intrinsic.</div><div dir="auto"><br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Sun, Jul 5, 2020, 5:18 AM Shawn Landden via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div> </div><div><div><p>Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more.</p><p>This proposal is to add a <code>llvm.clmul</code> instruction. Or if that is contentious, <code>llvm.experimental.bitmanip.clmul</code> instruction. It takes two integer operands of the same width, and returns an integer with twice the width of the operands. (Is there a good reason to make these the same width, as all the other operations do even when it doesn’t really make sense for the mathematical operation–like multiplication or ctpop/ctlz/cttz?)</p><p>If the CPU does not have a dedication clmul operation, it can be lowered to regular multiplication, by using holes to avoid carrys.</p><p>==Where is clmul used?==</p><p>While somewhat specialized, the RISC-V manual documents many uses: [2]</p><p>The classic applications forclmulare Cyclic Redundancy Check (CRC) [11, 26]</p><p>and Galois/CounterMode (GCM), but more applications exist, including the following examples.There are obvious applications in hashing and pseudo random number generations. For exam-ple, it has been reported that hashes based on carry-less multiplications can outperform Google’sCityHash [17].</p><p>clmulof a number with itself inserts zeroes between each input bit. This can be useful for generatingMorton code [23].</p><p>clmulof a number with -1 calculates the prefix XOR operation. This can be useful for decodinggray codes.Another application of XOR prefix sums calculated withclmulis branchless tracking of quotedstrings in high-performance parsers. [16]</p><p>Carry-less multiply can also be used to implement Erasure code efficiently. [14]</p><p>==clmul lowering without hardware support==<br>A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3].</p><p>[1] <a href="https://en.wikipedia.org/wiki/Carry-less_product" rel="noreferrer noreferrer noreferrer" target="_blank">https://en.wikipedia.org/wiki/Carry-less_product</a><br><a href="https://en.wikipedia.org/wiki/Carry-less_product%5B2%5D%20(page%2030)%20https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf%5B3%5D%20https://www.bearssl.org/constanttime.html" rel="noreferrer noreferrer noreferrer" target="_blank">[2] (page 30) </a><a href="https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf" rel="noreferrer noreferrer noreferrer" target="_blank">https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf</a><br><a href="https://en.wikipedia.org/wiki/Carry-less_product%5B2%5D%20(page%2030)%20https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf%5B3%5D%20https://www.bearssl.org/constanttime.html" rel="noreferrer noreferrer noreferrer" target="_blank">[3] </a><a href="https://www.bearssl.org/constanttime.html" rel="noreferrer noreferrer noreferrer" target="_blank">https://www.bearssl.org/constanttime.html</a></p><p> </p><p>(First posted to discord</p></div></div><div>-- </div><div>Shawn Landden</div><div> </div>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" rel="noreferrer noreferrer noreferrer" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>
</div></div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>