[llvm-dev] [RFC] carry-less multiplication instruction

Sun Jul 5 07:54:48 PDT 2020

It'd be useful in your proposal to to note which of the existing llvm
target specific intrinsics this generic intrinsic can effectively
supersede. (E.g. llvm.x86.pclmulqdq for x86.)

When we are already supporting a given function via target specific
intrinsics for a number of different targets, that seems a pretty good
argument for making it available as a more generic target independent
intrinsic.

On Sun, Jul 5, 2020, 5:18 AM Shawn Landden via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> Carry-less multiplication[1] instructions exist (at least optionally) on
> many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly
> more.
>
> This proposal is to add a llvm.clmul instruction. Or if that is
> contentious, llvm.experimental.bitmanip.clmul instruction. It takes two
> integer operands of the same width, and returns an integer with twice the
> width of the operands. (Is there a good reason to make these the same
> width, as all the other operations do even when it doesn’t really make
> sense for the mathematical operation–like multiplication or
> ctpop/ctlz/cttz?)
>
> If the CPU does not have a dedication clmul operation, it can be lowered
> to regular multiplication, by using holes to avoid carrys.
>
> ==Where is clmul used?==
>
> While somewhat specialized, the RISC-V manual documents many uses: [2]
>
> The classic applications forclmulare Cyclic Redundancy Check (CRC) [11, 26]
>
> and Galois/CounterMode (GCM), but more applications exist, including the
> following examples.There are obvious applications in hashing and pseudo
> random number generations. For exam-ple, it has been reported that hashes
> based on carry-less multiplications can outperform Google’sCityHash [17].
>
> clmulof a number with itself inserts zeroes between each input bit. This
> can be useful for generatingMorton code [23].
>
> clmulof a number with -1 calculates the prefix XOR operation. This can be
> useful for decodinggray codes.Another application of XOR prefix sums
> calculated withclmulis branchless tracking of quotedstrings in
> high-performance parsers. [16]
>
> Carry-less multiply can also be used to implement Erasure code
> efficiently. [14]
>
> ==clmul lowering without hardware support==
> A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when
> there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if
> bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3].
>
> [1] https://en.wikipedia.org/wiki/Carry-less_product
> [2] (page 30)
> <https://en.wikipedia.org/wiki/Carry-less_product%5B2%5D%20(page%2030)%20https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf%5B3%5D%20https://www.bearssl.org/constanttime.html>
> https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf
> [3]
> <https://en.wikipedia.org/wiki/Carry-less_product%5B2%5D%20(page%2030)%20https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf%5B3%5D%20https://www.bearssl.org/constanttime.html>
> https://www.bearssl.org/constanttime.html
>
>
>
> (First posted to discord
> --
> Shawn Landden
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200705/a56891ea/attachment.html>