[llvm-dev] [RFC] carry-less multiplication instruction

Thu Jul 9 10:27:55 PDT 2020

FWIW, Hexagon has a pass to recognize polynomial multiplication:
  llvm/lib/Target/Hexagon/HexagonLoopIdiomRecognition.cpp
See "PolynomialMultiplyRecognize"

--
Krzysztof Parzyszek  kparzysz at quicinc.com   AI tools development

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Shawn Landden
> via llvm-dev
> Sent: Thursday, July 9, 2020 11:40 AM
> To: Roman Lebedev <lebedev.ri at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] [RFC] carry-less multiplication instruction
>
>
>
> 05.07.2020, 05:22, "Roman Lebedev" <lebedev.ri at gmail.com>:
> > On Sun, Jul 5, 2020 at 12:18 PM Shawn Landden via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>  Carry-less multiplication[1] instructions exist (at least optionally) on
> many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly
> more.
> >>
> >>  This proposal is to add a llvm.clmul instruction. Or if that is
> >> contentious, llvm.experimental.bitmanip.clmul instruction. It takes
> >> two integer operands of the same width, and returns an integer with
> >> twice the width of the operands. (Is there a good reason to make
> >> these the same width, as all the other operations do even when it
> >> doesn’t really make sense for the mathematical operation–like
> >> multiplication or ctpop/ctlz/cttz?)
> >>
> >>  If the CPU does not have a dedication clmul operation, it can be lowered
> to regular multiplication, by using holes to avoid carrys.
> >>
> >>  ==Where is clmul used?==
> >>
> >>  While somewhat specialized, the RISC-V manual documents many uses:
> >> [2]
> >>
> >>  The classic applications forclmulare Cyclic Redundancy Check (CRC)
> >> [11, 26]
> >>
> >>  and Galois/CounterMode (GCM), but more applications exist, including the
> following examples.There are obvious applications in hashing and pseudo
> random number generations. For exam-ple, it has been reported that hashes
> based on carry-less multiplications can outperform Google’sCityHash [17].
> >>
> >>  clmulof a number with itself inserts zeroes between each input bit. This
> can be useful for generatingMorton code [23].
> >>
> >>  clmulof a number with -1 calculates the prefix XOR operation. This
> >> can be useful for decodinggray codes.Another application of XOR
> >> prefix sums calculated withclmulis branchless tracking of
> >> quotedstrings in high-performance parsers. [16]
> >>
> >>  Carry-less multiply can also be used to implement Erasure code
> >> efficiently. [14]
> >>
> >>  ==clmul lowering without hardware support==
> >>  A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when
> there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if
> bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3].
> >>
> >>  [1] https://en.wikipedia.org/wiki/Carry-less_product
> >>  [2] (page 30)
> >> https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmani
> >> p-0.92.pdf
> >>  [3] https://www.bearssl.org/constanttime.html
> >
> > What benefit would this intrinsic would bring to the middle-end IR,
> > over it's current naive expanded form?
> >
> > Note that teaching backends to produce it, or even adding it to
> > backend (ISD opcodes) and matching it in DAGCombiner has much lower
> > barrier of entry, i would suggest to start there.
> It cannot be matched.
> >
> >>  (First posted to discord
> >>
> >>  --
> >>  Shawn Landden
> >
> > Roman
> >
> >>  _______________________________________________
> >>  LLVM Developers mailing list
> >>  llvm-dev at lists.llvm.org
> >>  https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Shawn Landden
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev