[cfe-dev] [llvm-dev] Portable multiplication 64 x 64 -> 128 for int128 reimplementation

Mon Dec 31 13:20:47 PST 2018

On Sun, Dec 30, 2018 at 4:46 PM Paweł Bylica <chfast at gmail.com> wrote:

> Hi Arthur, Craig,
>
> Thanks for your comments about GCC/Clang intrinsics. I never considered
> using them, but they might be a better alternative to inline assembly.
> Is there one for regular MUL?
>

I'm not sure, but I don't think any intrinsic currently exists that
generates the top half of a 64x64=128 multiply, except for `_mulx_u64`.
If Clang stopped requiring `-mbmi2`, I would then expect the `_mulx_u64`
intrinsic to generate a regular MUL instruction, similar to how
`_addcarry_u64` generates ADCX/ADOX when available/useful and a regular
ADC otherwise.
MSVC calls this intrinsic `_umul128`
<https://docs.microsoft.com/en-us/cpp/intrinsics/umul128?view=vs-2017>,
and on MSVC it does generate a regular MUL instruction rather than forcing
MULX.
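
To make the comparison concrete, here is a minimal sketch of a wrapper that
uses `_umul128` on 64-bit MSVC and falls back to the `unsigned __int128`
extension on GCC/Clang. The name `umul128_wide` is hypothetical, and the
fallback assumes a target where `unsigned __int128` is available; it is an
illustration, not something from the thread.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>   /* declares _umul128 on 64-bit MSVC */
#endif

/* Hypothetical helper: returns the low 64 bits of a * b and stores the
 * high 64 bits in *hi. */
static uint64_t umul128_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(_MSC_VER) && defined(_M_X64)
    /* MSVC intrinsic: low half is the return value, high half via pointer. */
    return _umul128(a, b, hi);
#else
    /* Assumption: the compiler provides the unsigned __int128 extension. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
#endif
}
```

Calling `umul128_wide(x, y, &hi)` yields both halves of the product; which
instruction is emitted (MUL or MULX) is left to the compiler and target flags.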

> Anyway, I want to go the opposite direction. [...] mulx() helper without
> using __int128 type in a way that a compiler would recognize that it should
> use MUL/MULX instruction.
> A possible implementation looks like [SNIPPED]
>

Interesting trivia: There are at least three ways to write the final
"return" statement in this function. Clang generates different code for
each one of them. If someone does pursue writing an InstCombine
optimization for this, it would be good to generate the same efficient code
for all three versions.
https://godbolt.org/z/-Cozee (LLVM IR: https://godbolt.org/z/_1pDoz)
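
For readers without the snipped code at hand, the following is a generic
sketch of a 64x64 -> 128 multiply written without `__int128`, using four
32x32 -> 64 partial products. It is not the snipped implementation, and the
name `mul64_portable` is hypothetical. Whether a given compiler recognizes
this pattern and emits a single MUL/MULX is exactly the open question here.

```c
#include <stdint.h>

/* Hypothetical helper: returns the low 64 bits of x * y and stores the
 * high 64 bits in *hi, using only 64-bit arithmetic. */
static uint64_t mul64_portable(uint64_t x, uint64_t y, uint64_t *hi)
{
    const uint64_t mask = 0xffffffffu;

    /* Split both operands into 32-bit halves. */
    uint64_t xl = x & mask, xh = x >> 32;
    uint64_t yl = y & mask, yh = y >> 32;

    /* Four partial products; each fits in 64 bits. */
    uint64_t t0 = xl * yl;
    uint64_t t1 = xh * yl;
    uint64_t t2 = xl * yh;
    uint64_t t3 = xh * yh;

    /* Accumulate the middle terms; neither sum can overflow 64 bits. */
    uint64_t u1 = t1 + (t0 >> 32);
    uint64_t u2 = t2 + (u1 & mask);

    *hi = t3 + (u1 >> 32) + (u2 >> 32);
    return (u2 << 32) | (t0 & mask);
}
```

The intermediate sums stay in range because each partial product is at most
(2^32 - 1)^2, which leaves room to add one more 32-bit value.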

–Arthur