[cfe-dev] [llvm-dev] Portable multiplication 64 x 64 -> 128 for int128 reimplementation

Sun Dec 30 13:45:49 PST 2018

Hi Arthur, Craig,

Thanks for you comments about GCC/Clang intrinsics. I never considered
using them, but they might be better alternative to inline assembly.
Is there a one for regular MUL?

Anyway, I want to go the opposite direction. If I can I relay on compiler's
optimizations. If I want to use MULX in Clang I do it like that:

unsigned long mulx(unsigned long x, unsigned long y, unsigned long* hi)
{
auto p = (unsigned __int128){x} * y;
*hi = static_cast<unsigned long>(p >> 64);
return static_cast<unsigned long>(p);
}

https://godbolt.org/z/PbgFb9

If compiled with -mbmi2 -mtune=generic it just uses MULX instruction.

mulx(unsigned long, unsigned long, unsigned long*):
mov rcx, rdx
mov rdx, rsi
mulx rdx, rax, rdi
mov qword ptr [rcx], rdx
ret

What I want to do it move it further - rewrite the above mulx() helper
without using __int128 type in a way that a compiler would recognize that
it should use MUL/MULX instruction.

A possible implementation looks like

uint64_t mul_full_64_generic(uint64_t x, uint64_t y, uint64_t* hi)
{
uint64_t xl = x & 0xffffffff;
uint64_t xh = x >> 32;
uint64_t yl = y & 0xffffffff;
uint64_t yh = y >> 32;

uint64_t t = xl * yl;
uint64_t l = t & 0xffffffff;
uint64_t h = t >> 32;

t = xh * yl;
t += h;
h = t >> 32;

t = xl * yh + (t & 0xffffffff);
l |= t << 32;
*hi = xh * yh + h + (t >> 32);
return l;
}

As expected, Clang is not able to match this pattern currently.

If we want to implement this optimization in Clang, there are some
questions I have:
1. Can we prove this pattern is equivalent of MUL 64x64 -> 128?
2. What pass this optimization should be added to?
3. Can this pattern be split into smaller ones? E.g. UMULH.

Paweł

On Sun, Dec 30, 2018 at 2:34 AM Craig Topper <craig.topper at gmail.com> wrote:

> _mulx_u64 only exists when the target is x86_64. That's still not very
> portable. I'm not opposed to removing the bmi2 check, but gcc also has the
> same check so it doesn't improve portability much.
>
> ~Craig
>
>
> On Sat, Dec 29, 2018 at 4:44 PM Arthur O'Dwyer via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Pawel,
>>
>> There is the _mulx_u64 intrinsic, but it currently requires the hardware
>> flag "-mbmi2".
>>
>> https://github.com/Quuxplusone/WideIntProofOfConcept/blob/master/wider.h#L89-L99
>>
>> On Clang 3.8.1 and earlier, the _addcarry_u64 and _subborrow_u64
>> intrinsics required the hardware flag `-madx`, even though they didn't use
>> the hardware ADX/ADOX instructions. Modern GCC and Clang permit the use of
>> these intrinsics (to generate ADC) even in the absence of `-madx`.
>>
>> I think it would be a very good idea for Clang to support _mulx_u64 (to
>> generate MUL) even in the absence of `-mbmi2`.
>>
>> –Arthur
>>
>>
>> On Sat, Dec 29, 2018 at 6:03 PM Paweł Bylica via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> For some maybe dumb reasons I try to write a portable version of int128.
>>>
>>> What is very valuable for this implementation is access to MUL
>>> instruction on x86 which provides full 64 x 64 -> 128 bit multiplication.
>>> An equally useful on ARM would be UMULH instruction.
>>>
>>> Well, the way you can access this on clang / GCC is to use __int128 type
>>> or use inline assembly. MSVC provides an intrinsic for this instruction.
>>> This defeats the idea of portable int128 reimplementation and makes
>>> constexpr implementation of multiplication at least inconvenient.
>>>
>>> Maybe there is a hope for me in LLVM. Is there any pattern matcher that
>>> is producing MUL instruction of bigger type?
>>> If not, would it be good idea to teach LLVM about it?
>>>
>>> Bests,
>>> Paweł
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181230/ab95ee15/attachment.html>