[cfe-dev] [llvm-dev] Portable multiplication 64 x 64 -> 128 for int128 reimplementation
Paweł Bylica via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 2 12:27:54 PST 2019
Thanks again for all the comments.
As suggested, I created my first pattern match in AggressiveInstCombine:
Suggestions welcome (preferably in the review).
On Mon, Dec 31, 2018 at 10:41 PM Craig Topper <craig.topper at gmail.com> wrote:
> On trunk we never generate MULX. We used to blindly use it anytime bmi2
> was enabled, but it's a longer encoding and isn't a guaranteed register
> allocation improvement. So I took it out a few weeks ago. We need a more
> precise heuristic for when to use it.
> LLVM trunk will never generate ADCX/ADOX either. This was removed in
> September. We used to inconsistently generate them when adx was enabled
> unless we could use the RMW form or the immediate form of ADC. But that
> didn't really make any sense. The only reason to use ADCX or ADOX is when
> you want to carefully manage the flags to have two interleaved dependency
> chains. But that would require a special analysis to determine when to do
> that and we don't have that.
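To illustrate Craig's point: an ordinary multi-word addition is a single carry chain, so plain ADD/ADC already covers it, and ADCX/ADOX would only pay off with two independent chains interleaved. The following is a minimal sketch, not code from this thread. It assumes GCC or Clang on x86-64 for the branch that uses `_addcarry_u64` (the intrinsic Arthur mentions below) and falls back to portable C elsewhere. The helper name `add256` is hypothetical, and which instructions either branch compiles to depends on the compiler version.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h> /* _addcarry_u64 (GCC/Clang; does not require -madx) */
#endif

/* Hypothetical helper: r = a + b for 256-bit numbers stored as four
 * 64-bit limbs, least significant limb first. Returns the final carry.
 * This is one carry chain, so ADD/ADC is sufficient. */
static unsigned add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4])
{
#if defined(__x86_64__)
    unsigned char c = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned long long out; /* intrinsic takes unsigned long long * */
        c = _addcarry_u64(c, a[i], b[i], &out);
        r[i] = out;
    }
    return c;
#else
    unsigned c = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s = a[i] + b[i];
        unsigned c1 = s < a[i]; /* carry out of a[i] + b[i] */
        uint64_t t = s + c;
        unsigned c2 = t < s;    /* carry out of adding the incoming carry */
        r[i] = t;
        c = c1 | c2;            /* at most one of c1, c2 can be set */
    }
    return c;
#endif
}
```

Interleaving two such chains (for example, when accumulating partial products of a multi-word multiply) is the case where ADCX (carry flag only) and ADOX (overflow flag only) could help, which is the analysis Craig says LLVM does not have.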
> On Mon, Dec 31, 2018 at 1:21 PM Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
>> On Sun, Dec 30, 2018 at 4:46 PM Paweł Bylica <chfast at gmail.com> wrote:
>>> Hi Arthur, Craig,
>>> Thanks for your comments about GCC/Clang intrinsics. I never considered
>>> using them, but they might be a better alternative to inline assembly.
>>> Is there one for regular MUL?
>> I'm not sure, but I think there currently does not exist any intrinsic to
>> generate the top half of a 64x64=128 multiply, except for `_mulx_u64`.
>> If Clang stopped requiring `-mbmi2`, I would then expect the `_mulx_u64`
>> intrinsic to generate a regular MUL instruction, similar to
>> how `_addcarry_u64` generates ADCX/ADOX when available/useful and a regular
>> ADC otherwise.
>> MSVC calls this intrinsic `_umul128`,
>> and on MSVC it does generate a regular MUL instruction rather than forcing [...]
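For reference, this is what these intrinsics compute, written with the `unsigned __int128` extension that GCC and Clang provide on 64-bit targets. It is a minimal sketch; `umul128_ref` is a hypothetical name chosen to mirror the shape of MSVC's `_umul128` (low half returned, high half stored through a pointer).

```c
#include <stdint.h>

/* Hypothetical reference helper: returns the low 64 bits of a * b and
 * stores the high 64 bits in *hi. Requires unsigned __int128 (GCC/Clang). */
static uint64_t umul128_ref(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned __int128 p = (unsigned __int128)a * b; /* full 128-bit product */
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
}
```

For example, (2^64 - 1)^2 = 2^128 - 2^65 + 1, so the high half is 2^64 - 2 and the low half is 1.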
>>> Anyway, I want to go the opposite direction. [...] mulx() helper without
>>> using __int128 type in a way that a compiler would recognize that it should
>>> use MUL/MULX instruction.
>>> A possible implementation looks like [SNIPPED]
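The snippet itself is not preserved in this archive copy. As an illustration only (not the code from the original message), one common way to write such a helper without `__int128` splits each operand into 32-bit halves and combines the four 32x32->64 partial products. The name `mul_64x64_128` is hypothetical, and whether a particular Clang version folds this into a single MUL is what this thread is about, so check the generated assembly rather than assuming it.

```c
#include <stdint.h>

/* Hypothetical portable helper: 64 x 64 -> 128 multiplication without
 * unsigned __int128. Returns the low half and stores the high half in *hi. */
static uint64_t mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t ll = a_lo * b_lo; /* weight 2^0  */
    uint64_t lh = a_lo * b_hi; /* weight 2^32 */
    uint64_t hl = a_hi * b_lo; /* weight 2^32 */
    uint64_t hh = a_hi * b_hi; /* weight 2^64 */

    /* Middle 32-bit column: high part of ll plus low parts of the cross terms. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    /* High half: hh, high parts of the cross terms, and the carry out of mid. */
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);

    /* Low half: low 32 bits of mid on top of the low 32 bits of ll. */
    return (mid << 32) | (uint32_t)ll;
}
```

The sum `mid` cannot overflow: it is at most (2^32 - 2) + 2(2^32 - 1), which is below 2^34.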
>> Interesting trivia: There are at least three ways to write the final
>> "return" statement in this function. Clang generates different code for
>> each one of them. If someone does pursue writing an InstCombine
>> optimization for this, it would be good to generate the same efficient code
>> for all three versions.
>> https://godbolt.org/z/-Cozee (LLVM IR: https://godbolt.org/z/_1pDoz)