<div dir="ltr"><div dir="ltr">Hi Arthur, Craig,<div><br></div><div>Thanks for your comments about the GCC/Clang intrinsics. I had never considered using them, but they might be a better alternative to inline assembly.<br>Is there one for the regular MUL?<br><br>Anyway, I want to go in the opposite direction: where I can, I rely on the compiler's optimizations. If I want to use MULX in Clang, I do it like this:<br><br><div style="background-color:rgb(255,255,254)"><div style="color:rgb(0,0,0)"><span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span> mulx(<span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span> x, <span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span> y, <span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span>* hi)</div><div style="color:rgb(0,0,0)">{</div><div style="color:rgb(0,0,0)"> <span style="color:rgb(0,0,255)">auto</span> p = (<span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">__int128</span>){x} * y;</div><div style="color:rgb(0,0,0)"> *hi = <span style="color:rgb(0,0,255)">static_cast</span><<span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span>>(p >> <span style="color:rgb(9,136,90)">64</span>);</div><div style="color:rgb(0,0,0)"> <span style="color:rgb(0,0,255)">return</span> <span style="color:rgb(0,0,255)">static_cast</span><<span style="color:rgb(0,0,255)">unsigned</span> <span style="color:rgb(0,0,255)">long</span>>(p);</div><div style="color:rgb(0,0,0)">}</div><div style="color:rgb(0,0,0)"><br></div><div><font color="#000000"><a href="https://godbolt.org/z/PbgFb9" target="_blank">https://godbolt.org/z/PbgFb9</a><br></font><br>If compiled with -mbmi2 -mtune=generic, it just uses the MULX instruction.</div><div><br></div><div><div style="color:rgb(0,0,0)"><div><span style="color:rgb(0,128,128)">mulx(unsigned long, unsigned long, unsigned 
long*):</span></div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(72,100,170)">rcx</span>, <span style="color:rgb(72,100,170)">rdx</span></div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(72,100,170)">rdx</span>, <span style="color:rgb(72,100,170)">rsi</span></div><div> <span style="color:rgb(0,0,255)">mulx</span> <span style="color:rgb(72,100,170)">rdx</span>, <span style="color:rgb(72,100,170)">rax</span>, <span style="color:rgb(72,100,170)">rdi</span></div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(0,128,128)">qword</span> <span style="color:rgb(0,128,128)">ptr</span> [<span style="color:rgb(72,100,170)">rcx</span>], <span style="color:rgb(72,100,170)">rdx</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div></div></div><div><br></div><div>What I want to do is take this further: rewrite the above mulx() helper without using the __int128 type, in a way that lets the compiler recognize that it should use the MUL/MULX instruction.<br><br>A possible implementation looks like this:</div><div><br></div><div><div style="color:rgb(0,0,0)"><br><div>uint64_t mul_full_64_generic(uint64_t x, uint64_t y, uint64_t* hi)</div><div>{</div><div> uint64_t xl = x & <span style="color:rgb(48,48,192)">0xffffffff</span>;</div><div> uint64_t xh = x >> <span style="color:rgb(9,136,90)">32</span>;</div><div> uint64_t yl = y & <span style="color:rgb(48,48,192)">0xffffffff</span>;</div><div> uint64_t yh = y >> <span style="color:rgb(9,136,90)">32</span>;</div><br><div> uint64_t t = xl * yl;</div><div> uint64_t l = t & <span style="color:rgb(48,48,192)">0xffffffff</span>;</div><div> uint64_t h = t >> <span style="color:rgb(9,136,90)">32</span>;</div><br><div> t = xh * yl;</div><div> t += h;</div><div> h = t >> <span style="color:rgb(9,136,90)">32</span>;</div><br><div> t = xl * yh + (t & <span style="color:rgb(48,48,192)">0xffffffff</span>);</div><div> l |= t << <span 
style="color:rgb(9,136,90)">32</span>;</div><div> *hi = xh * yh + h + (t >> <span style="color:rgb(9,136,90)">32</span>);</div><div> <span style="color:rgb(0,0,255)">return</span> l;</div><div>}</div></div></div><div><br></div><div>As expected, Clang is currently not able to match this pattern. </div><div><br></div><div>If we want to implement this optimization in Clang, I have some questions:<br>1. Can we prove that this pattern is equivalent to MUL 64x64 -> 128?<br>2. Which pass should this optimization be added to?<br>3. Can this pattern be split into smaller ones? E.g. UMULH.<br><br>Paweł</div><div><br></div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sun, Dec 30, 2018 at 2:34 AM Craig Topper <<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">_mulx_u64 only exists when the target is x86_64. That's still not very portable. 
I'm not opposed to removing the bmi2 check, but gcc also has the same check so it doesn't improve portability much.<div><br clear="all"><div><div dir="ltr" class="gmail-m_-4850544347226011626gmail-m_-5851588304871567555gmail-m_-3206145263580675406gmail_signature">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Dec 29, 2018 at 4:44 PM Arthur O'Dwyer via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Pawel,</div><div><br></div><div dir="ltr">There is the _mulx_u64 intrinsic, but it currently requires the hardware flag "-mbmi2".</div><div dir="ltr"><div><a href="https://github.com/Quuxplusone/WideIntProofOfConcept/blob/master/wider.h#L89-L99" target="_blank">https://github.com/Quuxplusone/WideIntProofOfConcept/blob/master/wider.h#L89-L99</a><br></div><div><br></div><div>On Clang 3.8.1 and earlier, the _addcarry_u64 and _subborrow_u64 intrinsics required the hardware flag `-madx`, even though they didn't use the hardware ADX/ADOX instructions. 
Modern GCC and Clang permit the use of these intrinsics (to generate ADC) even in the absence of `-madx`.</div><div><br></div><div>I think it would be a very good idea for Clang to support _mulx_u64 (to generate MUL) even in the absence of `-mbmi2`.</div><div><br></div><div>–Arthur</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Dec 29, 2018 at 6:03 PM Paweł Bylica via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>For some maybe dumb reasons I try to write a portable version of int128.</div><div><br></div><div>What is very valuable for this implementation is access to MUL instruction on x86 which provides full 64 x 64 -> 128 bit multiplication. An equally useful on ARM would be UMULH instruction.</div><div><br></div><div>Well, the way you can access this on clang / GCC is to use __int128 type or use inline assembly. MSVC provides an intrinsic for this instruction. This defeats the idea of portable int128 reimplementation and makes constexpr implementation of multiplication at least inconvenient.</div><div><br></div><div>Maybe there is a hope for me in LLVM. Is there any pattern matcher that is producing MUL instruction of bigger type?</div><div>If not, would it be good idea to teach LLVM about it?</div><div><br></div><div>Bests,</div><div>Paweł</div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
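<div><br></div><div>On question 1 (whether the 32-bit decomposition is equivalent to MUL 64x64 -> 128), a quick empirical check can help before attempting a formal proof. The sketch below repeats mul_full_64_generic from the message and compares it against the unsigned __int128 product, which is a GCC/Clang extension and assumed to be available here. The helper names matches_int128 and count_mismatches, the edge-case list, and the xorshift generator are illustrative choices, not part of the original code. Passing this check is evidence of equivalence, not a proof.</div><div><br></div>

```cpp
#include <cassert>
#include <cstdint>

// Portable 64x64 -> 128 multiply, copied from the message above.
uint64_t mul_full_64_generic(uint64_t x, uint64_t y, uint64_t* hi)
{
    uint64_t xl = x & 0xffffffff;
    uint64_t xh = x >> 32;
    uint64_t yl = y & 0xffffffff;
    uint64_t yh = y >> 32;

    uint64_t t = xl * yl;
    uint64_t l = t & 0xffffffff;
    uint64_t h = t >> 32;

    t = xh * yl;
    t += h;  // cannot overflow: (2^32-1)^2 + (2^32-1) < 2^64
    h = t >> 32;

    t = xl * yh + (t & 0xffffffff);
    l |= t << 32;
    *hi = xh * yh + h + (t >> 32);
    return l;
}

// Hypothetical helper: compares both halves against the __int128 reference.
// Assumes a compiler that provides unsigned __int128 (GCC or Clang).
bool matches_int128(uint64_t x, uint64_t y)
{
    uint64_t hi = 0;
    const uint64_t lo = mul_full_64_generic(x, y, &hi);
    const unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
    return lo == static_cast<uint64_t>(p) && hi == static_cast<uint64_t>(p >> 64);
}

// Hypothetical helper: checks a few edge cases plus `iterations`
// pseudo-random pairs and returns the number of mismatches found.
int count_mismatches(int iterations)
{
    int mismatches = 0;

    const uint64_t edges[] = {0, 1, 0xffffffffULL, 0x100000000ULL,
                              0x8000000000000000ULL, 0xffffffffffffffffULL};
    for (uint64_t a : edges)
        for (uint64_t b : edges)
            if (!matches_int128(a, b))
                ++mismatches;

    // Simple xorshift64 generator; the seed is an arbitrary non-zero constant.
    uint64_t state = 0x9e3779b97f4a7c15ULL;
    auto next = [&state]() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    };
    for (int i = 0; i < iterations; ++i)
    {
        const uint64_t a = next();
        const uint64_t b = next();
        if (!matches_int128(a, b))
            ++mismatches;
    }
    return mismatches;
}
```

<div>Calling count_mismatches(100000) and asserting that it returns 0 exercises the carry paths between the four partial products. An exhaustive proof would still need an argument (or an SMT query) over all 2^128 input pairs.</div>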