<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/63709">63709</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal codegen for 128-bit multiply
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          moncefmechri
      </td>
    </tr>
</table>

<pre>
    https://godbolt.org/z/9PPPeqjTK

Codegen for the following code snippet ([which is a mixing step in Boost.Unordered when a non-avalanching hash function is used](https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111)) seems suboptimal:

```
#include <stdint.h>

uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)( r >> 64 );
}
```

I believe the optimal codegen should be:

```
mulx64(unsigned long):
        movabs  rax, -7046029254386353131
        mul     rdi
        xor     rax, rdx
        ret
```

GCC <= 10 is able to generate this code (GCC >= 11 seems to regress, [which has already been reported](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551)).

Issue 1:

When compiling with any optimization level above `-O0`, clang generates the following code:

```
mulx64(unsigned long): # @mulx64(unsigned long)
        mov     rax, rdi
        movabs  rcx, -7046029254386353131
        mul     rcx
        xor     rax, rdx
        ret
```

This has a redundant move: loading the constant directly into `rax` and multiplying by `rdi` would avoid the `mov rax, rdi`. Possibly a duplicate of [#62452](https://github.com/llvm/llvm-project/issues/62452).

Issue 2:

When using `-march=haswell` or newer, clang emits `mulx`. The resulting code is one instruction longer, with no benefit that I can see. It looks to me like the optimal codegen shared above should also be optimal for Haswell and newer.

I am reporting both issues in the same bug report because they seem related enough. Let me know if you want me to split them into 2 bug reports instead.
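
For anyone reproducing this, here is a minimal sketch of a correctness check to run alongside the codegen comparison. It is not part of the Boost code: `mulx64_ref` and `count_mismatches` are hypothetical names introduced here, and the reference version simply rebuilds the 128-bit product from 32-bit partial products, so that any alternative lowering of `mulx64` can be compared against it.

```c
#include <stdint.h>

/* Function from the report, unchanged. */
uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}

/* Hypothetical reference: same value computed without __uint128_t,
   using four 32x32->64 partial products. */
uint64_t mulx64_ref(uint64_t x)
{
    const uint64_t k = 0x9E3779B97F4A7C15ull;
    uint64_t x_lo = (uint32_t)x, x_hi = x >> 32;
    uint64_t k_lo = (uint32_t)k, k_hi = k >> 32;

    uint64_t ll = x_lo * k_lo;
    uint64_t lh = x_lo * k_hi;
    uint64_t hl = x_hi * k_lo;
    uint64_t hh = x_hi * k_hi;

    /* Sum of three values below 2^32 each, so it fits in 64 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    uint64_t lo = (mid << 32) | (uint32_t)ll;                  /* low 64 bits  */
    uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);  /* high 64 bits */
    return lo ^ hi;
}

/* Hypothetical helper: compares both versions on n pseudo-random inputs
   (simple 64-bit LCG) and returns the number of mismatches. */
unsigned count_mismatches(unsigned n)
{
    uint64_t x = 1;
    unsigned bad = 0;
    for (unsigned i = 0; i < n; ++i) {
        x = x * 6364136223846793005ull + 1442695040888963407ull;
        if (mulx64(x) != mulx64_ref(x))
            ++bad;
    }
    return bad;
}
```

Calling `count_mismatches` from a small `main` should report zero mismatches. This only checks the result; the generated assembly for `mulx64` is still what matters for this report.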
</pre>