<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/82196>82196</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            `_BitInt(4096) % uint32_t` computation is 430x slower than naive software implementation
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Eisenwave
      </td>
    </tr>
</table>

<pre>
    ```cpp
using u64 = unsigned long long;
using u32 = unsigned;
using big = _BitInt(4096);

u32 rem_fast(big x, u32 y) {
    constexpr int size = sizeof(x) / 4;
 u32 digits[size];
    __builtin_memcpy(digits, &x, sizeof(digits));
 u32 rem = 0;
    for (int i = 0; i < size; ++i) {
        u64 temp = u64(rem) << 32 | digits[size - i - 1];
        rem = temp % y;
 }
    return rem;
}

u32 rem_slow(big x, u32 y) {
    return x % y;
}
```
When benchmarked, `rem_slow` is 430x slower: https://quick-bench.com/q/cC9UfXgSKqVSRYR1JmnZVghJOTI

![image](https://github.com/llvm/llvm-project/assets/22040976/146b0a74-d922-4b48-9f31-b5cb9c113c1d)

The naive software implementation is still 1.3x faster when using `_BitInt(128)`.

The underlying issue is that if you're just computing the remainder, you don't need a full integer division. There is a simple O(n) algorithm because you can go through the number digit by digit, which is dramatically faster than whatever clang is doing here.

See also https://godbolt.org/z/KhPjPW3ed
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJyUVU2P2zYQ_TX0ZWCDGsmydNAh3q2BJIcEyTZpezEoaSRxQ5EOSa3t_PqC9Nd6C7SosFhLnsc382beyMI52Wuiii3XbPk4E5MfjK1-k470XrzQrDbtsWI5P_01ux3jj4y_m5zUPUx5Bix9hElHlhaU0X38x9L1HTDFO-CbcC37GN6upX-vPcMi42XOsLzizugUwdK47YQLoHDswPAh0h8ZlsBWZzwAQGO083TYWZDag5O_KCYJN6ZjWBziCdxAdk0TmVrZS-_Ych2QbPl4iwLAdltPUnmptyONze7IsDjj8QEY5rGea4pLqHytBc4yYjX8jr0zFhgWoV55DcfbE2l4YrhmuJb_kBuuMBBP4-7U7DxjWFgaIzR9CCRhDquHe4kwBwlzSN5KDdelzBMpLuF4g7DV4w1syU9WB_xtZpf4_fCcMvv_Ht6Z7_Am6Y3zYsnT4_eBNNSkm2EU9ge1cRo5v-bLOUgHWcoPEJ7JsvQdDN7vHEvfMdww3PycZPNjHjkWjRnDNww3zUP5e_dH__Xjz29fv_z5Jfkw6r--9cOHT0_vX4tjmLDlWo6ij47B4p67l36Y6jOtUi-Xj_nOmmdqPMONcI6CVTaIPOPlKme4SbK85mKVzdsScZ7VWTEvuzSZ18umLpskSZukDc56VcjTQKCFfCFwpvN7YQnkuFM0kvbCS6NDH5yXSkGySA8QVoks7EP_TsvIcn7bwwSLkCDni7dJJt2SVcdwQjo3UeD1g_AgOziaieHKEjxPzkNjxt3kA9APFCwgZDgbRnQ0E7RGM1x50EQtCOgmpcLCUk8WWvkinTR6AU8D2ZhDgIuC4BPDQgffCNUbK_0wQk2NmBxF2kZo6A34wZqpH2JqPY11JO2lh_p4ugll7AfZDIG8tWIUXjZCqeOlNX4QGvaD8PRCFholomJoTVAUqrprzVciEMqZN-bqTVsb5RfG9gw3vxhuPg6fnz9_T6mdtVXalmkpZlQlK16kfFXwfDZUvOuWSdFhkixLwmYlkrRcirJBTDjP8mQmK-SYcUxKnizzLF-U2NU5FnkmsM14krGMh26rRfBayD2Lo6oKTMp8pkRNysU3P6Km_WmODDH8ENgq-rOeescyrqTz7sbipVdU3Rnl_MKO6zpJ7VPc-rB1p-FfnfdqA0-N_Vevziarqv-9SVFG2KQo8-8AAAD__3p4ENU">