<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/82196>82196</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
`_BitInt(4096) % uint32_t` computation is 430x slower than naive software implementation
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Eisenwave
</td>
</tr>
</table>
<pre>
```cpp
using u64 = unsigned long long;
using u32 = unsigned;
using big = _BitInt(4096);
u32 rem_fast(big x, u32 y) {
constexpr int size = sizeof(x) / 4;
u32 digits[size];
__builtin_memcpy(digits, &x, sizeof(digits));
u32 rem = 0;
for (int i = 0; i < size; ++i) {
u64 temp = u64(rem) << 32 | digits[size - i - 1];
rem = temp % y;
}
return rem;
}
u32 rem_slow(big x, u32 y) {
return x % y;
}
```
When benchmarked, `rem_slow` is 430x slower: https://quick-bench.com/q/cC9UfXgSKqVSRYR1JmnZVghJOTI
![image](https://github.com/llvm/llvm-project/assets/22040976/146b0a74-d922-4b48-9f31-b5cb9c113c1d)
The naive software implementation is still 1.3x faster when using `_BitInt(128)`.
The underlying issue is that if you're just computing the remainder, you don't need a full integer division. There is a simple O(n) algorithm because you can go through the number digit by digit, which is dramatically faster than whatever clang is doing here.
See also https://godbolt.org/z/KhPjPW3ed
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJyUVU2P2zYQ_TX0ZWCDGsmydNAh3q2BJIcEyTZpezEoaSRxQ5EOSa3t_PqC9Nd6C7SosFhLnsc382beyMI52Wuiii3XbPk4E5MfjK1-k470XrzQrDbtsWI5P_01ux3jj4y_m5zUPUx5Bix9hElHlhaU0X38x9L1HTDFO-CbcC37GN6upX-vPcMi42XOsLzizugUwdK47YQLoHDswPAh0h8ZlsBWZzwAQGO083TYWZDag5O_KCYJN6ZjWBziCdxAdk0TmVrZS-_Ych2QbPl4iwLAdltPUnmptyONze7IsDjj8QEY5rGea4pLqHytBc4yYjX8jr0zFhgWoV55DcfbE2l4YrhmuJb_kBuuMBBP4-7U7DxjWFgaIzR9CCRhDquHe4kwBwlzSN5KDdelzBMpLuF4g7DV4w1syU9WB_xtZpf4_fCcMvv_Ht6Z7_Am6Y3zYsnT4_eBNNSkm2EU9ge1cRo5v-bLOUgHWcoPEJ7JsvQdDN7vHEvfMdww3PycZPNjHjkWjRnDNww3zUP5e_dH__Xjz29fv_z5Jfkw6r--9cOHT0_vX4tjmLDlWo6ij47B4p67l36Y6jOtUi-Xj_nOmmdqPMONcI6CVTaIPOPlKme4SbK85mKVzdsScZ7VWTEvuzSZ18umLpskSZukDc56VcjTQKCFfCFwpvN7YQnkuFM0kvbCS6NDH5yXSkGySA8QVoks7EP_TsvIcn7bwwSLkCDni7dJJt2SVcdwQjo3UeD1g_AgOziaieHKEjxPzkNjxt3kA9APFCwgZDgbRnQ0E7RGM1x50EQtCOgmpcLCUk8WWvkinTR6AU8D2ZhDgIuC4BPDQgffCNUbK_0wQk2NmBxF2kZo6A34wZqpH2JqPY11JO2lh_p4ugll7AfZDIG8tWIUXjZCqeOlNX4QGvaD8PRCFholomJoTVAUqrprzVciEMqZN-bqTVsb5RfG9gw3vxhuPg6fnz9_T6mdtVXalmkpZlQlK16kfFXwfDZUvOuWSdFhkixLwmYlkrRcirJBTDjP8mQmK-SYcUxKnizzLF-U2NU5FnkmsM14krGMh26rRfBayD2Lo6oKTMp8pkRNysU3P6Km_WmODDH8ENgq-rOeescyrqTz7sbipVdU3Rnl_MKO6zpJ7VPc-rB1p-FfnfdqA0-N_Vevziarqv-9SVFG2KQo8-8AAAD__3p4ENU">