<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/82183>82183</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Missed optimization: 128-bit remainder `%` can be implemented in terms of `divq` directly
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Eisenwave
</td>
</tr>
</table>
<pre>
## Code to reproduce (https://godbolt.org/z/1zaM144zr)
```cpp
unsigned long long rem(unsigned __int128 x, unsigned long long y) {
return x % y;
}
```
## Actual output
```asm
rem(unsigned __int128, unsigned long long):
push rax
xor ecx, ecx
call __umodti3@PLT
pop rcx
ret
```
## Expected output
```asm
rem(unsigned __int128, unsigned long long):
mov rcx, rdx ; t0 = y (rdx is overwritten by div)
xor edx, edx ; t2.high = 0
mov rax, rsi ; t2.low = x.high
div rcx ; t2.high = x.high % t0, t2.low = x.high / t0
mov rax, rdi ; t2.low = x.low
div rcx ; t2.high = t2 % t0, t2.low = t2 / t0 (no overflow: t2.high < t0)
mov rax, rdx ; result = t2.high
ret ; return result
```
## Explanation
Even on non-Intel architectures (where division is considered reasonably cheap), the compiler doesn't emit `div` directly.
This is unfortunate because if we only care about the remainder, there is a robust solution that never raises a divide error from quotient overflow (division by zero still faults). Consider computing the remainder `99 % 7` in decimal, using a division that can only produce a single quotient digit.
`99 / 7` is `14 R1`, which has a two-digit quotient, so we cannot divide directly. However, we can first take the remainder of the high digit, `9 % 7 == 2`, and then compute `29 % 7 == 1`. This is ordinary long division, and it always works because the high digit of each intermediate dividend is smaller than the divisor.
In the 128-bit case, we take the remainder of the high 64 bits, concatenate it with the low 64 bits, and perform one more division to get the final remainder.
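The long-division argument can be modelled in portable C++ as a minimal sketch. The function name `rem_two_step` and its split `hi`/`lo` parameters are hypothetical and chosen only for illustration. The second step still uses a 128-bit `%`, so this models the arithmetic and is not itself the optimization.

```cpp
#include <cstdint>

// Hypothetical helper: remainder of the 128-bit value (hi:lo) modulo a
// 64-bit divisor y, computed in two steps that each correspond to one
// 128-by-64-bit `div`. Precondition: y != 0.
std::uint64_t rem_two_step(std::uint64_t hi, std::uint64_t lo, std::uint64_t y) {
    // Step 1: reduce the high half. Afterwards r1 < y holds.
    std::uint64_t r1 = hi % y;

    // Step 2: concatenate the partial remainder with the low half.
    // Because r1 < y, the quotient of (r1:lo) / y fits in 64 bits,
    // which is exactly the condition under which `div` does not fault.
    unsigned __int128 t = (static_cast<unsigned __int128>(r1) << 64) | lo;
    return static_cast<std::uint64_t>(t % y);
}
```

Calling `rem_two_step(0, 99, 7)` gives the same result as `99 % 7`, and for any nonzero `y` the result should match `x % y` on the full 128-bit value.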
## Possible solution
Perhaps the codegen for x86_64 should be adjusted to produce `div` instructions accordingly.
Another helpful approach would be to add inline assembly to `__umodti3` (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/umodti3.c) to cover the x86_64 case. This would be better than delegating to `__udivmodti4`.
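As a rough illustration of the inline-assembly idea, here is a sketch using GCC/Clang extended asm. It is not the compiler-rt implementation: the name `umod128_by_64` and the 64-bit divisor parameter are assumptions made for this example (the real `__umodti3` takes two 128-bit operands, so it would first need to check that the divisor's high half is zero). On other targets the sketch falls back to plain C++.

```cpp
#include <cstdint>

// Hypothetical sketch: 128-bit by 64-bit remainder via two `divq`
// instructions on x86-64. Precondition: y != 0.
std::uint64_t umod128_by_64(unsigned __int128 x, std::uint64_t y) {
    std::uint64_t hi = static_cast<std::uint64_t>(x >> 64);
    std::uint64_t lo = static_cast<std::uint64_t>(x);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    std::uint64_t a = hi; // rax: low half of the dividend, then the quotient
    std::uint64_t d = 0;  // rdx: high half of the dividend, then the remainder
    // First division: (0:hi) / y, leaving hi % y in rdx.
    __asm__("divq %[y]" : "+a"(a), "+d"(d) : [y] "r"(y) : "cc");
    // Second division: ((hi % y):lo) / y. rdx < y, so the quotient fits.
    a = lo;
    __asm__("divq %[y]" : "+a"(a), "+d"(d) : [y] "r"(y) : "cc");
    return d;
#else
    // Portable fallback with the same two-step structure.
    std::uint64_t r1 = hi % y;
    unsigned __int128 t = (static_cast<unsigned __int128>(r1) << 64) | lo;
    return static_cast<std::uint64_t>(t % y);
#endif
}
```

The asm uses the default AT&T syntax, so it assumes the file is not built with `-masm=intel`.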
</pre>