<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/82183>82183</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Missed optimization: 128-bit remainder `%` can be implemented in terms of `divq` directly
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Eisenwave
      </td>
    </tr>
</table>

<pre>
## Code to reproduce (https://godbolt.org/z/1zaM144zr)

```cpp
unsigned long long rem(unsigned __int128 x, unsigned long long y) {
    return x % y;
}
```

## Actual output
```asm
rem(unsigned __int128, unsigned long long):
        push    rax
        xor     ecx, ecx
        call    __umodti3@PLT
        pop     rcx
        ret
```

## Expected output
```asm
rem(unsigned __int128, unsigned long long):
        mov     rcx, rdx        ; rcx = y
        xor     edx, edx        ; rdx = 0
        mov     rax, rsi        ; rax = x.high
        div     rcx             ; rdx = x.high % y (quotient x.high / y cannot overflow)
        mov     rax, rdi        ; rax = x.low, so rdx:rax = (x.high % y):x.low
        div     rcx             ; rdx = (rdx:rax) % y (rdx is less than y, so no overflow)
        mov     rax, rdx        ; result = remainder
        ret                     ; return result
```

## Explanation

Even on non-Intel architectures (where division is considered reasonably cheap), the compiler doesn't emit `div` directly and calls `__umodti3` instead.

This is unfortunate because, if we only need the remainder, there is a robust solution that never raises an exception (except on division by zero). Consider computing the remainder of `99 / 7` in decimal.

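`99 / 7` is `14 R 1`, but suppose our division is only allowed to produce a single-digit quotient (just as `div` raises an exception when the quotient does not fit in 64 bits). We can first take the remainder of the high digit, `9 % 7 == 2`, then append the low digit and compute `29 % 7 == 1`, which is the final remainder. This is just long division, and it always works: each intermediate remainder is less than `7`, so every intermediate dividend is less than `7 * 10` and every quotient is a single digit.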
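
The digit-by-digit procedure can be sketched in C++ as follows. This is an illustrative sketch only: `decimal_rem` is a hypothetical name, and it assumes a nonzero divisor and a string made up solely of decimal digits.

```cpp
// Hypothetical helper: remainder of a decimal number (given as a digit string)
// divided by y, computed one digit at a time like long division.
// Assumes y != 0 and that `digits` contains only '0'..'9'.
unsigned decimal_rem(const char* digits, unsigned y) {
    unsigned r = 0;  // running remainder, always less than y
    for (const char* p = digits; *p != '\0'; ++p) {
        unsigned d = (unsigned)(*p - '0');
        // r * 10 + d is at most 10 * y - 1, so this step's quotient is a single digit.
        r = (r * 10 + d) % y;
    }
    return r;
}
```

For `"99"` and `7` it performs exactly the two steps described above (`9 % 7 == 2`, then `29 % 7 == 1`) and returns `1`.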

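The same idea applies to 128-bit integers: take the remainder of the high 64 bits modulo `y`, concatenate it with the low 64 bits, and perform one more `div` to get the final remainder. Since the intermediate remainder is less than `y`, the second quotient always fits in 64 bits, so neither `div` can overflow.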
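
A portable C++ sketch of this two-step scheme follows. `div_model` is a hypothetical stand-in for `div r64`: it divides `hi:lo` by `d` and traps (like the `#DE` exception) whenever the quotient would not fit in 64 bits, which is exactly when `hi >= d` (this also covers `d == 0`). It uses `unsigned __int128` internally only to model the instruction, so it demonstrates the correctness argument rather than the desired codegen.

```cpp
struct DivResult {
    unsigned long long quot;
    unsigned long long rem;
};

// Hypothetical model of x86-64 `div r64`: divides the 128-bit value hi:lo by d.
// Traps when the quotient would not fit in 64 bits (hi >= d, including d == 0).
DivResult div_model(unsigned long long hi, unsigned long long lo, unsigned long long d) {
    if (hi >= d) __builtin_trap();  // mirrors #DE: divide error
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    return DivResult{(unsigned long long)(n / d), (unsigned long long)(n % d)};
}

// 128-by-64-bit remainder using two divisions that can never overflow.
unsigned long long rem_two_step(unsigned __int128 x, unsigned long long y) {
    unsigned long long x_hi = (unsigned long long)(x >> 64);
    unsigned long long x_lo = (unsigned long long)x;
    unsigned long long r = div_model(0, x_hi, y).rem;  // 0:x_hi, quotient fits
    return div_model(r, x_lo, y).rem;                  // r is less than y, quotient fits
}
```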

## Possible solution

Perhaps the codegen for x86_64 should be adjusted to produce `div` instructions accordingly.

Another helpful approach would be to add inline assembly to `__umodti3` (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/umodti3.c) to cover the x86_64 case where the high 64 bits of the divisor are zero. This would be better than delegating to `__udivmodti4`.
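
A minimal sketch of what such inline assembly could look like, assuming GCC/Clang extended `asm` syntax (AT&T operand order) on x86-64. `umod_u128_u64` is a hypothetical helper, not the actual compiler-rt code; on other targets it falls back to plain `%`.

```cpp
// Hypothetical helper: 128-by-64-bit remainder via two `divq` instructions on
// x86-64 (GCC/Clang extended asm); plain `%` elsewhere. y == 0 faults like `%`.
unsigned long long umod_u128_u64(unsigned __int128 x, unsigned long long y) {
#if defined(__x86_64__)
    unsigned long long hi = (unsigned long long)(x >> 64);
    unsigned long long lo = (unsigned long long)x;
    unsigned long long q, r;
    // Step 1: rdx:rax = 0:hi. The quotient is at most hi, so it cannot overflow.
    __asm__("divq %4" : "=a"(q), "=d"(r) : "0"(hi), "1"(0ULL), "rm"(y) : "cc");
    // Step 2: rdx:rax = r:lo. Because r is less than y, the quotient fits in 64 bits.
    __asm__("divq %4" : "=a"(q), "=d"(r) : "0"(lo), "1"(r), "rm"(y) : "cc");
    (void)q;  // only the remainder (rdx) is needed
    return r;
#else
    return (unsigned long long)(x % y);
#endif
}
```

Tying the inputs to the output registers with the `"0"`/`"1"` matching constraints places the dividend in `rax`/`rdx`, the registers `divq` implicitly reads and writes.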
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJy0Vk1v4zgS_TX0pRBDohxLPviQTtrYBbaBxqKBPQYUWbbYTZFqshjZ-fULUkocexxgBpgxAism6-O94qsSRQj6YBG37P4Lu39aiEid89uvOqAdxQsuWqdOW8Yrxit4dAqBHHgcvFNRIjDedERDYNUD4zvGdwenWmdo6fyB8d0r47vyVXwrV6tXz_iGFU-seJi_18X0J4dhWok2Y1FgnD1MXx57xpv3jednbankDRwZf4Qb9ifGN8DqL1NAAACPFL2FIzB-DydWzVusfrqCcYFt4vsgKQoDLtIQ6cpchH5a-QTibYCpBtXDGV36DDF0Gak4Xm4cnc9PlJluekz7UhiTNp6fY-8U6Yqtiu__-XEV1g356d_dPF6TuMH563FASaj-ada9e8nQ-CN4dQRg1RegAlj1BKcbljmyD3qyK7PdcWnc-MeSeXUVlS87fehml_Tvjfji-Jbgzce4ET5Lo3QGP_-6TkI8i42KFHKO9L6-S-uf5m_gIn_2Km_n_huzyrlSHkM0NHvdqJRHgvNn8sjdNTn-KXEZYQVpZz_ufn1BC86Cdfbu35bQgPCy04SSoseQpszYocfEXQftLOgA0tmgFXpU4FEEZ0VrTiA7FEPSWypChyBdP2iDHpTDYBmvCbDXBGxdKP3C1gUo7VGSOS0_IvrR6ZCSRLt3nqIVhNCiFDEg6D2MCM6mbMIjiNZFysk89kJbhX7O7hHwqAMFEOBdGwNBcCYm-kCdIBi1MWDxBT14oQOCsIBHiUM2YbyZfpx5tyd4Re8Y3yzhcS7ADGgUNrUtucw5El5CSow3myyFOtHWFhRK3QuzvJrKF1Yh-UG5gv-WiZSwaTD4i4OwjkAY48Yp-9uroY-G9GDSoR00hSX8y42JaQozIkhhgcSvWyizjuskw6REnqQ0pw6xDfg7oqVU_JkmWxf8hs8S8iG-F1wHaEVIg81Ok-mdQ4oszChOAUbnf4UlfKzI_z7F6vZ5IXdgqykklNJZKQizYEZNXbZIrfhmkJIN6PfO9-AsQu8-6pocHPBKTMsbjfTdhaBbg-_0Ptp8R9-JIcz6V3hAC3vn4disn9crCJ2LRkGLINTPGOjy2M6NoW0gH2WKHkBI6bzS9nDVKA_WJaFDh2bYRwNiGLwTsoPxLQk5EEqBtkZbBBEC9qlRyaVU5zfYurhxmdDUxXYpXc_4zpiXt8fd4N1PlMT4rjWuZXyXSsX47q3b73zaMzpttVEb0jYwvptzLWW6JeRGSZ2XyjSXRoqAs27e8bdIlK1E6hiDB0HaHs74lX7JYVdJcwu1rdSm2ogFbsu6aHi9ud-Ui2673-_rclUXrSpwXW1Euyo3YlU3iPtVvWnahd7ygq8KXjZlU6yrclmsUWJTy7Kular3nK2KpAizTBVI96uFDiHituFlUy2MaNGEfIvj3OIIeZNxni51fpur1sZDYKvCpIl0jkKaDG6_6ZB7YyDd69dpQlcPUPLmrtV02aCM36fjSl3RIuh-MNhjHj7aAqHvQ2qNSUi_P47YRfRm-5ePODNJx5eZ_j8AAP__P1VHAA">