<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58212>58212</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [ARM] divmod decomposition prevents __aeabi_idivmod
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          easyaspi314
      </td>
    </tr>
</table>

<pre>
LLVM always decomposes `div` + `rem` on ARM into `div` + `mul` + `sub` when optimizations are enabled.

This makes sense on targets with hardware division. On targets without it, however, the decomposition prevents lowering to a single `__aeabi_[u]idivmod` call:
```llvm
define void @divmod(i32 %num, i32 %den, ptr %out0) {
  %quo = udiv i32 %num, %den
  %rem = urem i32 %num, %den
  store i32 %quo, ptr %out0, align 4
  %out1 = getelementptr i32, ptr %out0, i32 1
  store i32 %rem, ptr %out1, align 4
  ret void
}
```
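For reference, a C function like the following computes the same quotient/remainder pair. It is a hypothetical example and not part of the report. Compiled with clang for `armv5te-none-eabi`, it should lower to essentially the `udiv` + `urem` IR above, although the attributes clang attaches will differ.

```c
/* Hypothetical C source (not from the report) for the @divmod IR above.
   Both results come from the same num/den pair, which is the pattern
   that should map onto one __aeabi_uidivmod call. */
void divmod(unsigned num, unsigned den, unsigned *out0)
{
    out0[0] = num / den; /* quotient, %quo in the IR  */
    out0[1] = num % den; /* remainder, %rem in the IR */
}
```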
With `--target=armv5te-none-eabi -O0`:
```asm
divmod:
    push    {r11, lr}
    sub     sp, sp, #8
    str     r2, [sp, #4]
    bl      __aeabi_uidivmod
    ldr     r2, [sp, #4]
    str     r0, [r2]
    str     r1, [r2, #4] 
    add     sp, sp, #8 
    pop     {r11, pc}
```
With `--target=armv5te-none-eabi -O3`:
```asm
divmod:
    push    {r4, r5, r6, lr}
    mov     r4, r2
    mov     r5, r1
    mov     r6, r0
    bl      __aeabi_uidiv
    mul     r1, r0, r5
    sub     r1, r6, r1
    stm     r4, {r0, r1}
    pop     {r4, r5, r6, pc}
```
This happens because LLVM's `DivRemPairs` pass "decomposes" the `udiv` + `urem` pair into this:
```llvm
define void @divmod(i32 %num, i32 %den, ptr nocapture writeonly %out0) local_unnamed_addr #0 {
  %num.frozen = freeze i32 %num
  %den.frozen = freeze i32 %den
  %quo = udiv i32 %num.frozen, %den.frozen
  %1 = mul i32 %quo, %den.frozen
  %rem.decomposed = sub i32 %num.frozen, %1
  store i32 %quo, ptr %out0, align 4
  %out1 = getelementptr ptr, ptr %out0, i32 1
  store i32 %rem.decomposed, ptr %out1, align 4
  ret void
}
```
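The rewrite is valid because, for unsigned division with a nonzero divisor, the remainder always equals `num - (num / den) * den`. Here is a minimal C sketch of that identity. The helper name `rem_decomposed` is hypothetical, and the comments map each step to the IR values above:

```c
/* Hypothetical helper: computes the remainder the way the decomposed
   IR does, using a divide, a multiply and a subtract.
   Requires den != 0, exactly like urem. */
unsigned rem_decomposed(unsigned num, unsigned den)
{
    unsigned quo = num / den;  /* %quo                            */
    unsigned prod = quo * den; /* %1 = mul i32 %quo, %den.frozen  */
    /* prod never exceeds num, so this subtraction cannot wrap. */
    return num - prod;         /* %rem.decomposed                 */
}
```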
If this optimization didn't occur, LLVM would emit something much cleaner. There would be no need to preserve r0 and r1 (`%num` and `%den`) in callee-saved registers across the call:
```asm
divmod:
    push    {r4, lr}
    mov     r4, r2
    bl      __aeabi_uidivmod
    stm     r4, {r0, r1}
    pop     {r4, pc}
```
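The `-O0` output shows that `__aeabi_uidivmod` returns the quotient in r0 and the remainder in r1, so a single call yields both values. The C sketch below models only that result layout. The names `uidivmod_result`, `uidivmod_model` and `divmod_ideal` are hypothetical. The model does not reproduce the real runtime routine or its register-return convention.

```c
/* Hypothetical model of the __aeabi_uidivmod result layout
   (not the real runtime routine). */
typedef struct {
    unsigned quot; /* comes back in r0 */
    unsigned rem;  /* comes back in r1 */
} uidivmod_result;

uidivmod_result uidivmod_model(unsigned num, unsigned den)
{
    uidivmod_result r;
    r.quot = num / den;
    r.rem = num - r.quot * den; /* derived from the quotient, no second division */
    return r;
}

/* Shape of the ideal divmod: one helper call, then store both halves. */
void divmod_ideal(unsigned num, unsigned den, unsigned *out0)
{
    uidivmod_result r = uidivmod_model(num, den);
    out0[0] = r.quot; /* stm stores r0 */
    out0[1] = r.rem;  /* stm stores r1 */
}
```

This sketch is about structure. It makes one call and stores both results directly, which matches the single `stm` of r0 and r1 above.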
Additionally, on targets with hardware division, `udiv` + `urem` already lowers to `udiv` + `mls` even without optimizations. The IR-level decomposition therefore gains nothing on those targets and is detrimental on targets without division.
</pre>