<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/77174>77174</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Unnecessary register shuffling and saving to stack
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            llvm:regalloc,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Kmeakin
      </td>
    </tr>
</table>

<pre>
    The following code is generated when compiling `Vec::<u32>::push()` for Rust on AArch64 (godbolt). Similar code is generated for `std::vector<uint32_t>::push_back()` for C++ on AArch64, and on x86-64, so I don't think it is a Rust or AArch64 specific issue. I just use AArch64 because that is the assembly I am most familiar with.

([compiler explorer](https://godbolt.org/z/WGaK3qGb5))
```asm
example::push:
 str     x30, [sp, #-32]!      ; save LR to stack (unnecessary)
 stp     x20, x19, [sp, #16]   ; save x20 and x19 to stack (unnecessary)
 ldr     x8, [x0, #16]         ; load `vec.len` to x8
        mov x19, x0               ; save x0 to x19 (unnecessary)
        ldr     x9, [x0]              ; load `vec.cap` to x9
        mov     w20, w1 ; save `value` to w20 (w1 will be clobbered by line 11)
        cmp x8, x9                ; compare `vec.len` against `vec.cap`
        b.ne .LBB2_2               ; branch if not equal
        mov     x0, x19 ; restore x0 from x19 (unnecessary)
        mov     x1, x8 ; place param `vec.len` in x1 for call to `reserve_for_push` (unnecessary)
        bl alloc::raw_vec::RawVec<T,A>::reserve_for_push
        ldr     x8, [x19, #16]        ; reload `vec.len`, as it may have been changed by `reserve_for_push`
.LBB2_2:
        add     x10, x8, #1           ; calculate new value of `vec.len`
        ldr     x9, [x19, #8]         ; load `vec.ptr` to x9 
        str     w20, [x9, x8, lsl #2] ; store `value` to `vec.ptr + vec.len * 4`
        str     x10, [x19, #16]       ; write the new value of `vec.len`
        ldp     x20, x19, [sp, #16]   ; restore x20 and x19 from stack (unnecessary)
        ldr     x30, [sp], #32        ; restore LR from stack (unnecessary)
 ret
```

This code performs unnecessary register shuffling:
* on line 5, `x0` is saved into `x19`, then it is restored on line 10
* on line 11, `x8` is moved into `x1` to line up the parameters for the call to `reserve_for_push`,     but it could have been loaded directly into `x1` on line 4

This code also performs unnecessary saves/restore to/from the stack:
* LLVM was smart enough to use `x9` and `x10` as temporaries, since `r9..r15` are caller-saved (https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#general-purpose-registers)
* But for some reason LLVM chose to use `x19` and `x20` as temporary registers, even though they are callee-saved and must be saved to the stack
* If `x11` and `x12` were used instead, there would be no need to save/restore to the stack
* Even if the use of callee-saved registers was absolutely necessary for some reason, it would be better to place the save/restore code around the call to `reserve_for_push` and take it off the hot path (in the common case, `vec.len == vec.cap` and there is no need to reallocate). I think LLVM has a `shrink-wrap` pass to do this

The optimal assembly should look something like this:
```asm
example::push:
        mov w11, w1               ; save `value` to w11 (w1 will be clobbered by the next line)
        ldr     x1, [x0, #16]         ; load `vec.len` to x1
        ldr     x9, [x0]              ; load `vec.cap` to x9
 cmp     x1, x9                ; compare `vec.len` against `vec.cap`
 b.ne    .LBB2_2               ; branch if not equal
        bl alloc::raw_vec::RawVec<T,A>::reserve_for_push
        ldr     x8, [x19, #16]        ; reload `vec.len`, as it may have been changed by `reserve_for_push`
.LBB2_2:
        add     x10, x8, #1           ; calculate new value of `vec.len`
        ldr     x9, [x19, #8]         ; load `vec.ptr` to x9 
        str     w11, [x9, x8, lsl #2] ; store `value` to `vec.ptr + vec.len * 4`
        str     x10, [x19, #16]       ; write the new value of `vec.len`
        ret
```
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsWFtv6zYS_jX0y8CGRPn64Ic4qYuDpi_Zs93HgJLGFhuKVEnKcvrrF0PK8iVuzulegMWignGOrZAz38x8M_wk4Zzca8Q1m23Y7GkkWl8Zu_6pRvEm9Sg35fv6a4WwM0qZTuo9FKZEkA72qNEKjyV0FWooTN1IRQvYPPkFC5Y9hM9jm3GW_RB_Nq2rGF8yvmLzBHbGwkvrPBgNDw-2qOZTYHy5N2VulGd8NYG_yVoqYe84pc1snjhfRtMHLLyx5E9qn_FXf-X0NRfF27XnR8Y3jG8unDP-CEKXdOe4nI_jDWfgC5RGM77w4Cup30B6giJ67HbA7hos5E4WIJ1rcQJf4Fda0TocluRYCPrtKxGM-ApBOId1rt7hC4gaauM87EQtlRQWOumrCUueWPLQ_8uXbLaJyUYLeGyUsWjZ7InxZeV94yhovmV82-dxYuye8e3vjG__8aP4Kfvtx3xGaeCr3uI8iR_h6ngHj6JuFF6ULOu9g_MW6DpmCSWHzTauCV94Ns54QJGGBcCyDThxQHh-AW_AeVG8UXVbrbFA54R9HyCA8000y4PZY7q6tZ7O2ezp0uyRJ6FYx3T1bfuq7GEve7vH5MbuGbQyoiRmHbCYKNREF29oZzTVX7U5nGAeE7i-zhjj1nT1h8D6a8C3OuO7gHUXWyGaE7bVR2x0dTGbXXpGRHuFarHf2fGEoHUpdFIpyBEKZfIcLZaQv4OSGiFNP8At6qZP5XEFd4InegqLN1kUeyG189f4rw3nE40wed5s-Cu_Yze3QhcVyB1o4wF_a4W6H_nxRKOwzaLzxoZy7Kypv6cgg6E0GFoGO40SBUIjrKhvQpMajmmYKoVQijLL5olFh_aArztjX0MXzZNvuc0VCKVMPzyt6F4Pp0n6IrowVR-_Mv74MAy3D07u82rgfd9ZN8SPWfrI_DARHQ28WrxDRQzKkYZ9JfQ-cuR-oBFGX8nz-OgvUZZ9dmOdlidMtzwSqmiV8AgaOwjEBbO7gfiNRhoCXn7W6I23QzPBtcnTyOu7iWyuzqiVU2ScRl_sssC0mzY7OwE6cXrwwPgDTD-EMIzYNPm0aOSts9JjOES-P0F_ZtIOnXMxbEMLfT5ub4txeVjQERH8ZPyafdHT88v3OLDobw6vyzPyayVdVAwN2p2xtYMLK2BxL51HC65qdzuSLANBqSJGx7k3CzjnyTEJHe7CAC1B6lhQylxsEE_yJ2qCPopyMJImHw2n6cnysrdcm2vLPW3C6rYJ9Q1TBz1aF8YM3fp81JCPMFJaT-AK06ryooOJ_FhCKS0WXr3fOD9Bnd5Pq1DO3M8t5cgxvj2V0xvGt6GghDgU9SrZz8-__AydcOBqYT2gNu2-oqBIJBGcINeIegFbKIVw4LFujBVWkrNHcFIXYbldTSY2nYVVNmYI7TgW7qM-kr5q80lhasa3Dy8_j53Z-U5YZHwrcjkWgvFtrkzO-LYWUtNt0RSOdOHp28Q6z3gWZakaN61tjMPxiWLurLL4A2xaH4rnTI1gUTijYwKKypAmPEedXoXNb8M-cziEjwfU4KuYugrfz7FjHztZqkmL5tjT2JuLigwIv-yi-_Qq65x-dWiR8BFPnUdR9tS3CF3gVo6gDWiMxsnLFQ_uufuBgMtd-BOFbnbXsIcoA0VE7oxqPap3OBPuJp8ESvozohw9dbo3_eEdQFxDi4y2ptXld7RVyIoXb0huzC5ir4yHRviKOCZ1tGLq2mgohMO-24epnz2x7AkuJJyInm14wrnIocUgBoTH8Cz0pX_8CJSpKCHhAaiyUr-NOxttNcI52ltSwqW77l8E03hZC3V-5nBVSJUy5i3kkVzsQck3jPuHZv0zTwkXIqqL065L78i5u4o0TT9TpPGoO_ownv74wEn_ZZmf_leEOQnmM7D_lGgOahng3xHMfwnO6xr8TwnONP0_EJwfpdqoXGflKluJEa7TRTKdpdN0lo2q9Wy2KKaz6SpfTvN8kfCpmOZpkmOyXKWLpEhGcs0TPk3SZJ5Mk1WaTWbTJc4W5TxbZCVPi5JNE6yFVBOlDvXE2P0ovIpZLxbpYjpSIkflwpsuzmlFoPM-NgDnjD8yzmvpHJbjMCjl78JLOlQo2yO7pj3jvN07Nk2UdN6d_XjpFa7__qnMDIPeiQN9Pb2yGLVWrT_RJQFm_G_cWPMrFp7xbYiKdFYI7J8BAAD__8scwMk">