<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/56425">56425</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Missed optimization: Redundant copy when passing a pointer to a by-value struct arg
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          hhusjr
      </td>
    </tr>
</table>

<pre>
See [https://godbolt.org/z/vo1Kvd6MT](https://godbolt.org/z/vo1Kvd6MT) for a self-contained example.
For the following C code,
```c
struct sa {
    char buffer[24];
};

void process(const struct sa *data);

static inline void func_inline(struct sa sa) {
    process(&sa);
}

void call_inline(struct sa sa) {
    func_inline(sa);
}
```
In the optimal assembly for `call_inline`, it seems that the address of its own `sa` argument could be passed directly to `process`. However, the compiler emits:
```asm
call_inline:                            # @call_inline
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    process
        add     rsp, 24
        ret
```
In this output, the whole `sa` struct is copied from `[rsp + 32]` to `[rsp]`. A slight change<sup>[1]</sup> shows that this is a missed optimization: if the callee is marked `__attribute__((noinline))` (`func_noinline` below), the caller (`call_noinline`) tail-calls it with just `jmp func_noinline` and copies no arguments. **If it is legal to eliminate the argument copy for the tail call, it should also be legal to eliminate it when the callee is inlined.**<sup>[1]</sup>
```c
__attribute__((noinline)) static void func_noinline(struct sa sa) {
    process(&sa);
}

void call_noinline(struct sa sa) {
    func_noinline(sa);
}
```
```asm
call_noinline:                          # @call_noinline
        jmp     func_noinline                   # TAILCALL
func_noinline:                          # @func_noinline
        push    rax
        lea     rdi, [rsp + 16]
        call    process
        pop     rax
        ret
```
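
As a rough sanity check of the claim that the copy is not observable, the two variants can be put in one translation unit with a stand-in `process`. This is only a sketch: the body of `process`, the `seen_copy` variable, and the `same_bytes_both_paths` helper are hypothetical and not part of the original report, where `process` is an external function. It only checks that `process` sees the same 24 bytes through both paths; it does not show which code the compiler generates.
```c
#include <assert.h>
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical stand-in for the external `process`: it records the
   bytes it was handed so that the caller can inspect them afterwards. */
static struct sa seen_copy;

void process(const struct sa *data) {
    assert(data != NULL);
    seen_copy = *data;
}

/* The two variants from the report, unchanged apart from layout. */
static inline void func_inline(struct sa sa) {
    process(&sa);
}

void call_inline(struct sa sa) {
    func_inline(sa);
}

__attribute__((noinline)) static void func_noinline(struct sa sa) {
    process(&sa);
}

void call_noinline(struct sa sa) {
    func_noinline(sa);
}

/* Hypothetical helper: returns 1 if `process` saw the original bytes
   through both the inlined and the tail-called path, 0 otherwise. */
int same_bytes_both_paths(void) {
    struct sa in;
    memset(in.buffer, 0x5a, sizeof in.buffer);

    call_inline(in);
    struct sa via_inline = seen_copy;

    call_noinline(in);
    struct sa via_noinline = seen_copy;

    return memcmp(&via_inline, &in, sizeof in) == 0 &&
           memcmp(&via_noinline, &in, sizeof in) == 0;
}
```
Calling `same_bytes_both_paths()` from a small `main` and comparing the `-O2` assembly of `call_inline` and `call_noinline` should reproduce the difference described above, assuming a compiler that supports `__attribute__((noinline))`.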

[1] https://stackoverflow.com/questions/72859532/why-do-clang-and-gcc-produce-this-sub-optimal-output-copying-a-struct-for-pass
</pre>