<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/120339>120339</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Redundant Copying of Large Struct Parameter to Stack When Passed to Another Function
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jonathan-gruber-jg
      </td>
    </tr>
</table>

<pre>
When a function receives a large struct by value and passes that same struct by value to another function, Clang redundantly copies the struct parameter to the stack before the call.

A minimal test case is in the attached file test.c.txt (GitHub would not allow me to upload it with the .c extension, sadly), reproduced below for your convenience:
```
struct S {
        void *x, *y, *z, *w;
};

extern int extern_func(struct S);

int tail_call(struct S x) {
        return extern_func(x);
}

int non_tail_call(struct S x) {
        return ~extern_func(x);
}
```

I only tested the target architectures x86_64, aarch64, and riscv64, but I would not be surprised if other target architectures exhibit the same inefficiency.

Host system: Arch Linux, x86_64.

Clang version: official Arch Linux package of clang, version 18.1.8-4.

Command line to reproduce results: clang -c test.c --target=<arch> -O<opt-level>

x86_64 assembly (Intel syntax), with -Oz, -Os, -O2, or -O3
```
tail_call:
    push rbp
    mov  rbp,rsp
    pop  rbp
    jmp  extern_func

non_tail_call:
    push   rbp
    mov    rbp,rsp
    sub    rsp,0x20
    movaps xmm0,XMMWORD PTR [rbp+0x10]
    movaps xmm1,XMMWORD PTR [rbp+0x20]
    movups XMMWORD PTR [rsp+0x10],xmm1
    movups XMMWORD PTR [rsp],xmm0
    call   extern_func
    not    eax
    add    rsp,0x20
    pop    rbp
    ret
```

aarch64 assembly, with -Oz, -Os, -O2, or -O3
```
tail_call:
    sub sp, sp, #0x30
    stp x29, x30, [sp, #32]
    add x29, sp, #0x20
    ldp q0, q1, [x0]
    mov x0, sp
    stp q0, q1, [sp]
    bl  extern_func
    ldp x29, x30, [sp, #32]
    add sp, sp, #0x30
    ret

non_tail_call:
    sub sp, sp, #0x30
    stp x29, x30, [sp, #32]
    add x29, sp, #0x20
    ldp q0, q1, [x0]
    mov x0, sp
    stp q0, q1, [sp]
    bl  extern_func
    mvn w0, w0
    ldp x29, x30, [sp, #32]
    add sp, sp, #0x30
    ret
```

riscv64 assembly, with -Oz, -Os, -O2, or -O3
```
tail_call:
    addi  sp,sp,-48
    sd    ra,40(sp)
    ld    a1,24(a0)
    ld    a2,16(a0)
    ld    a3,8(a0)
    ld    a0,0(a0)
    sd    a1,32(sp)
    sd    a2,24(sp)
    sd    a3,16(sp)
    sd    a0,8(sp)
    addi  a0,sp,8
    auipc ra,0x0
    jalr  ra # extern_func
    ld    ra,40(sp)
    addi  sp,sp,48
    ret

non_tail_call:
    addi  sp,sp,-48
    sd    ra,40(sp)
    ld    a1,24(a0)
    ld    a2,16(a0)
    ld    a3,8(a0)
    ld    a0,0(a0)
    sd    a1,32(sp)
    sd    a2,24(sp)
    sd    a3,16(sp)
    sd    a0,8(sp)
    addi  a0,sp,8
    auipc ra,0x0
    jalr  ra # extern_func
    not   a0,a0
    ld    ra,40(sp)
    addi  sp,sp,48
    ret
```

Only the x86_64 tail call is optimized more or less correctly, apart from the pointless frame-pointer setup and teardown (push rbp, mov rbp,rsp, pop rbp) before the unconditional jump to extern_func.
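
To illustrate what I would expect on the targets where the struct arrives by reference (the aarch64 and riscv64 listings above read it through x0 and a0), here is a rough C model of the two lowerings. The names extern_func_lowered, call_with_copy, call_forwarding, and lowerings_agree are hypothetical and exist only for this illustration; it is a sketch of the idea under that by-reference assumption, not Clang's actual lowering.

```c
#include <assert.h>

struct S {
        void *x, *y, *z, *w;
};

/* Hypothetical stand-in for extern_func after by-reference lowering:
 * it receives the address of a copy that it is free to clobber. */
int extern_func_lowered(struct S *s) {
        int r = (s->x != 0) + (s->y != 0) + (s->z != 0) + (s->w != 0);
        s->x = 0; /* a callee may scribble on its by-value argument */
        return r;
}

/* Models the current output: copy *x to a fresh stack slot first. */
int call_with_copy(struct S *x) {
        struct S tmp = *x;
        return extern_func_lowered(&tmp);
}

/* Models the hoped-for output: x is dead after the call, so the
 * incoming pointer is simply forwarded. */
int call_forwarding(struct S *x) {
        return extern_func_lowered(x);
}

/* Returns 1 when both lowerings produce the same result. */
int lowerings_agree(void) {
        int v = 0;
        struct S a = { &v, 0, 0, &v };
        struct S b = a;
        int copied = call_with_copy(&a);
        int forwarded = call_forwarding(&b);
        assert(a.x == (void *)&v); /* the copy shielded the incoming slot */
        assert(b.x == 0);          /* forwarding let the callee clobber it */
        return copied == forwarded;
}
```

Forwarding is only sound in this model because x is never read again after the call, so the callee clobbering the slot cannot be observed by tail_call or non_tail_call.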

Please let me know if I should include anything else in this bug report.

[test.c.txt](https://github.com/user-attachments/files/18172937/test.c.txt)
</pre>