<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/63663>63663</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
llvm missed optimization: memcpy swap
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
gonzalobg
</td>
</tr>
</table>
<pre>
Discovered when working on a different programming language based on LLVM.
Consider swapping the elements of two stack-allocated arrays with a for-loop ([godbolt](https://clang.godbolt.org/z/bYPsxovP7)):
```c++
std::array<int, N> a, b;
unknown(a); unknown(b);
for (int i = 0; i < N; ++i) {
int tmp = b[i];
b[i] = a[i];
a[i] = tmp;
}
unknown(a, b);
```
The IR after optimizations is just:
```llvm
define dso_local void @test()() local_unnamed_addr {
entry:
%a = alloca %"struct.std::array", align 16
%b = alloca %"struct.std::array", align 16
call void @unknown(std::array<int, 4ul>&)(ptr noundef nonnull align 4 dereferenceable(16) %a) #3
call void @unknown(std::array<int, 4ul>&)(ptr noundef nonnull align 4 dereferenceable(16) %b) #3
%0 = load <4 x i32>, ptr %a, align 16
%1 = load <4 x i32>, ptr %b, align 16
store <4 x i32> %0, ptr %b, align 16
store <4 x i32> %1, ptr %a, align 16
call void @unknown(std::array<int, 4ul> const&, std::array<int, 4ul> const&)(ptr noundef nonnull align 4 dereferenceable(16) %a, ptr noundef nonnull align 4 dereferenceable(16) %b) #3
ret void
}
```
Now consider swapping the arrays with whole-aggregate copy assignment, which is lowered to memcpy ([godbolt](https://clang.godbolt.org/z/ffGEzf6MW)):
```c++
std::array<int, N> a, b;
unknown(a); unknown(b);
{
std::array<int, N> tmp = b;
b = a;
a = tmp;
}
unknown(a, b);
```
which produces the following IR after optimizations:
```llvm
define dso_local void @test()() local_unnamed_addr {
entry:
%a = alloca %"struct.std::array", align 4
%b = alloca %"struct.std::array", align 4
%tmp.sroa.0 = alloca [4 x i32], align 4
call void @unknown(std::array<int, 4ul>&)(ptr noundef nonnull align 4 dereferenceable(16) %a) #4
call void @unknown(std::array<int, 4ul>&)(ptr noundef nonnull align 4 dereferenceable(16) %b) #4
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %tmp.sroa.0, ptr noundef nonnull align 4 dereferenceable(16) %b, i64 16, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %b, ptr noundef nonnull align 4 dereferenceable(16) %a, i64 16, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %a, ptr noundef nonnull align 4 dereferenceable(16) %tmp.sroa.0, i64 16, i1 false)
call void @unknown(std::array<int, 4ul> const&, std::array<int, 4ul> const&)(ptr noundef nonnull align 4 dereferenceable(16) %a, ptr noundef nonnull align 4 dereferenceable(16) %b) #4
ret void
}
```
I think an optimization is missing here that should have removed `%tmp.sroa.0 = alloca [4 x i32], align 4`, producing code very similar to what swapping the elements with the for-loop yields.
Having LLVM frontends choose between memcpy and for-loops when lowering copy assignment of trivial aggregate types, just so that subsequent optimizations trigger, puts an unnecessary burden on the frontends, IMO.
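The two swaps above can be put side by side in a minimal sketch. It assumes N = 4 (matching the `std::array<int, 4ul>` in the IR) and takes the arrays by reference instead of calling the opaque `unknown`; the names `swap_with_loop`, `swap_with_copy`, and `swaps_agree` are made up for illustration. It only checks that both forms have the same observable effect. The optimized IR for these functions may differ from the listings above, since reference parameters could alias while the stack-allocated arrays in the report cannot:
```c++
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: N = 4, matching std::array<int, 4ul> in the reported IR.
constexpr std::size_t N = 4;
using Array = std::array<int, N>;

// Element-wise swap, as in the first snippet of the report.
void swap_with_loop(Array& a, Array& b) {
    for (std::size_t i = 0; i < N; ++i) {
        int tmp = b[i];
        b[i] = a[i];
        a[i] = tmp;
    }
}

// Whole-aggregate swap through a temporary, as in the second snippet.
void swap_with_copy(Array& a, Array& b) {
    Array tmp = b;
    b = a;
    a = tmp;
}

// Hypothetical helper: returns true when both variants leave (a, b)
// in the same final state.
bool swaps_agree(Array a, Array b) {
    Array a1 = a, b1 = b;
    Array a2 = a, b2 = b;
    swap_with_loop(a1, b1);
    swap_with_copy(a2, b2);
    assert(a1 == b && b1 == a);  // the loop variant really swapped the contents
    return a1 == a2 && b1 == b2;
}
```
Compiling this with something like `clang++ -O2 -S -emit-llvm` should allow comparing the IR of the two functions, although the exact output depends on the Clang version and target.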
</pre>