<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/137946>137946</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[WebAssembly] Result unnecessarily stored in two locations
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Photosounder
</td>
</tr>
</table>
<pre>
I found something strange while looking at a Wasm binary compiled from C with the latest Clang. This function:
```c
typedef struct { double x, y; } xy_t;
xy_t mul_xy(xy_t a, xy_t b)
{
a.x *= b.x;
a.y *= b.y;
return a;
}
```
compiles, using `-msimd128 -Oz`, to this:

[Screenshot not preserved: the bytes from the .wasm, annotated with their interpretation, their C decompilation and the stack height change for each instruction]

or in WAT format:
```
mul_xy:
local.get 1
local.get 2
v128.load 0:p2align=3
local.get 1
v128.load 0:p2align=3
f64x2.mul
v128.store 0:p2align=3
local.get 0
local.get 1
v128.load 0:p2align=3
v128.store 0:p2align=3
end_function
```
In Wasm, `mul_xy` now takes 3 pointer arguments: `local0` points to where the result is actually wanted, `local1` points to vector `a` and `local2` points to vector `b`. The problem is that, whereas you'd expect the result of the f64x2 multiplication to be stored directly at `local0`, it's instead first stored at `local1`, then the vector at `local1` is reloaded and stored at `local0`. So we needlessly do an extra store and an extra load, and we force the caller to account for `a` being modified, which is not what we want (although it seems that the values get copied to the stack regardless).
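To make the difference concrete, here is a rough C model of the two code shapes, assuming the three locals are plain pointers as described above. The names `mul_xy_emitted` and `mul_xy_expected` are hypothetical and only for illustration; this is a sketch of the memory traffic, not what the compiler literally generates.

```c
typedef struct { double x, y; } xy_t;

/* Hypothetical model of the emitted code:
   result ~ local0, a ~ local1, b ~ local2. */
void mul_xy_emitted(xy_t *result, xy_t *a, const xy_t *b)
{
	a->x *= b->x;	/* product is stored back through local1 first */
	a->y *= b->y;
	*result = *a;	/* then reloaded from local1 and stored to local0 */
}

/* Hypothetical model of the expected code: one store, straight to local0. */
void mul_xy_expected(xy_t *result, const xy_t *a, const xy_t *b)
{
	result->x = a->x * b->x;
	result->y = a->y * b->y;
}
```

Both models write the same result, but only the first one overwrites the memory that `a` points to.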
If I modify the C function to this:
```c
xy_t mul_xy(xy_t a, xy_t b)
{
xy_t c;
c.x = a.x * b.x;
c.y = a.y * b.y;
return c;
}
```
then it compiles to the expected Wasm:
```
mul_xy:
local.get 0
local.get 1
v128.load 0:p2align=3
local.get 2
v128.load 0:p2align=3
f64x2.mul
v128.store 0:p2align=3
end_function
```
So it seems the compiler doesn't realise that the two versions of the C function are functionally the same.
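As a sanity check on that claim, a minimal sketch comparing both versions at the C level. The names `mul_xy_inplace`, `mul_xy_temp` and `same_result` are made up for this example. Since `a` is passed by value, the caller's copy is untouched in both cases.

```c
typedef struct { double x, y; } xy_t;

/* First version from this report: modifies the by-value parameter. */
xy_t mul_xy_inplace(xy_t a, xy_t b)
{
	a.x *= b.x;
	a.y *= b.y;
	return a;
}

/* Second version from this report: writes into a separate local. */
xy_t mul_xy_temp(xy_t a, xy_t b)
{
	xy_t c;
	c.x = a.x * b.x;
	c.y = a.y * b.y;
	return c;
}

/* Returns 1 if both versions give the same result for these inputs. */
int same_result(xy_t a, xy_t b)
{
	xy_t r1 = mul_xy_inplace(a, b);
	xy_t r2 = mul_xy_temp(a, b);
	return r1.x == r2.x && r1.y == r2.y;
}
```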
</pre>