<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/129432>129432</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
powerpc: failure to optimize manual `vec_nmsub` implementation
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
folkertdev
</td>
</tr>
</table>
<pre>
Given this code:
https://godbolt.org/z/3WxTM4Yao
```c
#include <altivec.h>
vector float old(vector float a, vector float b, vector float c) {
return vec_nmsub(a, b, c);
}
vector float new(vector float a, vector float b, vector float c) {
return vec_neg(vec_madd(a, b, vec_neg(c)));
}
```
On newer PowerPC CPUs, both functions generate the exact same assembly, as expected (the output uses the VSX instruction `xvnmsubasp`):
```asm
xvnmsubasp 36, 34, 35
vmr 2, 4
blr
```
However, for older CPUs, the non-intrinsic implementation fails to optimize down to the single `vnmsubfp` that the intrinsic produces:
```asm
old:
vnmsubfp 2, 2, 3, 4
blr
new:
vspltisb 5, -1
vslw 5, 5, 5
vsubfp 4, 5, 4
vmaddfp 2, 2, 3, 4
vsubfp 2, 5, 2
blr
```
This came up in https://github.com/rust-lang/stdarch/pull/1734
</pre>