<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/63090">63090</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
ARM Clang: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
pmor13
</td>
</tr>
</table>
<pre>
This code:
```
__fp16 mul(__fp16 x, __fp16 y, __fp16 z)
{
    return x * y + z;
}
```
compiled with:
```
clang -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16
```
produces the following assembly code:
```
mul:
    vmov.f16 s0, r2
    vmov.f16 s2, r1
    vcvtb.f32.f16 s0, s0
    vmov.f16 s4, r0
    vcvtb.f32.f16 s2, s2
    vcvtb.f32.f16 s4, s4
    vfma.f32 s0, s4, s2
    vcvtb.f16.f32 s0, s0
    vmov r0, s0
    bx lr
```
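Here all three operands are widened to f32 (`vcvtb.f32.f16`), the multiply-add is done with `vfma.f32`, and the result is narrowed back (`vcvtb.f16.f32`), while a single `vfma.f16` is expected.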
Demo: https://godbolt.org/z/K9hj6GE8d
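For reference, the emitted sequence can be read back into C roughly as follows. This is only a sketch of what the assembly listing does, not of the compiler's internals. The names `half_t` and `mul_widened` are made up for illustration, and the `float` fallback is an assumption added so the snippet also builds on targets where `__fp16` cannot be passed by value:
```
/* Hypothetical alias: real half precision where the ACLE macro says it is
   available, plain float elsewhere (assumption, only to keep this buildable). */
#if defined(__ARM_FP16_FORMAT_IEEE)
typedef __fp16 half_t;
#else
typedef float half_t;
#endif

/* Hypothetical helper that mirrors the listing step by step. */
half_t mul_widened(half_t x, half_t y, half_t z)
{
    float xf = (float)x;    /* vcvtb.f32.f16 */
    float yf = (float)y;    /* vcvtb.f32.f16 */
    float zf = (float)z;    /* vcvtb.f32.f16 */
    /* One expression, which the compiler may contract into a single
       fused multiply-add (vfma.f32 in the listing). */
    float r = xf * yf + zf;
    return (half_t)r;       /* vcvtb.f16.f32 */
}
```
For example, `mul_widened(1.5, 2.0, 0.25)` returns 3.25, which is exactly representable in half precision.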
</pre>