<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61218>61218</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
ARM MVE - VFMAS instruction never generated if the scalar is constant
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kjbracey
</td>
</tr>
</table>
<pre>
The VFMAS instruction is used far less often than VFMA, and when it is applicable it will often be used with a constant scalar, e.g. in a Newton-Raphson inverse square root approximation step:
x = x * (1.5 - 0.5 * x * x) // Can use VMUL ; VFMAS ; VMUL
MVE gives us two VFMA forms (`V * V + V`, `V * S + V`), and the VFMAS form `V * V + S`. A key difference is which input register is modified - for VFMA it's the addend, while VFMAS writes back to one of the multiplicands.
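To make the operand roles concrete, here is a minimal lane-wise model in plain C. The `q_reg` type and the `model_*` function names are hypothetical stand-ins made up for illustration, not MVE intrinsics, and the model ignores the single rounding performed by the real fused instructions.

```c
/* Hypothetical model: a 128-bit Q register viewed as four f32 lanes. */
enum { LANES = 4 };

typedef struct { float lane[LANES]; } q_reg;

/* VFMA (vector form): Qda = Qda + Qn * Qm. The addend Qda is overwritten. */
static q_reg model_vfma(q_reg qda, q_reg qn, q_reg qm)
{
    for (int i = 0; i != LANES; i++)
        qda.lane[i] = qda.lane[i] + qn.lane[i] * qm.lane[i];
    return qda;
}

/* VFMA (scalar form): Qda = Qda + Qn * Rm. The addend Qda is overwritten. */
static q_reg model_vfma_scalar(q_reg qda, q_reg qn, float rm)
{
    for (int i = 0; i != LANES; i++)
        qda.lane[i] = qda.lane[i] + qn.lane[i] * rm;
    return qda;
}

/* VFMAS: Qda = Qda * Qn + Rm. A multiplicand is overwritten; the scalar
   addend Rm lives in a general-purpose register and is left untouched. */
static q_reg model_vfmas(q_reg qda, q_reg qn, float rm)
{
    for (int i = 0; i != LANES; i++)
        qda.lane[i] = qda.lane[i] * qn.lane[i] + rm;
    return qda;
}
```

In this model a loop-invariant scalar passed to `model_vfmas` is never clobbered, whereas a constant vector used as `qda` in `model_vfma` has to be copied before every call if it is needed again, which is where the extra `VMOV` comes from.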
Clang can generate VFMAS either from the `vfmasq` intrinsic, or from a `float32x4_t * float32x4_t + float32_t` expression, but **only** if the scalar is not a known constant. If the scalar is constant, it is always loaded into a vector register and the all-vector `VFMA` is used, even though this inevitably means accompanying every `VFMA` with a `VMOV`, as the constant addend gets overwritten.
I've had no success in generating a VFMAS instruction for a constant scalar, so my Newton-Raphson iterations are VMUL; VMOV; VFMA; VMUL.
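For reference, a per-lane sketch of how one refinement step maps onto the intended VMUL; VFMAS; VMUL sequence. It assumes the usual inverse square root form `x * (1.5 - 0.5 * a * x * x)` with `-0.5 * a` precomputed; the function and parameter names are hypothetical.

```c
/* One Newton-Raphson refinement of x ~ 1/sqrt(a), written per lane.
   neg_half_a is assumed to hold -0.5f * a, computed once up front. */
static float rsqrt_step(float x, float neg_half_a)
{
    float t = x * x;            /* VMUL:  t = x * x                      */
    t = t * neg_half_a + 1.5f;  /* VFMAS: t = t * neg_half_a + constant  */
    return x * t;               /* VMUL:  x = x * t                      */
}
```

The middle line is the `V * V + S` shape with a constant `S`, i.e. the case where VFMAS would be expected but VMOV; VFMA is emitted instead.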
Non-constant scalar:
```
float32x4_t func3(float32x4_t a, float32x4_t b, float32_t c)
{
    a = vfmasq(a,b,c);
    a = vfmasq(a,b,c);
    return vfmasq(a,b,c);
}

func3:
    vmov r0, s8
    vfmas.f32 q0, q1, r0
    vfmas.f32 q0, q1, r0
    vfmas.f32 q0, q1, r0
    bx lr
```
Constant scalar:
```
float32x4_t func1(float32x4_t a, float32x4_t b)
{
    a = vfmasq(a,b,1.5f);
    a = vfmasq(a,b,1.5f);
    return vfmasq(a,b,1.5f);
}

func1:
    vmov.f32 q2, #1.500000e+00
    vmov q3, q2
    vfma.f32 q3, q0, q1
    vmov q0, q2
    vfma.f32 q0, q3, q1
    vfma.f32 q2, q0, q1
    vmov q0, q2
    bx lr
```
More examples at https://godbolt.org/z/cc5navr54
</pre>