<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/60817">60817</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Use scalar operand for lane extract 0
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
SamTebbs33
</td>
</tr>
</table>
<pre>
For this input:
```
#include <arm_neon.h>
float32_t __attribute__ ((noinline))
test_vmulxs_laneq_f32_lane0 (float32_t vec1_1, float32x4_t vec1_2) {
return vmulxs_laneq_f32 (vec1_1, vec1_2, 0);
}
```
We generate:
`fmulx s0, s0, v1.s[0]`
but this could be:
`fmulx s0, s0, s1`
which has only scalar operands and could perform more consistently across different cores (i.e. the same as or better than the lane-extract form).
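
As a rough illustration of why the two encodings should agree, here is a portable C sketch that models the behaviour in plain scalar code. It relies on `s1` being the low 32 bits of `v1`, i.e. lane 0. The names `vreg_model`, `fmulx_model`, `fmulx_by_element_lane0` and `fmulx_scalar` are hypothetical and exist only for this sketch; they are not LLVM or ACLE APIs, and the FMULX model only covers the 0 * infinity special case (result of +/-2.0) on top of an ordinary multiply.
```
#include <math.h>

/* Hypothetical model of a 128-bit vector register viewed as four floats. */
typedef struct { float lane[4]; } vreg_model;

/* Simplified scalar model of FMULX: an ordinary multiply, except that
   0 * infinity yields 2.0 with the sign being the XOR of the input signs. */
static float fmulx_model(float a, float b) {
    if ((a == 0.0f && isinf(b)) || (isinf(a) && b == 0.0f))
        return (!signbit(a) != !signbit(b)) ? -2.0f : 2.0f;
    return a * b;
}

/* Models `fmulx s0, s0, v1.s[0]`: the by-element form with index 0. */
static float fmulx_by_element_lane0(float s0, vreg_model v1) {
    return fmulx_model(s0, v1.lane[0]);
}

/* Models `fmulx s0, s0, s1`: s1 aliases the low 32 bits of v1 (lane 0). */
static float fmulx_scalar(float s0, vreg_model v1) {
    float s1 = v1.lane[0];
    return fmulx_model(s0, s1);
}
```
In this model both functions read the same lane, so they return the same value for every input; the suggested change would only affect which instruction encoding is selected.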
</pre>