<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60817>60817</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Use scalar operand for lane extract 0
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          SamTebbs33
      </td>
    </tr>
</table>

<pre>
    For this input:

```
#include <arm_neon.h>
float32_t __attribute__ ((noinline))
test_vmulxs_laneq_f32_lane0 (float32_t vec1_1, float32x4_t vec1_2) {
  return vmulxs_laneq_f32 (vec1_1, vec1_2, 0);
}
```

We generate:

`fmulx   s0, s0, v1.s[0]`

but this could be:

`fmulx   s0, s0, s1`

which has only scalar operands and could perform more consistently across different cores (i.e. the same as or better than the lane-extract form).
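
The two forms are interchangeable because `s1` is the low 32 bits of `v1`, i.e. the same bits as `v1.s[0]`. The following is only a portable sketch of that reasoning, not the backend change itself: `vreg128_model`, `fmulx_model`, `mulx_by_lane0`, `mulx_by_scalar` and `check_equivalence` are hypothetical names invented for illustration, and `fmulx_model` is a simplified software model of FMULX (an ordinary multiply, except that zero times infinity gives 2.0 with the XOR of the operand signs).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of one 128-bit SIMD register: the scalar view (s)
   occupies the same storage as lane 0 of the vector view (lanes[0]). */
typedef union {
    float lanes[4];
    float s;
} vreg128_model;

/* Simplified, assumed model of scalar FMULX: a normal multiply, except
   that 0 * infinity returns 2.0, negated when exactly one operand is
   negative. */
float fmulx_model(float a, float b) {
    bool zero_times_inf = (a == 0.0f && isinf(b)) || (isinf(a) && b == 0.0f);
    if (zero_times_inf) {
        bool negative = (signbit(a) != 0) != (signbit(b) != 0);
        return negative ? -2.0f : 2.0f;
    }
    return a * b;
}

/* Models `fmulx s0, s0, v1.s[0]`: the operand is read as lane 0. */
float mulx_by_lane0(float a, const vreg128_model *v) {
    return fmulx_model(a, v->lanes[0]);
}

/* Models `fmulx s0, s0, s1`: the operand is read through the scalar view. */
float mulx_by_scalar(float a, const vreg128_model *v) {
    return fmulx_model(a, v->s);
}

/* Checks that both operand forms agree (only meaningful for non-NaN results). */
void check_equivalence(float a, const vreg128_model *v) {
    assert(mulx_by_lane0(a, v) == mulx_by_scalar(a, v));
}
```

In this model, `check_equivalence` holds for any register contents with a non-NaN result, which is the property that would let the lane-0 form be selected as the scalar form.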

</pre>