<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/60817>60817</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [AArch64] Use scalar operand for lane extract 0

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            backend:AArch64,

            missed-optimization

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          SamTebbs33

      </td>

    </tr>

</table>

<pre>

For this input:

```

#include <arm_neon.h>

float32_t __attribute__ ((noinline))

test_vmulxs_laneq_f32_lane0 (float32_t vec1_1, float32x4_t vec1_2) {

  return vmulxs_laneq_f32 (vec1_1, vec1_2, 0);

}

```

We generate:

`fmulx   s0, s0, v1.s[0]`

but this could be:

`fmulx   s0, s0, s1`

which has only scalar operands and may perform more consistently across different cores (i.e., the same as or better than the lane-extract form).
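The fold works because S1 is the low 32 bits of V1. Those are exactly the bits that `v1.s[0]` reads, so for lane 0 the indexed operand and the scalar operand hold the same value. The toy C model below is a sketch of that reasoning, not LLVM code. The names `vreg`, `read_lane_s`, `read_s_view`, `fmulx_model`, `fmulx_indexed` and `fmulx_scalar` are hypothetical. It rests on two simplifying assumptions:

- FMULX is modelled as an ordinary multiply, except that (+/-0) x (+/-inf) returns +/-2.0.
- NaN propagation and FPCR modes are ignored.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one 128-bit SIMD&FP register V<n>. Lane i of the .4S
   arrangement occupies bytes [4*i, 4*i + 4); the scalar view S<n> is the
   low 32 bits, i.e. bytes [0, 4), the same storage as lane 0. */
typedef struct { uint8_t bytes[16]; } vreg;

/* Read v.s[lane] (hypothetical helper). */
float read_lane_s (const vreg *v, int lane) {
  float f;
  assert (lane >= 0 && lane < 4);
  memcpy (&f, v->bytes + 4 * lane, sizeof f);
  return f;
}

/* Read the scalar view S<n> (hypothetical helper). */
float read_s_view (const vreg *v) {
  float f;
  memcpy (&f, v->bytes, sizeof f);
  return f;
}

/* Simplified FMULX: an ordinary multiply, except (+/-0) x (+/-inf)
   returns 2.0 with the XOR of the operand signs instead of NaN.
   NaN propagation and FPCR modes are deliberately not modelled. */
float fmulx_model (float a, float b) {
  if ((a == 0.0f && isinf (b)) || (isinf (a) && b == 0.0f))
    return (!signbit (a) != !signbit (b)) ? -2.0f : 2.0f;
  return a * b;
}

/* Models: fmulx s0, s0, v1.s[idx] */
float fmulx_indexed (float s0, const vreg *v1, int idx) {
  return fmulx_model (s0, read_lane_s (v1, idx));
}

/* Models: fmulx s0, s0, s1 (reads the low 32 bits of v1) */
float fmulx_scalar (float s0, const vreg *v1) {
  return fmulx_model (s0, read_s_view (v1));
}
```

Filling `v1` with lanes {1.5, 2, 3, 4}, both `fmulx_indexed (2.0f, &v1, 0)` and `fmulx_scalar (2.0f, &v1)` return 3.0. For any other lane index the two forms differ, which is why the scalar operand is only a valid replacement for lane 0.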

</pre>
