[PATCH] D155592: [AArch64] Reuse larger DUPLANE if available

Thu Jul 20 02:16:26 PDT 2023

jaykang10 added a comment.

In D155592#4517823 <https://reviews.llvm.org/D155592#4517823>, @dmgreen wrote:

> Thanks. Looks pretty good. Do we need to handle the other "indexing" operations in the same way? For example something like this below, which is for fmul. I would guess that the extending operations (umull) are more likely to see the problem, but you can imagine the others in the "AdvSIMD indexed element" section of AArch64InstrInfo.td might all be better using dup_v8i16 now.
>
>   define <4 x float> @sel.v8i16(ptr %p, ptr %q, <4 x float> %a, <4 x float> %b, <2 x float> %c) {
>     %splat = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> zeroinitializer
>     %splat2 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> zeroinitializer
>     
>     %r = fmul <4 x float> %b, %splat
>     %r2 = fmul <2 x float> %c, %splat2
>     store <2 x float> %r2, ptr %p
>     ret <4 x float> %r
>   }

I have added `dup_v8i16` and `dup_v4i32` to the `SIMDVectorIndexedLongSD` and `SIMDIndexedLongSD` multiclasses.
`umull` uses `SIMDVectorIndexedLongSD`, like `smull`, so it will also be matched with `dup_v8i16` and `dup_v4i32`.
Let me check the `fmul` example and the 64-bit `AArch64duplane` with `VectorIndexS:$idx`.
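
For reference, the `umull` case could be exercised by something like the sketch below, modelled on the `fmul` example above. It is a hypothetical test and is not taken from the patch: the function name `@umull_shared_splat` and all value names are made up for illustration, and I have not confirmed which instructions it selects. The same lane of `%a` is splatted at both 128 and 64 bits, so the narrow splat that feeds the widening multiply is a candidate for reusing the larger duplane.

  define <4 x i32> @umull_shared_splat(ptr %p, <8 x i16> %a, <4 x i16> %b, <8 x i16> %c) {
    ; 128-bit splat of lane 0 of %a
    %splat8 = shufflevector <8 x i16> %a, <8 x i16> poison, <8 x i32> zeroinitializer
    ; 64-bit splat of the same lane
    %splat4 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> zeroinitializer

    ; widening unsigned multiply by the 64-bit splat (the umull candidate)
    %bz = zext <4 x i16> %b to <4 x i32>
    %sz = zext <4 x i16> %splat4 to <4 x i32>
    %r = mul <4 x i32> %bz, %sz
    ; second use that keeps the 128-bit splat alive
    %r2 = mul <8 x i16> %c, %splat8
    store <8 x i16> %r2, ptr %p
    ret <4 x i32> %r
  }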

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155592/new/

https://reviews.llvm.org/D155592