[PATCH] D99324: [AArch64][SVE] Codegen dup_lane for dup(vector_extract)

Mon Mar 29 04:43:34 PDT 2021

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:624
+  def : Pat<(nxv4f16 (AArch64dup (f16 (vector_extract (nxv4f16 ZPR:$vec), sve_elm_idx_extdup_h:$index)))),
+            (DUP_ZZI_H ZPR:$vec, sve_elm_idx_extdup_h:$index)>;
+  def : Pat<(nxv2f16 (AArch64dup (f16 (vector_extract (nxv2f16 ZPR:$vec), sve_elm_idx_extdup_h:$index)))),
----------------
sdesmalen wrote:
> This isn't entirely correct, because a nxv4f16 has gaps between the elements. A full nxv8f16 has vscale x 8 elements, so that means a nxv4f16 has vscale x 4 elements, with 4 gaps in between, e.g. `<elt0, _, elt1, _, .. >`. That means the element index must be multiplied by 2 in this case (and the one for nxv2f32), and by 4 for the nxv2f16 case.
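To make the index scaling concrete, here is a small behavioral model, not LLVM code. It assumes a single 128-bit granule (vscale = 1), little-endian lane numbering, and that each unpacked f16 element sits in the low 16 bits of its wider container. Gap lanes are modeled as `None`. All function names are hypothetical illustrations; `dup_zzi_h` only mimics what `DUP_ZZI_H` does to such a register.

```python
# Hypothetical model of one 128-bit SVE granule viewed as eight 16-bit (H) lanes.
H_LANES_PER_GRANULE = 8  # 128 bits / 16 bits


def unpacked_layout(values, count):
    """Place `count` f16 values into H lanes as an unpacked vector.

    Element i goes in the low H lane of container i; the other lanes are
    the gaps and are modeled as None.
    """
    stride = H_LANES_PER_GRANULE // count  # 1 for nxv8f16, 2 for nxv4f16, 4 for nxv2f16
    lanes = [None] * H_LANES_PER_GRANULE
    for i, v in enumerate(values):
        lanes[i * stride] = v
    return lanes


def h_lane_index(elem_index, count):
    """H-lane index that an H-sized dup would need to read logical element `elem_index`."""
    return elem_index * (H_LANES_PER_GRANULE // count)


def dup_zzi_h(lanes, index):
    """Mimic DUP (indexed) on H elements: broadcast H lane `index` to every lane."""
    return [lanes[index]] * len(lanes)
```

For an nxv4f16 holding `a, b, c, d`, using the logical index 1 unscaled with an H-sized dup reads a gap lane; scaling it by 2 reads `b`:

```python
lanes = unpacked_layout(["a", "b", "c", "d"], 4)
dup_zzi_h(lanes, 1)                   # broadcasts the gap (None)
dup_zzi_h(lanes, h_lane_index(1, 4))  # broadcasts "b"
```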
While logically true, I think in practice you'd rewrite the pattern so the instruction's element type matches that of the "packed" vector associated with the DAG result's element count (i.e. D for nxv2, S for nxv4).

So in this instance something like:
```
  def : Pat<(nxv4f16 (AArch64dup (f16 (vector_extract (nxv4f16 ZPR:$vec), sve_elm_idx_extdup_s:$index)))),
            (DUP_ZZI_S ZPR:$vec, sve_elm_idx_extdup_s:$index)>;
``` 

So in essence, all `nxv4` results are treated as duplicating floats, and all `nxv2` results as duplicating doubles.
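The same idea can be checked with a small behavioral model (not LLVM code): pick the dup element width from the result's element count (nxv8 to H, nxv4 to S, nxv2 to D) and pass the logical index through unchanged. The sketch assumes one 128-bit granule (vscale = 1), little-endian lane numbering, and unpacked elements in the low 16 bits of each container. Gaps are `None`, and all names (`dup_zzi`, `dup_lane`, `CONTAINER_H_LANES`, and so on) are hypothetical.

```python
# Hypothetical model of one 128-bit SVE granule viewed as eight 16-bit (H) lanes.
H_LANES_PER_GRANULE = 8

# Width, in H lanes, of the "packed" element type with the same element count:
# nxv8 -> H (1 lane), nxv4 -> S (2 lanes), nxv2 -> D (4 lanes).
CONTAINER_H_LANES = {8: 1, 4: 2, 2: 4}


def unpacked_layout(values, count):
    """Place `count` f16 values in the low H lane of each container; gaps are None."""
    stride = H_LANES_PER_GRANULE // count
    lanes = [None] * H_LANES_PER_GRANULE
    for i, v in enumerate(values):
        lanes[i * stride] = v
    return lanes


def dup_zzi(lanes, index, width):
    """Mimic DUP (indexed) at an element width of `width` H lanes:
    copy the `index`-th group of `width` lanes into every group."""
    group = lanes[index * width:(index + 1) * width]
    return group * (len(lanes) // width)


def dup_lane(values, count, elem_index):
    """Choose the dup width from the element count and reuse the index unscaled."""
    width = CONTAINER_H_LANES[count]
    return dup_zzi(unpacked_layout(values, count), elem_index, width)


def live_elements(lanes, count):
    """Read back the logical elements: the low H lane of each container."""
    return lanes[::H_LANES_PER_GRANULE // count]
```

Duplicating whole containers also copies the gap lanes, which is harmless because those bits are undefined in an unpacked vector:

```python
out = dup_lane(["a", "b", "c", "d"], 4, 1)
live_elements(out, 4)  # every live element is "b"
```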

Is it possible to move the patterns into the multiclass for sve_int_perm_dup_i?


https://reviews.llvm.org/D99324