[PATCH] D99324: [AArch64][SVE] Codegen dup_lane for dup(vector_extract)

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 29 04:06:48 PDT 2021


sdesmalen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:624
+  def : Pat<(nxv4f16 (AArch64dup (f16 (vector_extract (nxv4f16 ZPR:$vec), sve_elm_idx_extdup_h:$index)))),
+            (DUP_ZZI_H ZPR:$vec, sve_elm_idx_extdup_h:$index)>;
+  def : Pat<(nxv2f16 (AArch64dup (f16 (vector_extract (nxv2f16 ZPR:$vec), sve_elm_idx_extdup_h:$index)))),
----------------
This isn't entirely correct, because an nxv4f16 has gaps between its elements. A full nxv8f16 has vscale x 8 elements, so an nxv4f16 has vscale x 4 elements with vscale x 4 gaps in between, e.g. `<elt0, _, elt1, _, .. >`. That means the element index must be multiplied by 2 in this case (and in the nxv2f32 one), and by 4 in the nxv2f16 case.
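
To make the index arithmetic concrete, here is a rough Python sketch of the scaling. It is only a model, not LLVM code: it assumes each element of an unpacked type sits in the low bits of an equally sized container within every 128-bit granule, which is what the `<elt0, _, elt1, _, .. >` picture describes. The names `lane_scale`, `packed_lane_index` and `granule_layout` are made up for illustration.

```python
SVE_GRANULE_BITS = 128  # an SVE register is vscale x 128 bits wide


def lane_scale(min_elts: int, elt_bits: int) -> int:
    """Factor between an element index and the lane index DUP would need.

    min_elts is the N in nxvN<ty>, elt_bits the element size in bits.
    Assumption: each element occupies the low bits of its container.
    """
    container_bits = SVE_GRANULE_BITS // min_elts
    if container_bits < elt_bits or container_bits % elt_bits != 0:
        raise ValueError("element does not fit its container evenly")
    return container_bits // elt_bits


def packed_lane_index(elt_index: int, min_elts: int, elt_bits: int) -> int:
    """Lane index, counted in elt_bits-sized lanes, of a given element."""
    return elt_index * lane_scale(min_elts, elt_bits)


def granule_layout(min_elts: int, elt_bits: int) -> list:
    """Lanes of one 128-bit granule; '_' marks a gap."""
    scale = lane_scale(min_elts, elt_bits)
    lanes = ["_"] * (SVE_GRANULE_BITS // elt_bits)
    for i in range(min_elts):
        lanes[i * scale] = f"elt{i}"
    return lanes


if __name__ == "__main__":
    # (type name, min_elts, elt_bits)
    for name, n, bits in [("nxv8f16", 8, 16), ("nxv4f16", 4, 16),
                          ("nxv2f32", 2, 32), ("nxv2f16", 2, 16)]:
        print(name, "scale", lane_scale(n, bits), granule_layout(n, bits))
```

Under these assumptions the factor is 1 for the packed nxv8f16, 2 for nxv4f16 and nxv2f32, and 4 for nxv2f16, so the patterns would need the scaled index rather than the raw `$index`.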


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99324/new/

https://reviews.llvm.org/D99324


