[PATCH] D96700: [llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals

Mon Feb 15 04:46:30 PST 2021

paulwalker-arm accepted this revision.
paulwalker-arm added a comment.
This revision is now accepted and ready to land.

Patch looks good to me. I'll leave it up to you whether to extend the patch to cover the f16/f64 cases now or defer that until it is needed.

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:559

+  // Duplicate immediate FP into all vector elements.
+  def : Pat<(nxv2f32 (AArch64dup (f32 fpimm:$val))),
----------------
georges wrote:
> do we also need patterns for f16/f64?
I would say "want" rather than "need", given this is an optimisation rather than a functional requirement.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll:133-139
+define <vscale x 2 x float> @dup_fmov_imm_f32_2() {
+; CHECK-LABEL: dup_fmov_imm_f32_2:
+; CHECK: mov w8, #1109917696
+; CHECK-NEXT: mov z0.s, w8
+  %out = tail call <vscale x 2 x float> @llvm.aarch64.sve.dup.x.nxv2f32(float 4.200000e+01)
+  ret <vscale x 2 x float> %out
+}
----------------
Keep it if you want, but this "vscale x 2 x float" test is somewhat unnecessary because the intrinsics only expect to operate on fully packed vectors. The fact that some of them work with unpacked types is really a quirk: they reuse the ISD nodes used for stock LLVM IR.
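
For anyone reading the CHECK lines above: the integer in "mov w8, #1109917696" is just the IEEE-754 single-precision bit pattern of 42.0 (0x42280000). A minimal sketch in Python follows, not taken from the patch. The helper names f32_bits and is_fp8_immediate are hypothetical, and the second helper assumes the usual AArch64 8-bit FP immediate form (+/- n/16 * 2^r with 16 <= n <= 31 and -3 <= r <= 4). Under that assumption 42.0 is out of range, which would be consistent with the test expecting a general-purpose register move rather than an FP immediate.

```python
import struct


def f32_bits(value: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of value as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def is_fp8_immediate(value: float) -> bool:
    """Hypothetical helper: True if value has the form +/- n/16 * 2**r
    with 16 <= n <= 31 and -3 <= r <= 4 (the assumed 8-bit FP immediate range)."""
    magnitude = abs(value)
    return any(
        magnitude == (n / 16.0) * 2.0 ** r
        for n in range(16, 32)
        for r in range(-3, 5)
    )


if __name__ == "__main__":
    bits = f32_bits(42.0)
    print(bits, hex(bits))         # the constant used in the CHECK line
    print(is_fp8_immediate(42.0))  # 42.0 exceeds the largest encodable magnitude, 31.0
    print(is_fp8_immediate(1.0))   # 1.0 = 16/16 * 2**0
```

This only models the arithmetic. It says nothing about which patterns the patch actually matches in AArch64SVEInstrInfo.td.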

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96700/new/

https://reviews.llvm.org/D96700