[PATCH] D158613: [AArch64] Mark known zero for high 16-bits of uaddlv intrinsic output with v8i8

Wed Aug 23 06:35:26 PDT 2023

jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, efriedma, t.p.northover.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

If `llvm.aarch64.neon.uaddlv` intrinsic has `v8i8` type input, the it returns 16-bits value.
clang generates `llvm.aarch64.neon.uaddlv.i32.v8i8` and `trunc to i16` for `vaddlv_u8` neon intrinsic. It causes additional `and 0xffff` instruction from attached example as below.

  foo:                                    // @foo
          uaddlv  h0, v0.8b
          fmov    w8, s0
          and     w0, w8, #0xffff
          ret

If we mark know zero for high 16-bits of uaddlv intrinsic output with v8i8, we can avoid the additional `and 0xfff`.

  foo:                                    // @foo
          uaddlv  h0, v0.8b
          fmov    w0, s0
          ret


https://reviews.llvm.org/D158613

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/neon-addlv.ll


Index: llvm/test/CodeGen/AArch64/neon-addlv.ll
===================================================================

--- llvm/test/CodeGen/AArch64/neon-addlv.ll
+++ llvm/test/CodeGen/AArch64/neon-addlv.ll
@@ -150,3 +150,16 @@
   %tmp5 = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> %tmp3)
   ret i32 %tmp5
 }
+
+declare i32 @llvm.aarch64.neon.uaddlv.i32.v8i8(<8 x i8>) nounwind readnone
+
+define i32 @uaddlv_known_bits(<8 x i8> %a) {
+; CHECK-LABEL: uaddlv_known_bits:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uaddlv h0, v0.8b
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
+  %tmp1 = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v8i8(<8 x i8> %a)
+  %tmp2 = and i32 %tmp1, 65535
+  ret i32 %tmp2
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2162,6 +2162,16 @@
     switch (IntNo) {
     default:
       break;
+    case Intrinsic::aarch64_neon_uaddlv: {
+      MVT VT = Op.getOperand(1).getValueType().getSimpleVT();
+      unsigned BitWidth = Known.getBitWidth();
+      if (VT == MVT::v8i8) {
+        assert(BitWidth >= 16 && "Unexpected width!");
+        APInt Mask = APInt::getHighBitsSet(BitWidth, BitWidth - 16);
+        Known.Zero |= Mask;
+      }
+      break;
+    }
     case Intrinsic::aarch64_neon_umaxv:
     case Intrinsic::aarch64_neon_uminv: {
       // Figure out the datatype of the vector operand. The UMINV instruction


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D158613.552691.patch
Type: text/x-patch
Size: 1543 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230823/063cd908/attachment.bin>