[PATCH] D148234: [AArch64] Remove AND and FMOV between uaddlv and urshl

Thu Apr 13 09:16:52 PDT 2023

jaykang10 added a comment.

In D148234#4265381 <https://reviews.llvm.org/D148234#4265381>, @dmgreen wrote:

> This feels a bit too specific to the exact instructions here, as opposed to the general case. We could change how i64 shifts are represented in the DAG, using v1i64 instead to show that they operate on neon registers. The `and 0xffff` could be removed by teaching it that the uaddlv node only produces zeros in the upper bits (in AArch64TargetLowering::computeKnownBitsForTargetNode). That doesn't solve everything. The representation of aarch64.neon.uaddlv might need to change too, perhaps to produce a v8i16, and something might need to recognize that the upper lanes are zero. That is the part that I'm less sure how it would work.

I agree with you. This pattern targets too specific a case...
The fundamental issue is that clang generates the definition of `vaddlv_u8` as below, and LLVM supports that code sequence.

  define internal fastcc i16 @vaddlv_u8(<8 x i8> noundef %__p0) unnamed_addr #2 {
  entry:
    %vaddlv = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v8i8(<8 x i8> %__p0)
    %0 = trunc i32 %vaddlv to i16
    ret i16 %0
  }

If clang generated `llvm.aarch64.neon.uaddlv.i16.v8i8` or `llvm.aarch64.neon.uaddlv.f16.v8i8` rather than `llvm.aarch64.neon.uaddlv.i32.v8i8`, and LLVM supported it, we would not see the `and`.
The `uaddlv` has a similar issue: its output register class is `FPR`, but the intrinsic function uses an integer type as its output type. LLVM has specific TableGen patterns to support this.
If possible, I would prefer not to change the existing patterns and code while clang keeps the current intrinsic definition...
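
To illustrate why the `and 0xffff` is redundant after the `trunc` above, here is a minimal portable sketch in C++. It only models the arithmetic (an unsigned widening sum of eight `i8` lanes); the helper names `model_uaddlv_v8i8`, `model_and_0xffff`, and `mask_is_redundant` are hypothetical and are not LLVM or ACLE APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of uaddlv on <8 x i8>: the unsigned sum of all eight
// lanes, widened so that it cannot overflow.
inline std::uint32_t model_uaddlv_v8i8(const std::array<std::uint8_t, 8> &v) {
  std::uint32_t sum = 0;
  for (std::uint8_t lane : v)
    sum += lane;
  return sum;
}

// Models `trunc i32 -> i16` followed by a zero extension, which is what
// shows up as `and 0xffff`.
inline std::uint32_t model_and_0xffff(std::uint32_t x) {
  return x & 0xffffu;
}

// Returns true when the mask does not change the modeled uaddlv result.
inline bool mask_is_redundant(const std::array<std::uint8_t, 8> &v) {
  const std::uint32_t sum = model_uaddlv_v8i8(v);
  return model_and_0xffff(sum) == sum;
}

// Worst case: every lane is 255, so the sum is 8 * 255 = 2040 (0x7f8),
// which still fits well below bit 16.
inline void check_worst_case() {
  std::array<std::uint8_t, 8> all_max;
  all_max.fill(255);
  assert(model_uaddlv_v8i8(all_max) == 2040u);
  assert(mask_is_redundant(all_max));
}
```

Since the worst-case sum is 2040, the bits above bit 15 are always zero in this model, which is the fact a known-bits analysis of the `uaddlv` node could report, as suggested in the quoted comment.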

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148234/new/

https://reviews.llvm.org/D148234