[PATCH] D143283: [AArch64][SVE]: custom lower AVGFloor/AVGCeil.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 20 08:46:01 PST 2023


sdesmalen added inline comments.


================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1034
+
+  if ((ExtOpA.getOpcode() == ISD::ZERO_EXTEND ||
+       ExtOpA.getOpcode() == ISD::SIGN_EXTEND) &&
----------------
sdesmalen wrote:
> hassnaa-arm wrote:
> > sdesmalen wrote:
> > > This is probably better added as a separate DAGCombine that folds away the sign_extend_inreg explicitly, because the sign extension is already performed by the avgfloor operation.
> > > 
> > >   avgfloors(sextinreg(x), sextinreg(y)) -> avgfloors(x, y)
> > Sorry, I don't understand this comment; could you clarify it a bit more?
> > Why did you mention sign-extend-in-reg? How is it related to this code?
> The truncate + sign-extend will be combined into a sign_extend_inreg. You can see this by disabling the code you added below on lines 1046-1055 and running the test `@hadd8_sext_asr`.
Given my other suggestion, please ignore the suggestion above to add a DAGCombine (which I'm not even sure is legal; the semantics of the AVGFLOOR operation still aren't entirely clear to me).
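
(For reference, the trunc + sign-extend canonicalization mentioned in the quoted comment is, in the same notation as the fold above, roughly:

  sext(trunc(x)) -> sextinreg(x)

which DAGCombine already performs when the target supports sign_extend_inreg.)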


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13395
+
+  SDValue srlA = DAG.getNode(ISD::SRL, dl, VT, OpA, ConstantOne);
+  SDValue srlB = DAG.getNode(ISD::SRL, dl, VT, OpB, ConstantOne);
----------------
Should this be using an arithmetic shift when the operation is signed?
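
E.g., a rough sketch (here `IsSigned` is a hypothetical flag distinguishing the AVGFLOORS/AVGCEILS variants; it is not a name from the patch):

  // Sketch: the signed variants would presumably need an arithmetic
  // shift so that the sign bit of each operand is preserved.
  unsigned ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;
  SDValue ShiftA = DAG.getNode(ShiftOpc, dl, VT, OpA, ConstantOne);
  SDValue ShiftB = DAG.getNode(ShiftOpc, dl, VT, OpB, ConstantOne);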


================
Comment at: llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll:62
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    and z0.d, z0.d, #0xffffffff
-; CHECK-NEXT:    adr z0.d, [z0.d, z1.d, uxtw]
-; CHECK-NEXT:    lsr z0.d, z0.d, #1
+; CHECK-NEXT:    lsr z2.d, z1.d, #1
+; CHECK-NEXT:    lsr z3.d, z0.d, #1
----------------
hassnaa-arm wrote:
> sdesmalen wrote:
> > This doesn't look like an improvement; we don't really want to do this transform if it makes the resulting code worse. Do you know why this results in worse code?
> In the new changes, the generated instructions are exactly equivalent to AVGFloor; there are no additional instructions.
> I think nothing can be done for this case.
For these cases, where the input is an unpacked vector that is promoted using an `and` (for the unsigned case) or a `sign_extend_inreg` (for the signed case), we can emit the original code when Custom lowering these nodes. That way we can mitigate the regression.
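
A rough sketch of what that might look like in the Custom lowering hook (the `IsSigned` flag and the early-exit style are illustrative, not from the patch):

  // Sketch: if the operand was promoted from an unpacked vector (AND
  // for the unsigned case, SIGN_EXTEND_INREG for the signed case),
  // bail out of the Custom lowering so the original expansion is used.
  SDValue Op0 = Op.getOperand(0);
  unsigned PromoteOpc = IsSigned ? ISD::SIGN_EXTEND_INREG : ISD::AND;
  if (Op0.getOpcode() == PromoteOpc)
    return SDValue(); // fall back to the default expansion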


================
Comment at: llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll:169
   %m = add nsw <vscale x 4 x i32> %s0s, %s1s
   %s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
   %s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
----------------
Should this use `ashr` (an arithmetic shift), since the values %s0s/%s1s are signed?

Same question for other tests in this file.
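
I.e., something like this for the signed tests (a sketch: the test above with `lshr` replaced by `ashr`):

  %m = add nsw <vscale x 4 x i32> %s0s, %s1s
  %s = ashr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
  %s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>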


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143283/new/

https://reviews.llvm.org/D143283


