[PATCH] D143283: [AArch64][SVE]: custom lower AVGFloor/AVGCeil.

Wed Mar 1 07:53:27 PST 2023

dmgreen added a comment.

Hi - I was just looking at the patch whilst you updated it! Please ignore any comments that don't apply any more.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13396

+static bool IsZeroExtended(SDValue Node, SelectionDAG &DAG) {
+  unsigned SzInBits = Node->getValueType(0).getScalarSizeInBits();
----------------
It might be better to make these lambdas inside LowerAVG. The names sound fairly generic but they are in practice tied to hadd nodes.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13397
+static bool IsZeroExtended(SDValue Node, SelectionDAG &DAG) {
+  unsigned SzInBits = Node->getValueType(0).getScalarSizeInBits();
+  KnownBits Known = DAG.computeKnownBits(Node, 0);
----------------
Node.getValueType()...

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13399
+  KnownBits Known = DAG.computeKnownBits(Node, 0);
+  APInt RequiredZero(SzInBits, 0);
+  RequiredZero.setBitsFrom(SzInBits/2);
----------------
I believe this only needs 1 bit to be zero, so it could probably use Known.Zero.isSignBitSet()?
There is a proof in https://alive2.llvm.org/ce/z/cEtdJa for rhadd.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13406
+  unsigned SzInBits = Node->getValueType(0).getScalarSizeInBits();
+  KnownBits Known = DAG.computeKnownBits(Node, 0);
+  APInt RequiredOne(SzInBits, 0);
----------------
I think this one would be better to use computeNumSignBits. It's less important whether they are 0 or 1, just that they are the same as the top bit. Again I think it needs 1 bit to be valid, so computeNumSignBits(..) > 1 should do it.
https://alive2.llvm.org/ce/z/VbxRs-.

It would be good to see proofs for the other bit here too where the hadd is expanded to `SRL(A) + SRL(B) + (A&B)&1`

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13424-13430
+  bool IsCeil = false;
+  if ((Op->getOpcode() == ISD::AVGCEILS || Op->getOpcode() == ISD::AVGCEILU))
+    IsCeil = true;
+
+  bool IsSigned = false;
+  if ((Op->getOpcode() == ISD::AVGFLOORS || Op->getOpcode() == ISD::AVGCEILS))
+    IsSigned = true;
----------------
I would probably do:
```
unsigned Opc = Op->getOpcode();
bool IsCeil = Opc == ISD::AVGCEILS || Opc == ISD::AVGCEILU;
bool IsSigned = Opc == ISD::AVGFLOORS || Op->getOpcode() == ISD::AVGCEILS;
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13437
+  // Check if it's better to emit the original code:
+  if (IsZeroExtended(OpA, DAG) || IsZeroExtended(OpB, DAG)) {
+    // In this case omiting the original code is better than custom lowering
----------------
&&, not ||, I would expect.

A HADD is equivalent to `trunc(shr(add(ext(x),ext(y)), 1))`, it is not directly equivalent to `shr(add(x,y), 1)`. So we need to prove that turning it into the simpler version add+shift is better. It's not really "emit the original code".

================
Comment at: llvm/test/CodeGen/AArch64/sve-hadd.ll:75
   %m = add nsw <vscale x 2 x i64> %s0s, %s1s
-  %s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  %s = ashr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
   %s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
----------------
This ideally shouldn't change. Because the top bit isn't demanded instcombine will transform ashr into lshr, and we should be testing what the backend will see: https://godbolt.org/z/bvof8j7cc. I guess the lshr version isn't transformed after it gets promoted? It might be OK in this case.

================
Comment at: llvm/test/CodeGen/AArch64/sve-hadd.ll:104

 define <vscale x 4 x i32> @hadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
+; SVE-LABEL: hadds_v4i32:
----------------
hassnaa-arm wrote:
> @dmgreen I tried to get alive proofs for this transform,
> https://alive2.llvm.org/ce/
> but it seems there is something wrong.
> I'm not sure if the problem related to my equivalent IR to the generated code or the problem is because the transform is not correct.
Can you update the link? There are some other alive links that could help.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143283/new/

https://reviews.llvm.org/D143283