[PATCH] D91255: [AArch64] Rearrange (dup(sext/zext)) to (sext/zext(dup))

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 11 14:02:12 PST 2020


dmgreen added a comment.

Thanks

So what this seems to be proposing is that ever instance of dup(sext(..)) is transformed into sext(dup(..)). That's a pretty general transform. Off the top of my head... the extend could be free in places (zext i32->i64) but it may allow more folding into other instructions like the smull/saddl/ssubl it can allow here. If the operand is a load, that should at least have been folded into a single instruction already prior to the dup being made.  But I would probably expect a sxth+dup to be quicker than a dup+sshll in general.

I would suggest trying this on a number of small tests and seeing how it does in terms of instruction count. Alternatively try compiling the llvm testsuite and seeing how many things change, and if they look better or worse in the process. But like I said this may need to be a bit more targeted.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:14973
+                                     bool Signed) {
+  auto Operand = N->getOperand(0);
+  auto ExtOperand = Operand.getOperand(0);
----------------
The llvm style guide suggests not to over-use auto like this. It just makes it more difficult telling what things are.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:14979
+  // Cannot operate on non-vector dups
+  if (!NType.isVector())
+    return SDValue();
----------------
These is no such thing as a "non-vector dup", as far as I understand.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:14994
+
+  EVT HalfType = MVT::getIntegerVT(Size >> 1);
+  EVT HalfVT = NType.getVectorVT(*DAG.getContext(), HalfType,
----------------
I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext or Operand.getOperand(1) for a SIGN_EXTEND_INREG, and something based on the mask for an AND). It them probably has to make sure that the new extend is a legal operation.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:14998
+
+  auto DupNode = DAG.getNode(N->getOpcode(), SDLoc(N), DAG.getVTList(HalfVT),
+                             {ExtOperand});
----------------
Create a `SDLoc DL(N)` and use it in both getNodes.
This doesn't need getVTList for single types.
Or {} for the operands I don't think.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15012
+  auto Opcode = Operand.getOpcode();
+  bool IsSext = Opcode == ISD::SIGN_EXTEND ||
+                Opcode == ISD::SIGN_EXTEND_INREG || Opcode == ISD::AssertSext;
----------------
Detecting the extend is probably best done inside the function. It's common to do:
  if (SDValue Ext = performDUPExtCombine(..))
    return Ext;
I would personally leave out at least AssertSext and AssertZext until you at least have a test that shows them being needed. If this does SIGN_EXTEND_INREG it should probably handle the equivalent AND as well.


================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll:4
+
+define void @matrix_mul_signed(i32 %N, i32* nocapture %C, i16* nocapture readonly %A, i16 %val) {
+; CHECK-LABEL: matrix_mul_signed:
----------------
I would expect tests that looked something like (but I got and edited this one from mve):
```
define <4 x i32> @vdup_i16(i16 %src) {
; CHECK-LABEL: vdup_i16:
; CHECK:       @ %bb.0: @ %entry
; CHECK-NEXT:    vdup.16 q0, r0
; CHECK-NEXT:    bx lr
entry:
  %0 = insertelement <4 x i16> undef, i16 %src, i32 0
  %x = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer
  %out = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %out
}
```

But for all type and sizes that this transform supports. Which seems to be a lot at the moment. Don't forget scalable types too.

Having tests that show that mul(sext(...), dup(sext(...))) are also folded sounds useful too, but they can hopefully be equally small.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91255/new/

https://reviews.llvm.org/D91255



More information about the llvm-commits mailing list