[PATCH] D20931: [X86] Reduce the width of multiplification when its operands are extended from i8 or i16

Thu Jun 2 17:46:18 PDT 2016

wmi added a comment.

Eli and Michael, thanks for your comments. Will post a new patch after I address the issues mentioned.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26450
@@ +26449,3 @@
+/// For pattern1, mul <N x i32> can be shrinked to mul <N x i16>.
+/// For pattern2, mul <N x i32> can be shrinked to mul <N x i16>
+/// plus mulhs/mulhw <N x i16>.
----------------
eli.friedman wrote:
> "shrunk", not "shrinked"... but you might want to use "narrowed" here instead.
Will fix it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26452
@@ +26451,3 @@
+/// plus mulhs/mulhw <N x i16>.
+static SDValue reduceVMULWidth(SDNode *N, SelectionDAG &DAG,
+                               const X86Subtarget &Subtarget) {
----------------
mkuper wrote:
> Any chance to split this function up? Or is there no logical way to do that?
I can extract the pattern matching part to a separate func. Do you think it is enough?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26468
@@ +26467,3 @@
+  enum {
+    DONTCARE,
+    SIZEI8,
----------------
mkuper wrote:
> I'm not sure DONTCARE is a good name for this - it's not a "top" value. Maybe UNKNOWN?
That is better. Will fix it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26475
@@ +26474,3 @@
+  // Find out the smaller size of N0 before it is sign/zero-extended.
+  if (N0.getOpcode() == ISD::ZERO_EXTEND ||
+      N0.getOpcode() == ISD::SIGN_EXTEND) {
----------------
mkuper wrote:
> ISD::ANY_EXTEND would work too, right?
> (Not sure how to test this, but if both ZERO and SIGN work, ANY should definitely work.)
Yes, it should work. I think I can generate the same code for it with ISD::ZERO_EXTEND.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26477
@@ +26476,3 @@
+      N0.getOpcode() == ISD::SIGN_EXTEND) {
+    IsSigned = (N0.getOpcode() == ISD::SIGN_EXTEND) ? true : false;
+    if (N0.getOperand(0).getValueType().getVectorElementType() == MVT::i8)
----------------
eli.friedman wrote:
> What if one side is signed, and the other is unsigned?
Nice catch. I think then the type of the mul should be signed.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26481
@@ +26480,3 @@
+    else if (N0.getOperand(0).getValueType().getVectorElementType() == MVT::i16)
+      N0Size = SIZEI16;
+  } else {
----------------
mkuper wrote:
> What if it's MVT::i1?
> You may want an "else return SDValue()" here as well.
You are right. Will fix it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26493
@@ +26492,3 @@
+      N1Size = SIZEI16;
+  }
+
----------------
eli.friedman wrote:
> Repeated code; should be refactored.
Will fix it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26494
@@ +26493,3 @@
+  }
+
+  // Find out the size of N1 if it is a constant splat.
----------------
mkuper wrote:
> What happens if one of the extends is a sext and the other is a zext? It seems like if N0 is a sext and N1 is a zext we'll get IsSigned == true, and if it's the other way around, we'll get IsSigned == false, which seems oddly asymmetrical. Or is there a canonicalization earlier that guarantees the order of a sext and zext?
Eli asked a similar question. If one side is sext, and the other is zext, then the type of mult should be signed. I will fix it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26499
@@ +26498,3 @@
+  bool HasAnyUndefs;
+  auto *BV = dyn_cast<BuildVectorSDNode>(N1);
+  if (BV &&
----------------
mkuper wrote:
> Are you guaranteed that if one of {N0, N1} is a extend, and the other is a BuildVector, N0 is the extend? I can imagine something canonizes it that way - and if that's the case, documenting this here would probably be a good idea.
Yes, InstCombine will canonicalize it. Will add comment for it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26507
@@ +26506,3 @@
+      N1Size = SIZEI16;
+  }
+
----------------
eli.friedman wrote:
> This is not what you want.  The constant vector is "zero-extended" if ZEXT(TRUNC(vec))==vec, and "sign-extended" if SEXT(TRUNC(vec)) == vec.  Computing whether the vector is a splat has no relation to either of those properties.
Maybe I can use SplatValue.isNegative() to get the signedness of constant? Then use it together with the signedness of N0 to determine the signedness of multiplification.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26519
@@ +26518,3 @@
+  // Shrink the operands of mul.
+  SDValue NewN0 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N0);
+  SDValue NewN1 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N1);
----------------
mkuper wrote:
> Why not use N0->getOperand(0), at least when the type is i16?
> Although I guess we're pretty much guaranteed DAGCombine will clean this up, and this may be a bit cleaner as is. So, I'm not really sure which is better.
Yes, I tried to make the code simpler here.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26523
@@ +26522,3 @@
+  // Handle the case when split is needed.
+  if (VT.getVectorNumElements() > OpsVT.getVectorNumElements()) {
+    // Split the nodes.
----------------
mkuper wrote:
> This is necessary because this runs before legalization, right?
> What if it ran after legalization? Does that help? Or is that too late?
Yes, it is necessary to run before type legalization. That is because suppose original mul is of type <16 x i32>, it will be splitted into four muls of type <4 x i32> in legalization. We actually need after the transformation is two muls of type <8 x i16>, it is more difficult to merge pairs of muls if it is done after legalization. 

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26544
@@ +26543,3 @@
+                                    OpsVT, SubN0, SubN1);
+        SDValue ResLo = DAG.getNode(X86ISD::UNPCKL, DL, OpsVT, MulLo, MulHi);
+        ResLo = DAG.getNode(ISD::BITCAST, DL, ResVT, ResLo);
----------------
mkuper wrote:
> Why not use a generic shuffle here? Is the lowering further down not good enough to get the right unpacks?
Because I feel using generic shuffle doesn't make the code simpler. We still need two shuffles, two bitcast and one concat_vectors.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26612
@@ +26611,3 @@
+    }
+  }
+}
----------------
eli.friedman wrote:
> There's massive code duplication here; needs to be refactored.  Maybe make splitting the inputs if necessary and actually computing the product separate steps?
I did see there were some code duplication. Like I can merge the code for "VT.getVectorNumElements() > OpsVT.getVectorNumElements()" and that for "VT.getVectorNumElements() == OpsVT.getVectorNumElements()", but I felt it made the logic little bit less clear after the merge, which mixed the case requiring split with the case requiring no split. 

Probably still worth it. I will do it and add some comments to clarify the logic.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:29777
@@ -29594,1 +29776,3 @@
+  case ISD::MUL:
+    return combineMul(N, DAG, DCI, Subtarget);
   case ISD::SHL:
----------------
mkuper wrote:
> Why the linebreak?
It is done by clang-format. I think it is more consistent without the linebreak. I will fix it.

================
Comment at: test/CodeGen/X86/shrink_vmul.ll:1
@@ +1,2 @@
+; NOTE: Assertions have been autogenerated by update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s
----------------
mkuper wrote:
> Could you add sext tests as well? I'm not sure you need the whole type x {zext, sext} matrix, but one sext test would be good.
Will add it.


Repository:
  rL LLVM

http://reviews.llvm.org/D20931