[PATCH] D20931: [X86] Reduce the width of multiplification when its operands are extended from i8 or i16

Thu Jun 2 17:59:29 PDT 2016

mkuper added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26452
@@ +26451,3 @@
+/// plus mulhs/mulhw <N x i16>.
+static SDValue reduceVMULWidth(SDNode *N, SelectionDAG &DAG,
+                               const X86Subtarget &Subtarget) {
----------------
wmi wrote:
> mkuper wrote:
> > Any chance to split this function up? Or is there no logical way to do that?
> I can extract the pattern matching part to a separate func. Do you think it is enough?
Yes, together with the refactoring Eli suggested, that should be good.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26507
@@ +26506,3 @@
+      N1Size = SIZEI16;
+  }
+
----------------
wmi wrote:
> eli.friedman wrote:
> > This is not what you want.  The constant vector is "zero-extended" if ZEXT(TRUNC(vec))==vec, and "sign-extended" if SEXT(TRUNC(vec)) == vec.  Computing whether the vector is a splat has no relation to either of those properties.
> Maybe I can use SplatValue.isNegative() to get the signedness of constant? Then use it together with the signedness of N0 to determine the signedness of multiplification.
I think what Eli meant is that there's no reason to look for a splat, specifically. If every element is small enough, it doesn't matter whether the vector is a splat or not.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26519
@@ +26518,3 @@
+  // Shrink the operands of mul.
+  SDValue NewN0 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N0);
+  SDValue NewN1 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N1);
----------------
wmi wrote:
> mkuper wrote:
> > Why not use N0->getOperand(0), at least when the type is i16?
> > Although I guess we're pretty much guaranteed DAGCombine will clean this up, and this may be a bit cleaner as is. So, I'm not really sure which is better.
> Yes, I tried to make the code simpler here.
As long as this gets clean up, that sounds reasonable.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26523
@@ +26522,3 @@
+  // Handle the case when split is needed.
+  if (VT.getVectorNumElements() > OpsVT.getVectorNumElements()) {
+    // Split the nodes.
----------------
wmi wrote:
> mkuper wrote:
> > This is necessary because this runs before legalization, right?
> > What if it ran after legalization? Does that help? Or is that too late?
> Yes, it is necessary to run before type legalization. That is because suppose original mul is of type <16 x i32>, it will be splitted into four muls of type <4 x i32> in legalization. We actually need after the transformation is two muls of type <8 x i16>, it is more difficult to merge pairs of muls if it is done after legalization. 
Ah, ok, got it, thanks.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26544
@@ +26543,3 @@
+                                    OpsVT, SubN0, SubN1);
+        SDValue ResLo = DAG.getNode(X86ISD::UNPCKL, DL, OpsVT, MulLo, MulHi);
+        ResLo = DAG.getNode(ISD::BITCAST, DL, ResVT, ResLo);
----------------
wmi wrote:
> mkuper wrote:
> > Why not use a generic shuffle here? Is the lowering further down not good enough to get the right unpacks?
> Because I feel using generic shuffle doesn't make the code simpler. We still need two shuffles, two bitcast and one concat_vectors.
It will actually make the code a bit more complex, really. What I had in mind was that if the result of the mul itself feeds a shuffle, leaving generic shuffles here may expose more dag combines further down the road.

Repository:
  rL LLVM

http://reviews.llvm.org/D20931