[PATCH] D20931: [X86] Reduce the width of multiplification when its operands are extended from i8 or i16

Thu Jun 2 15:13:17 PDT 2016

mkuper added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26452
@@ +26451,3 @@
+/// plus mulhs/mulhw <N x i16>.
+static SDValue reduceVMULWidth(SDNode *N, SelectionDAG &DAG,
+                               const X86Subtarget &Subtarget) {
----------------
Any chance to split this function up? Or is there no logical way to do that?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26468
@@ +26467,3 @@
+  enum {
+    DONTCARE,
+    SIZEI8,
----------------
I'm not sure DONTCARE is a good name for this - it's not a "top" value. Maybe UNKNOWN?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26475
@@ +26474,3 @@
+  // Find out the smaller size of N0 before it is sign/zero-extended.
+  if (N0.getOpcode() == ISD::ZERO_EXTEND ||
+      N0.getOpcode() == ISD::SIGN_EXTEND) {
----------------
ISD::ANY_EXTEND would work too, right?
(Not sure how to test this, but if both ZERO and SIGN work, ANY should definitely work.)

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26481
@@ +26480,3 @@
+    else if (N0.getOperand(0).getValueType().getVectorElementType() == MVT::i16)
+      N0Size = SIZEI16;
+  } else {
----------------
What if it's MVT::i1?
You may want an "else return SDValue()" here as well.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26486
@@ +26485,3 @@
+
+  // Find out the smaller size of N1 before it is sign/zero-extended.
+  if (N1.getOpcode() == ISD::ZERO_EXTEND ||
----------------
Same two comments as above.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26494
@@ +26493,3 @@
+  }
+
+  // Find out the size of N1 if it is a constant splat.
----------------
What happens if one of the extends is a sext and the other is a zext? It seems like if N0 is a sext and N1 is a zext we'll get IsSigned == true, and if it's the other way around, we'll get IsSigned == false, which seems oddly asymmetrical. Or is there a canonicalization earlier that guarantees the order of a sext and zext?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26499
@@ +26498,3 @@
+  bool HasAnyUndefs;
+  auto *BV = dyn_cast<BuildVectorSDNode>(N1);
+  if (BV &&
----------------
Are you guaranteed that if one of {N0, N1} is a extend, and the other is a BuildVector, N0 is the extend? I can imagine something canonizes it that way - and if that's the case, documenting this here would probably be a good idea.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26519
@@ +26518,3 @@
+  // Shrink the operands of mul.
+  SDValue NewN0 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N0);
+  SDValue NewN1 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N1);
----------------
Why not use N0->getOperand(0), at least when the type is i16?
Although I guess we're pretty much guaranteed DAGCombine will clean this up, and this may be a bit cleaner as is. So, I'm not really sure which is better.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26523
@@ +26522,3 @@
+  // Handle the case when split is needed.
+  if (VT.getVectorNumElements() > OpsVT.getVectorNumElements()) {
+    // Split the nodes.
----------------
This is necessary because this runs before legalization, right?
What if it ran after legalization? Does that help? Or is that too late?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26544
@@ +26543,3 @@
+                                    OpsVT, SubN0, SubN1);
+        SDValue ResLo = DAG.getNode(X86ISD::UNPCKL, DL, OpsVT, MulLo, MulHi);
+        ResLo = DAG.getNode(ISD::BITCAST, DL, ResVT, ResLo);
----------------
Why not use a generic shuffle here? Is the lowering further down not good enough to get the right unpacks?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:29777
@@ -29594,1 +29776,3 @@
+  case ISD::MUL:
+    return combineMul(N, DAG, DCI, Subtarget);
   case ISD::SHL:
----------------
Why the linebreak?

================
Comment at: test/CodeGen/X86/shrink_vmul.ll:1
@@ +1,2 @@
+; NOTE: Assertions have been autogenerated by update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s
----------------
Could you add sext tests as well? I'm not sure you need the whole type x {zext, sext} matrix, but one sext test would be good.


Repository:
  rL LLVM

http://reviews.llvm.org/D20931