[PATCH] [X86] Integer multiplication improvements

Sun Feb 1 07:29:27 PST 2015

Thanks, Chandler!


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24019-24023
@@ +24018,7 @@
+
+  // How many steps will we have to peform to replace the
+  // MUL. We limit this to 3 steps, based on imul latency of 4
+  // (If the latency is equal, this is probably not a win)
+  unsigned Steps = 0;
+  const unsigned MAX_STEPS = 3;
+  // Having to negate or shift is a step
----------------
chandlerc wrote:
> This doesn't really make sense. We shouldn't be hard coding these things here.
> 
> - LEA as slower on Atom than on other X86 chips.
> - IMUL can be anything from 6 to 14 cycles on some X86 chips.
> 
> We have a schedule, we should consult it for the expected latency of these and use those to drive the limits. (And we should fix the schedule to be correct if it is currently wrong.)
> 
> Also, I wouldn't use capitals here. This isn't a macro.
Right, didn't think about Atom, I'll check whether it's worthwhile there, thanks!

The "4" is just a conservative estimate - precisely because it can be anything in a fairly large range (can be below 6 - starts at 4 on HSW). I don't think we model that anywhere, so taking a number from the schedule didn't look like the right thing to do. In any case, I'll take a look at what the schedule currently is, and whether it makes sense.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24025-24026
@@ +24024,4 @@
+  // Having to negate or shift is a step
+  Steps += (IsNeg ?  1 : 0);
+  Steps += (ShiftAmt ?  1 : 0);
+  
----------------
chandlerc wrote:
> Just use the boolean?
I always thought adding a boolean looks odd, but sure.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24060-24061
@@ -24032,1 +24059,4 @@
+  // Do not add new nodes to DAG combiner worklist.
+  if (MulAmt == 1)
     DCI.CombineTo(N, NewMul, false);
+
----------------
chandlerc wrote:
> It's pretty hostile to the DAG to create nodes unless they are absolutely going to be used. It's almost certainly better to do the math in a tight loop first to verify that we'll actually combine this node.
> 
> That will also let you use early exit.
Got it.

http://reviews.llvm.org/D7320

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/