[PATCH] D20931: [X86] Reduce the width of multiplification when its operands are extended from i8 or i16

Tue Jun 7 10:44:40 PDT 2016

wmi added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26537
@@ +26536,3 @@
+    }
+  }
+
----------------
eli.friedman wrote:
> It feels like you should be able to use ComputeNumSignBits/computeKnownBits here; I'm not sure how much shorter that actually ends up, though.
Thanks for the suggestion. I changed the value range check for the integer constant to use ComputeNumSignBits. The code length doesn't change much, but ComputeNumSignBits is more powerful, and I can use it to extend the optimization further in the future, for example:

; %val1 = load <2 x i8>
; %op1 = zext <2 x i8> %val1 to <2 x i32>
; %val2 = load <2 x i8>
; %op2 = zext <2 x i8> %val2 to <2 x i32>
; %add = add <2 x i32> %op1, %op2
; %rst = mul <2 x i32> %add, %op2

ComputeNumSignBits may be able to tell that %add's value range is within 0 ~ 32767 (actually 0 ~ 255*2).
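
To illustrate the sign-bit reasoning above, here is a minimal standalone sketch. It is not LLVM's ComputeNumSignBits (which works on DAG nodes and is conservative); the helpers numSignBits32 and fitsInSignedI16 are hypothetical names that only handle a known 32-bit constant. The idea it shows: a 32-bit value fits in a signed 16-bit lane exactly when it has at least 17 sign bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: count the leading bits that are copies of the sign
// bit, including the sign bit itself (so the result is in 1..32).
inline unsigned numSignBits32(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  uint32_t sign = u >> 31;
  unsigned n = 1; // the sign bit itself
  for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Hypothetical helper: a 32-bit value is representable as a signed 16-bit
// value when at least 32 - 16 + 1 = 17 of its top bits are sign bits.
inline bool fitsInSignedI16(int32_t v) { return numSignBits32(v) >= 17; }
```

For example, the largest possible %add above, 255*2 = 510, has 23 sign bits, so it passes the check; 32767 and -32768 pass with exactly 17, while 32768 does not.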

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26675
@@ +26674,3 @@
+      return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, Res,
+                         DAG.getIntPtrConstant(0, DL));
+    } else {
----------------
eli.friedman wrote:
> It's not obvious to me why you're explicitly legalizing this here; you could just generate a MUL on, for example, <4 x i16> and legalization should do the right thing from there.
I chose to legalize explicitly here because implicit legalization generates different results:

Suppose the input is <4 x i16>. With implicit legalization, it is converted to <4 x i64> and then bitcast to <8 x i16> before being used as the input of pmullw. If the input is a vector load + sext/zext, the input has to be unpacked (unpck) twice to get <4 x i64>.

With explicit legalization, I concat <4 x i16> with an undef vector to get <8 x i16>. If the input is a vector load + sext/zext, it can be used directly as the input of pmullw.
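
The explicit scheme can be modelled in plain C++ as a rough sketch. This is not the DAG code from the patch: the types V4i16/V8i16 and the helpers widenWithUndef, mulLow, and mulV4i16 are hypothetical, the undef lanes are filled with zero only to keep the model deterministic, and mulLow imitates the per-lane low-16-bit product that pmullw computes. It shows that the padding lanes never influence the four lanes that are extracted at index 0 afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i16 = std::array<int16_t, 4>;
using V8i16 = std::array<int16_t, 8>;

// Model of concatenating <4 x i16> with an undef vector to get <8 x i16>.
// Assumption: undef lanes are modelled as zero; any value would do.
inline V8i16 widenWithUndef(const V4i16 &v) {
  V8i16 w{};
  for (std::size_t i = 0; i < 4; ++i)
    w[i] = v[i];
  return w;
}

// Model of pmullw: each lane keeps the low 16 bits of the lane product.
inline V8i16 mulLow(const V8i16 &a, const V8i16 &b) {
  V8i16 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    uint32_t p = static_cast<uint32_t>(static_cast<uint16_t>(a[i])) *
                 static_cast<uint32_t>(static_cast<uint16_t>(b[i]));
    r[i] = static_cast<int16_t>(static_cast<uint16_t>(p));
  }
  return r;
}

// Widen both operands, multiply all 8 lanes, then take lanes 0..3
// (the model of EXTRACT_SUBVECTOR at index 0).
inline V4i16 mulV4i16(const V4i16 &a, const V4i16 &b) {
  V8i16 wide = mulLow(widenWithUndef(a), widenWithUndef(b));
  V4i16 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = wide[i];
  return r;
}
```

Because each result lane depends only on the matching input lanes, no unpacking of the operands is needed before the multiply in this model.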

================
Comment at: test/CodeGen/X86/shrink_vmul.ll:751
@@ +750,3 @@
+  %ins1 = insertelement <2 x i32> %ins0, i32 32767, i32 1
+  %tmp13 = mul nuw nsw <2 x i32> %tmp8, %ins1
+  %tmp14 = getelementptr inbounds i32, i32* %pre, i64 %index
----------------
eli.friedman wrote:
> It would probably be more clear to write this as `mul nuw nsw <2 x i32> %tmp8, <i32 -32768, i32 32767>`.
That is better. Fixed.

Repository:
  rL LLVM

http://reviews.llvm.org/D20931