[PATCH] D20931: [X86] Reduce the width of multiplification when its operands are extended from i8 or i16
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 2 17:59:29 PDT 2016
mkuper added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26452
@@ +26451,3 @@
+/// plus mulhs/mulhw <N x i16>.
+static SDValue reduceVMULWidth(SDNode *N, SelectionDAG &DAG,
+ const X86Subtarget &Subtarget) {
----------------
wmi wrote:
> mkuper wrote:
> > Any chance to split this function up? Or is there no logical way to do that?
> I can extract the pattern matching part to a separate func. Do you think it is enough?
Yes, together with the refactoring Eli suggested, that should be good.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26507
@@ +26506,3 @@
+ N1Size = SIZEI16;
+ }
+
----------------
wmi wrote:
> eli.friedman wrote:
> > This is not what you want. The constant vector is "zero-extended" if ZEXT(TRUNC(vec))==vec, and "sign-extended" if SEXT(TRUNC(vec)) == vec. Computing whether the vector is a splat has no relation to either of those properties.
> Maybe I can use SplatValue.isNegative() to get the signedness of constant? Then use it together with the signedness of N0 to determine the signedness of multiplification.
I think what Eli meant is that there's no reason to look for a splat, specifically. If every element is small enough, it doesn't matter whether the vector is a splat or not.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26519
@@ +26518,3 @@
+ // Shrink the operands of mul.
+ SDValue NewN0 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N0);
+ SDValue NewN1 = DAG.getNode(ISD::TRUNCATE, DL, ReducedVT, N1);
----------------
wmi wrote:
> mkuper wrote:
> > Why not use N0->getOperand(0), at least when the type is i16?
> > Although I guess we're pretty much guaranteed DAGCombine will clean this up, and this may be a bit cleaner as is. So, I'm not really sure which is better.
> Yes, I tried to make the code simpler here.
As long as this gets clean up, that sounds reasonable.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26523
@@ +26522,3 @@
+ // Handle the case when split is needed.
+ if (VT.getVectorNumElements() > OpsVT.getVectorNumElements()) {
+ // Split the nodes.
----------------
wmi wrote:
> mkuper wrote:
> > This is necessary because this runs before legalization, right?
> > What if it ran after legalization? Does that help? Or is that too late?
> Yes, it is necessary to run before type legalization. That is because suppose original mul is of type <16 x i32>, it will be splitted into four muls of type <4 x i32> in legalization. We actually need after the transformation is two muls of type <8 x i16>, it is more difficult to merge pairs of muls if it is done after legalization.
Ah, ok, got it, thanks.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26544
@@ +26543,3 @@
+ OpsVT, SubN0, SubN1);
+ SDValue ResLo = DAG.getNode(X86ISD::UNPCKL, DL, OpsVT, MulLo, MulHi);
+ ResLo = DAG.getNode(ISD::BITCAST, DL, ResVT, ResLo);
----------------
wmi wrote:
> mkuper wrote:
> > Why not use a generic shuffle here? Is the lowering further down not good enough to get the right unpacks?
> Because I feel using generic shuffle doesn't make the code simpler. We still need two shuffles, two bitcast and one concat_vectors.
It will actually make the code a bit more complex, really. What I had in mind was that if the result of the mul itself feeds a shuffle, leaving generic shuffles here may expose more dag combines further down the road.
Repository:
rL LLVM
http://reviews.llvm.org/D20931
More information about the llvm-commits
mailing list