[PATCH] D24683: [DAGCombine] Generalize build_vector -> vector_shuffle combine for more than 2 inputs

Sat Sep 17 17:06:31 PDT 2016

mkuper added a comment.

In https://reviews.llvm.org/D24683#545621, @RKSimon wrote:

> Nice work! There's a lot going on here which makes we wonder whether this should be split into several patches:
>
> 1 - createBuildVecShuffle - generalise support for the shuffling of different sized inputs + outputs
>  2 - shuffling with constant values (not just zeros)
>  3 - support any number of inputs


Thanks, Simon!

I'm not sure what you mean by "split". :-)
This patch only contains item 3.

Re 1 - splitting the shuffle logic into "createBuildVecShuffle" doesn't generalize anything, it's just a refactoring, that only makes sense once we have more than 2 inputs. The restrictions on input sizes existed before this patch.
Re 2 - we have only been blending with zeroes for years. I agree we should probably blend with any constant vectors, and will be happy to add yet another TODO. However, that seems orthogonal to this patch.

In any case, I don't see a reason to gate (3) on either (1) or (2).


================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12900
@@ -12977,13 +12899,3 @@
   if (InVT1 != VT || InVT2 != VT) {
-    // Both inputs and the output must have the same base element type.
-    EVT ElemType = VT.getVectorElementType();
-    if (ElemType != InVT1.getVectorElementType() ||
-        ElemType != InVT2.getVectorElementType())
-      return SDValue();
-
-    // TODO: Canonicalize this so that if the vectors have different lengths,
-    // VecIn1 is always longer.
-
-    // The element types match, now figure out the lengths.
     if (InVT1.getSizeInBits() * 2 == VT.getSizeInBits() && InVT1 == InVT2) {
       // If both input vectors are exactly half the size of the output, concat
----------------
RKSimon wrote:
> Support (VT.getSizeInBits()  % InVT1.getSizeInBits()) == 0 ? Not just x2?
There's a TODO for this in line 12934.
This isn't really new code, we've been doing this only for x2 for ages - some of it is really old, some I've added sometime in 2014. (I refactored it somewhat in r281283 to make this patch easier, but that didn't change the logic.)

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12906
@@ -12993,8 +12905,3 @@
       VecIn2 = SDValue();
-      // If we have one "real" input and are blending with zero, we need the
-      // zero elements to come from the second input, not the undef part of the
-      // first input.
-      if (UsesZeroVector)
-        Vec2Offset = NumElems;
     } else if (InVT1.getSizeInBits() == VT.getSizeInBits() * 2) {
       if (!TLI.isExtractSubvectorCheap(VT, NumElems))
----------------
RKSimon wrote:
> Support (InVT1.getSizeInBits()  % VT.getSizeInBits()) == 0 ? Not just x2?
Same as above.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12981
@@ +12980,3 @@
+// at most two distinct vectors, turn this into a shuffle node.
+// TODO: Support more than two inputs by constructing a tree of shuffles.
+SDValue DAGCombiner::reduceBuildVecToShuffle(SDNode *N) {
----------------
RKSimon wrote:
> This comment needs a rewrite now that you're supporting any number of inputs.
Argh, right, missed it. Thanks.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13005
@@ +13004,3 @@
+  // TODO: Would it be better to replace VecMap with a search in VecIn?
+  // The vector is expected to be very small...
+  SmallVector<int, 8> VectorMask;
----------------
RKSimon wrote:
> Yes VecMap is probably overkill.
Ok, will rewrite with a search, unless someone objects. Too bad we don't have a SmallMap.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13006
@@ +13005,3 @@
+  // The vector is expected to be very small...
+  SmallVector<int, 8> VectorMask;
+  SmallVector<SDValue, 8> VecIn;
----------------
RKSimon wrote:
> Initialise to VectorMask(NumElems, -1) ?
As usual. :-)

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13020
@@ +13019,3 @@
+    // See if we can use a blend with a zero vector.
+    if (isNullConstant(Op) || isNullFPConstant(Op)) {
+      UsesZeroVector = true;
----------------
RKSimon wrote:
> Could we not generalise this to any constant? I'm thinking about cases such as float4 where the .w component is set to 1.0.
I think we could. 

Note that this isn't really new - the blend with zero logic has been here for a while.
I assume the motivation was that blends with zero are especially cheap, because you don't even need to load the constant. But I agree that general constants should probably also be a win. That's for a separate patch, though. I'll add a TODO.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13050
@@ +13049,3 @@
+  // If we didn't find at least one input vector, bail out.
+  if (VecIn.size() < 2)
+    return SDValue();
----------------
RKSimon wrote:
> Rewrite the comment to explain its one input vector + zero vector. 
I don't think it's true - it can really be a single input vector.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13073
@@ +13072,3 @@
+  SmallVector<SDValue, 4> Shuffles;
+  for (unsigned In = 0; In < (VecIn.size() / 2); ++In) {
+    unsigned LeftIdx = 2 * In + 1;
----------------
RKSimon wrote:
> for (unsigned In = 0, Len = (VecIn.size() / 2); In < Len; ++In) {
Right, thanks.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13123
@@ +13122,3 @@
+    }
+    for (unsigned In = 0; In < CurSize / 2; ++In) {
+      int Left = 2 * In;
----------------
RKSimon wrote:
> for (unsigned In = 0, Len = CurSize / 2; In < Len; ++In) {
Right, thanks.


https://reviews.llvm.org/D24683