[PATCH] [x86] Begin a significant overhaul of how vector lowering is done in the x86 backend.

Fri Jun 20 19:18:37 PDT 2014

Many thanks for this patch! Making the shuffle code saner and tidier is awesome!

I'm posting a few comments for v2* and v4* for now. Will check the other ones later. Most of these are nitpicks, but the ones on line 6816 and 6836 look like bugs (either on the code or in my brain).

I also don't like the phrasing “We rely heavily on "undef" masks preserving the input lane.”
>From what I've seen, it seems that we will treat undefs as if the corresponding element for the source vector was specified. But the code doesn't assume that anywhere (maybe it does in places further down). My first reading parsed it as “we need to be sure that undefs are treated this way”.

I would also echo most of grosbach's comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6782
@@ +6781,3 @@
+// NB: We rely heavily on "undef" masks preserving the input lane.
+static SDValue getV4ShuffleImmForMask(ArrayRef<int> Mask, SelectionDAG &DAG) {
+  assert(Mask.size() == 4 && "Only 4-lane shuffle masks");
----------------
This is similar to a (DAG) shuffle mask, but not quite (it only goes up (non-inclusive) to 4, not 8, which you would have on shuffles for two v4* vectors). Could you add some word (Shufp?) to the name, to make it easier to spot the difference?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6803
@@ +6802,3 @@
+  ArrayRef<int> Mask = SVOp->getMask();
+  assert(Mask.size() == 2 && "Unexpected mask size for v2 shuffle!");
+
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > Likewise, it may be useful to expose a direct method on ShuffleVectorSDNode for getting the lane count.
> > 
> > int getNumElements() const { return getValueType(0).getVectorNumElements(); }
> I don't know that this makes the code any better. By directly asserting properties of the Mask, to me, it makes the following code that assumes the properties on the mask much more clear.
Shouldn't we test, at a higher level, that the shuffle's type is v2f64? I don't think we can get a malformed SDValue that's a v2f64 with more (or fewer) than 2 elements in the mask. The same for the other functions.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6808
@@ +6807,3 @@
+    // sharing the input and destination.
+    unsigned SHUFPDMask = (Mask[0] & 1u) | ((Mask[1] & 1u) << 1u);
+    return DAG.getNode(X86ISD::SHUFP, SDLoc(Op), MVT::v2f64, V1, V1,
----------------
Since the possible values are UNDEF (-1), 0, and 1, wouldn't simply Mask[x] == 1u be easier to understand (This would invert how we handle UNDEFs in this case)

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6816
@@ +6815,3 @@
+  unsigned SHUFPDMask = (Mask[0] & 1u) | (((Mask[1] - 2) & 1u) << 1u);
+  return DAG.getNode(X86ISD::SHUFP, SDLoc(Op), MVT::v2f64, V1, V2,
+                     DAG.getConstant(SHUFPDMask, MVT::i8));
----------------
I might be missing something. How does this handle an <i32 2, i32 0> mask? Wouldn't it generate 

  { V1[0], V2[0] }

 instead of 

  { V2[0], V1[0] }

?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6836
@@ +6835,3 @@
+        std::max(Mask[1], 0) * 2, std::max(Mask[1], 0) * 2 + 1};
+    return DAG.getNode(X86ISD::PSHUFD, SDLoc(Op), MVT::v4i32, V1,
+                       getV4ShuffleImmForMask(WidenedMask, DAG));
----------------
What about bitcasting the PSHUFD output back to v2i64?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6864
@@ +6863,3 @@
+
+  if (NumV2Elements == 0)
+    // Straight shuffle of a single input vector. We pass the input vector to
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > Why not use the helper function you defined above to check for the shuffle being single input?
> > 
> > I guess because you want to use the element count directly in the following code?
> Exactly. When I have to count anyways, I may as well re-use that.
Since we use NumV2Elements a bunch of times, I'd be ok with not going through the mask two times when we can just go one time.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6955
@@ +6954,3 @@
+  // FIXME: Check if there is an encoding size reason to use PSHUFD when we
+  // don't pay the domain penalty.
+  return DAG.getNode(
----------------
Nitpick: I would change the phrasing. If we only use one shufps, shufps is smaller (2 opcode bytes vs. 3 opcode bytes). If we need two shufps, then it's better to just use shufpd. The comment, as it is, seems, to me, like it suggests we still need to check if there's any case where this is helpful.

http://reviews.llvm.org/D4225