[PATCH] [x86] Begin a significant overhaul of how vector lowering is done in the x86 backend.

Mon Jun 23 10:50:33 PDT 2014

My overarching concern is to make sure the code is understandable a year or three or five from now. There's no getting around the necessary complexity of the code due to the ISAs we have to work with today, but as new ISA extensions come out, it's only going to get more so. As such, it's immensely important that the code be understandable to a level that guides someone doing that to work where and how to insert those new tricks. Thus my hounding more about comments than the actual details of the code. The code itself looks great. I want to make sure it stays that way after barbarians like me get their hands on it in the coming years. :)

Looks like one of my comments from before got eaten. Trying again, but in case it doesn't go through there this time either, you're missing a "#include <numeric>". Otherwise you don't have a declaration for std::accumulate.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6769
@@ +6768,3 @@
+  for (int i = 0, Size = Mask.size(); i < Size; ++i)
+    if (Mask[i] != -1 && Mask[i] != i)
+      return false;
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > You comment on this below, but IMO it's worth calling out explicitly here that UNDEF masks are required to be implemented as preserving the input lane and thus are considered a NOP.
> > 
> > This and isSingleInputShuffleMask() should probably be static methods of ShuffleVectorSDNode instead. That would be similar to how isSplatMask() is implemented (side note, that should probably switch to use an ArrayRef instead of an EVT). Then add simple non-static wrappers isNoopShuffle() and isSingleInputShuff() to the class which call down to the static versions using the instance's mask.
> Regarding the undef mask contract below, it doesn't impact this at all. An undef lane, by its nature, is a NOP?
> 
> 
> I really dislike static methods for these at the moment. These have nothing to do with an SDNode and everything to do with the specific ways in which masks are used in this lowering code. I'm hesitant to over-generalize here.
> 
> Also, static methods are horrible to actually use.
> 
> If there is useful common logic to refactor into common locations, I'd rather do it if and when it is actually needed to implement common functionality. As it stands, it wouldn't save anything?
I'm more concerned that it's inconsistent with what we already have (isSplatMask()). I don't care much which idiom we use (static method vs. static function), but do think we should be consistent.

I disagree that they have nothing to do with the SDNode. They're for asking questions about a mask either of or for an SDNode. Now, strictly speaking, the mask is a distinct thing, it's true, and we could have a dedicated class or namespace for that, I suppose.

Static methods are no harder to use than the static functions you already have. It's just a scope.

On the other hand, do other targets actually want answers to the questions these helpers ask? If not, maybe you're right that they're more closely related to the X86 lowering code than the generic stuff. So I suggest we get an answer pragmatically. Can you have a look at the other targets that do shuffles and see if they have something similar? If so, combine the use cases with something generic and if not, leave it as you've written it currently. Sound reasonable?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6781
@@ +6780,3 @@
+
+// NB: We rely heavily on "undef" masks preserving the input lane.
+static SDValue getV4ShuffleImmForMask(ArrayRef<int> Mask, SelectionDAG &DAG) {
----------------
Chandler Carruth wrote:
> David Majnemer wrote:
> > Jim Grosbach wrote:
> > > "NB?"
> > It is probably an abbreviation for [[ http://en.wikipedia.org/wiki/Nota_bene | nota bene ]]
> Correct. I don't care particularly about comment style; I'll probably clean this up along with adding the comment above.
Cool. Just not used to latin in source code. ;)

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6813
@@ +6812,3 @@
+  assert(Mask[0] >= 0 && Mask[0] < 2);
+  assert(Mask[1] >= 2);
+
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > Comment strings on the asserts.
> When there is a generally useful bit of information that is being asserted, I try to do so. However, these asserts are just documenting the mathematical invariants established by the above condition. There isn't any useful comment to give here IMO.
I'd still prefer there be something. As a matter of consistent style, if nothing else. Just something that which tells me what other properties I should be looking at to see what happened. For example, distinguishing between a mask that's invalid on its face for the vector type vs. one that's valid but shouldn't have gotten this far because the code above was supposed to handle it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6859
@@ +6858,3 @@
+  SDValue LowV = V1, HighV = V2;
+  int NewMask[4] = {Mask[0], Mask[1], Mask[2], Mask[3]};
+
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > space after "{" and before "}"
> See the coding guidelines and clang-format -- this is how it formats braces specifically to interoperate cleanly with braced init lists in C++11.
clang-format agrees with me. That's why I made the comment, actually.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6874
@@ +6873,3 @@
+      ++V2Index;
+    int V2AdjIndex = V2Index ^ 1;
+
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > This is a neat trick for getting the other lane of the high half, but it isn't immediately obvious that's what it's doing. Either a ?: to make the selection explicit or a comment explaining the bitwise trick would be good.
> I have a distaste for comments which merely explain what the standard says the C++ in question does.
> 
> I tried to help this by naming the variable after the "adjacent" index. Is there a better variable name that would give the context? If not I can do something else...
It's not just explaining what the standard says, though. It explains why. I spend a couple minutes looking at that code wondering what you were up to before I realized and I had to work through the bit patterns in my head to do it. Maybe I'm just being dense and it should have been obvious, but it wasn't. I really think a comment explaining the intent (not the language semantics) is very useful here.

The abbreviation certainly didn't help. I thought it meant "Adjusted" not "Adjacent".

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6886
@@ +6885,3 @@
+      // blended with a V1 element. Regardless of how many other V1 elements we
+      // have, this will always require a two-step blend.
+      int V1Index = V2AdjIndex;
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > Big ask, but I'd love some ASCII art type examples of the transforms, in particular for those like this one that need more than one instruction. Not the complete set of possibilities; just a single example that's illustrative showing arrows for which lanes get moved where by which insn.
> I'm hesitant to do this. Every time I drew an ascii art picture (and I drew several) or something on a white board, it actually messed me up because I started to silently assume that the one example I had was actually representative of the other ways to hit the same scenario.
> 
> Unless this is breaking your ability to reason about the code, I'm going to hold off investing a ton of time in crafting these...
I'm drawing lots of stuff on paper to be able to reason about the code. Some comments to help guide that thought process would help.

>From another perspective, comments breaking down what the code is intended to cover help review to make sure it matches what it actually does cover.

I understand that a mathematically sound exposition of all the cases and how they transpose into one another is likely not practical, but I still think a few example would be useful.

Doesn't have to be ASCII art. I just like pictures. Simple examples, even expressed in terms of __builtin_shufflevector() .c code perhaps, would go a long way.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6917
@@ +6916,3 @@
+                          (Mask[0] >= 4 ? Mask[0] : Mask[1]) - 4,
+                          (Mask[2] >= 4 ? Mask[2] : Mask[3]) - 4};
+      V1 = DAG.getNode(X86ISD::SHUFP, DL, MVT::v4f32, V1, V2,
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > Spacing for "{" "}".
> Same comment, this is the clang-format enforced style and matches the coding standards.
> 
> I'll give a citation to help: http://llvm.org/docs/CodingStandards.html#braced-initializer-lists
Then clang-format should be fixed. That's what I used to verify I was correct. I just tried again and got the same result.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6965
@@ +6964,3 @@
+    SDLoc DL, SDValue V, MutableArrayRef<int> Mask,
+    const X86Subtarget *Subtarget, SelectionDAG &DAG) {
+  MutableArrayRef<int> LoMask = Mask.slice(0, 4);
----------------
Chandler Carruth wrote:
> Jim Grosbach wrote:
> > This is a really tricky part of the patch. Can you add a comment describing the approach and algorithm to the function? Without a lot of very explicit exposition, I'm concerned about future "improvements" to this code path inadvertently breaking things.
> Yea, I didn't have that comment because there were many tens of different approaches tried before one worked. I'm also still hoping to simplify some parts of it.
> 
> Also, all the details are already documented in each branch of the hybrid approach.
Maybe something about "there be dragons here, and not the friendly kind" then? If it was that tricky to get right, I'm more concerned about someone helpfully coming along and "fixing" it later without understanding all the cases. Tests go a long way there, of course, but I don't like relying on that exclusively.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7000
@@ +6999,3 @@
+    int DWordA = (ThreeInputHalfSum -
+                  std::accumulate(ThreeInputs.begin(), ThreeInputs.end(), 0)) /
+                 2;
----------------
llvm/lib/Target/X86/X86ISelLowering.cpp:7000:24: error: no member named 'accumulate' in namespace 'std'

Need to add "#include <numeric>" I think.

http://reviews.llvm.org/D4225