[PATCH] D12325: Improve ISel using across lane addition reduction

Wed Aug 26 05:32:32 PDT 2015

jmolloy requested changes to this revision.
jmolloy added a comment.
This revision now requires changes to proceed.

Hi,

Thanks for working on this! I'm generally happy with the algorithm, but I have a bunch of comment and style requests.

Cheers,

James


================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8568
@@ -8566,1 +8567,3 @@
 
+/// Target-specific DAG combine for the across vector add reduction (addv).
+/// Example:
----------------
I'd like to see here an example using SDAG nodes. It's nice to have the assembly before/after, but what's really important is the exact pattern this function is intending to match, and what it replaces it with.

For example:

%2 = SHUFFLE_VECTOR ...
%1 = ADD %2, %3
EXTRACT_ELEMENT %1, 0

Something like that.

And specifically, this is the log2-shuffle pattern that the LoopVectorizer produces. Please make that very explicit, because this code won't match all reductions.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8583
@@ +8582,3 @@
+  SDValue N1 = N->getOperand(1);
+  // If the input vector is not an ADD, can't do such combine.
+  if (N0->getOpcode() != ISD::ADD)
----------------
Well actually you can - FMINNUM, FMINNAN, FMAXNUM, FMAXNAN, SMIN, SMAX, UMIN and UMAX all have lane-reduce variants in the A64 instruction set.

I understand you may not want to handle them all at this time, but:
  * Please make the comment explicit about what you are and aren't handling.
  * Please replace uses of "Add" later on with "binop" or something - so that it isn't perceived that this code is specific to Adds. It's not, it's just only been tested with adds. It should work for any commutative binary op.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8596
@@ +8595,3 @@
+  unsigned NumVecElts = N0.getValueType().getVectorNumElements();
+  unsigned NumMaxSubAddElts = NumVecElts / 2;
+  unsigned NumAddElts = 1;
----------------
Why is this called NumMaxSubAddElts? (Where's the Sub come from?)

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8600
@@ +8599,3 @@
+  // Iterate over each step of the reduction.
+  while (NumAddElts <= NumMaxSubAddElts) {
+    if (InputADD.getOpcode() != ISD::ADD)
----------------
It'd be more readable IMHO to calculate the number of expected steps beforehand (as lg(N)), and just have  normal counter here. It would make the algorithm more clear.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8603
@@ +8602,3 @@
+      return SDValue();
+    SDValue ADD = InputADD.getOperand(0);
+    SDValue SV = InputADD.getOperand(1);
----------------
Please don't fully capitalize local variables - stick to the naming convention of UpperCamelCase (unless using an initialism, so SV is fine).

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8604
@@ +8603,3 @@
+    SDValue ADD = InputADD.getOperand(0);
+    SDValue SV = InputADD.getOperand(1);
+    if (SV.getOpcode() != ISD::VECTOR_SHUFFLE) {
----------------
... That said, it'd be nice to call this "Shuffle", to be very explicit about it.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8605
@@ +8604,3 @@
+    SDValue SV = InputADD.getOperand(1);
+    if (SV.getOpcode() != ISD::VECTOR_SHUFFLE) {
+      ADD = InputADD.getOperand(1);
----------------
Comment what you're doing here, which is to allow the add's operands to be commutative.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8627
@@ +8626,3 @@
+    //   step 1 : <1,u,u,u,u,u,u,u>
+    for (unsigned int i = 0; i < NumVecElts; ++i)
+      if ((i >= NumAddElts && Mask[i] >= 0) ||
----------------
You've given an example here, which is good, but you haven't defined the *requirements* on the mask indices which the code below is intending to enforce.

Without that information it is difficult to know if the code has a bug in it.

Also, use "unsigned" instead of "unsigned int".

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8628
@@ +8627,3 @@
+    for (unsigned int i = 0; i < NumVecElts; ++i)
+      if ((i >= NumAddElts && Mask[i] >= 0) ||
+          (i < NumAddElts &&
----------------
I'm assuming this is saying that all elements greater than NumAddElts must be undef? better to use "Mask[i] != 0" instead of ">= 0" I think, so it is more obvious.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8629
@@ +8628,3 @@
+      if ((i >= NumAddElts && Mask[i] >= 0) ||
+          (i < NumAddElts &&
+           static_cast<unsigned>(Mask[i]) != (NumAddElts + i)))
----------------
Instead of static_casting here, just use a signed induction variable.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8633
@@ +8632,3 @@
+    // Move the next step.
+    InputADD = ADD;
+    NumAddElts = NumAddElts << 1;
----------------
I'm not convinced that "InputADD" is the most descriptive name here, but I don't have an explicit alternative in mind.


http://reviews.llvm.org/D12325