[PATCH] D12325: Improve ISel using across lane addition reduction
James Molloy via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 26 05:32:32 PDT 2015
jmolloy requested changes to this revision.
jmolloy added a comment.
This revision now requires changes to proceed.
Hi,
Thanks for working on this! I'm generally happy with the algorithm, but I have a bunch of comment and style requests.
Cheers,
James
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8568
@@ -8566,1 +8567,3 @@
+/// Target-specific DAG combine for the across vector add reduction (addv).
+/// Example:
----------------
I'd like to see here an example using SDAG nodes. It's nice to have the assembly before/after, but what's really important is the exact pattern this function is intending to match, and what it replaces it with.
For example:
%2 = SHUFFLE_VECTOR ...
%1 = ADD %2, %3
EXTRACT_ELEMENT %1, 0
Something like that.
And specifically, this is the log2-shuffle pattern that the LoopVectorizer produces. Please make that very explicit, because this code won't match all reductions.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8583
@@ +8582,3 @@
+ SDValue N1 = N->getOperand(1);
+ // If the input vector is not an ADD, can't do such combine.
+ if (N0->getOpcode() != ISD::ADD)
----------------
Well actually you can - FMINNUM, FMINNAN, FMAXNUM, FMAXNAN, SMIN, SMAX, UMIN and UMAX all have lane-reduce variants in the A64 instruction set.
I understand you may not want to handle them all at this time, but:
* Please make the comment explicit about what you are and aren't handling.
* Please replace uses of "Add" later on with "binop" or something - so that it isn't perceived that this code is specific to Adds. It's not, it's just only been tested with adds. It should work for any commutative binary op.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8596
@@ +8595,3 @@
+ unsigned NumVecElts = N0.getValueType().getVectorNumElements();
+ unsigned NumMaxSubAddElts = NumVecElts / 2;
+ unsigned NumAddElts = 1;
----------------
Why is this called NumMaxSubAddElts? (Where's the Sub come from?)
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8600
@@ +8599,3 @@
+ // Iterate over each step of the reduction.
+ while (NumAddElts <= NumMaxSubAddElts) {
+ if (InputADD.getOpcode() != ISD::ADD)
----------------
It'd be more readable IMHO to calculate the number of expected steps beforehand (as lg(N)), and just have normal counter here. It would make the algorithm more clear.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8603
@@ +8602,3 @@
+ return SDValue();
+ SDValue ADD = InputADD.getOperand(0);
+ SDValue SV = InputADD.getOperand(1);
----------------
Please don't fully capitalize local variables - stick to the naming convention of UpperCamelCase (unless using an initialism, so SV is fine).
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8604
@@ +8603,3 @@
+ SDValue ADD = InputADD.getOperand(0);
+ SDValue SV = InputADD.getOperand(1);
+ if (SV.getOpcode() != ISD::VECTOR_SHUFFLE) {
----------------
... That said, it'd be nice to call this "Shuffle", to be very explicit about it.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8605
@@ +8604,3 @@
+ SDValue SV = InputADD.getOperand(1);
+ if (SV.getOpcode() != ISD::VECTOR_SHUFFLE) {
+ ADD = InputADD.getOperand(1);
----------------
Comment what you're doing here, which is to allow the add's operands to be commutative.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8627
@@ +8626,3 @@
+ // step 1 : <1,u,u,u,u,u,u,u>
+ for (unsigned int i = 0; i < NumVecElts; ++i)
+ if ((i >= NumAddElts && Mask[i] >= 0) ||
----------------
You've given an example here, which is good, but you haven't defined the *requirements* on the mask indices which the code below is intending to enforce.
Without that information it is difficult to know if the code has a bug in it.
Also, use "unsigned" instead of "unsigned int".
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8628
@@ +8627,3 @@
+ for (unsigned int i = 0; i < NumVecElts; ++i)
+ if ((i >= NumAddElts && Mask[i] >= 0) ||
+ (i < NumAddElts &&
----------------
I'm assuming this is saying that all elements greater than NumAddElts must be undef? better to use "Mask[i] != 0" instead of ">= 0" I think, so it is more obvious.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8629
@@ +8628,3 @@
+ if ((i >= NumAddElts && Mask[i] >= 0) ||
+ (i < NumAddElts &&
+ static_cast<unsigned>(Mask[i]) != (NumAddElts + i)))
----------------
Instead of static_casting here, just use a signed induction variable.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:8633
@@ +8632,3 @@
+ // Move the next step.
+ InputADD = ADD;
+ NumAddElts = NumAddElts << 1;
----------------
I'm not convinced that "InputADD" is the most descriptive name here, but I don't have an explicit alternative in mind.
http://reviews.llvm.org/D12325
More information about the llvm-commits
mailing list