[PATCH] Generation of PSAD in LoopVectorizer

Sun Apr 5 23:06:03 PDT 2015

Thank you James. Comments below.


REPOSITORY
  rL LLVM

================
Comment at: include/llvm/CodeGen/ISDOpcodes.h:647
@@ +646,3 @@
+    /// SAD - This corresponds to a Sum of Absolute Difference(SAD) instruction.
+    /// The operands are vectors of char type with length of vector being power
+    /// of 2 and the result is a scalar integer.
----------------
jmolloy wrote:
> "of char type"? Do you mean i8? Why only i8? Why is the (scalar part of the) input type not the same as the output type?
[VJ] - The Sum of absolute difference instruction is only appropriate for char Types. It is a byte level reduction operation. If you have a look at PSAD in X86 or USAD in ARM, all happen at byte level.
[VJ] - The scalar type of output is not same as scalar type of input because sum of eight  i8's may not fit properly in an i8 without data lose.


================
Comment at: include/llvm/IR/Intrinsics.td:596
@@ +595,3 @@
+// Calculate the Sum of Absolute Differences (SAD) of the two input vectors.
+// Only vectors of char Type are allowed.
+def int_sad  : Intrinsic<[llvm_i32_ty],
----------------
jmolloy wrote:
> Char? Why i32? why not i64? why not i8/i16?
[VJ] - I accept your point. We can make it to return any type.

================
Comment at: include/llvm/IR/Intrinsics.td:597
@@ +596,3 @@
+// Only vectors of char Type are allowed.
+def int_sad  : Intrinsic<[llvm_i32_ty],
+                         [llvm_anyvector_ty, llvm_anyvector_ty],
----------------
jmolloy wrote:
> What is the signedness of this operation? Is there a floating point equivalent? If not, why not?
[VJ] - There is no floating point equivalent of SAD.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:642
@@ +641,3 @@
+    // this pattern. For example, for x86 the minimum VF for SAD is 8.
+    unsigned minVF;
+    // maxVF - Specifies the maximum VF for which the target supports
----------------
jmolloy wrote:
> I don't like the assumption here that the legality of an operation is a range [min,max]. I don't think we should assume continuity in the range. Perhaps this should be a bitset of supported types instead?
[VJ] - Can you please explain how to solve this issue. Are you telling to have  supported types of the instruction in a list and check that instead of checking the VF factor?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:767
@@ +766,3 @@
+  /// which is part of special pattern in a basic block.
+  typedef DenseMap<Instruction *, SpecialPatternDescriptor> ActionList;
+
----------------
jmolloy wrote:
> Shouldn't this be a multimap? I can imagine that there might be complex patterns and simple patterns, and a complex pattern may subsume a simple pattern. So an instruction could be part of two patterns (until one is selected).
[VJ] - Ok. How about checking the instruction whether it belongs to any other pattern before inserting in the ActionList. Because I think it will not make sense to keep that instruction in both the patterns. Rather we can select which pattern to associate that instruction before inserting. What do you think? 

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1014
@@ -942,1 +1013,3 @@
+  /// If there is no special pattern then the widest type is returned. 
+  unsigned selectBestTypeForPattern();
 
----------------
jmolloy wrote:
> Uh, surely this needs to be just the type of the phi? If smaller types are legal, the phi should have a smaller type, right?
> 
> Your proposed algorithm falls down in this case:
> 
>     int32_t a;
>     int8_t b;
>     int32_t *c;
>     for (...) {
>       a += abs(c[i] - c[i+1]);
>       b += (int8_t)c[i];
>     }
> 
> Oh dear, you'll select "i8" as your type but that is illegal (it's not part of this PHI).
[VJ] Actually this value is required to select the maximum VF factor to try. So lower the value the higher the vectorization factors I can try.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1050
@@ +1049,3 @@
+  /// for a given vector width.
+  unsigned SpecialPhiCost(PHINode *p, unsigned VF);
+
----------------
jmolloy wrote:
> I generally don't like the use of "Special". Can we not think of a more descriptive name for it, like "Pattern"?
[VJ] - This "Special" word was suggested in the previous review comments in RFC.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2556
@@ -2478,2 +2555,3 @@
   case RK_IntegerOr:
+  case RK_IntegerSpecial:
     // Adding, Xoring, Oring zero to a number does not change it.
----------------
jmolloy wrote:
> Well this isn't right - Who says the identity for all pattern based reductions is integer zero?
[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective identity. Will work on it.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2582
@@ -2503,1 +2581,3 @@
       return Instruction::Add;
+    case LoopVectorizationLegality::RK_IntegerSpecial:
+      return Instruction::Add;
----------------
jmolloy wrote:
> If it assumed that all patterns can be added together, that isn't documented anywhere. 
[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective instruction. Will work on it.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3014
@@ -2876,3 +3013,3 @@
 
-    if (VF > 1) {
+    if (VF > 1 && !isSAD) {
       // VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
----------------
jmolloy wrote:
> I don't like this at all. You've introduced a new generic pattern type then binned that and special-cased for SAD. Why is SAD special? why is it different from ADD for this case?
[VJ] the code inside the IF block reduces the vector to a single scalar value. Here my SAD intrinsic output is already a scalar. I need to skip this merger block for SAD.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4862
@@ +4861,3 @@
+          switch(SPD.Kind) {
+            case LoopVectorizationLegality::SPK_Sad:
+              specialPatternWidth = std::min(specialPatternWidth,(unsigned)8);
----------------
jmolloy wrote:
> I think here you're special casing the SpecialPattern stuff when really it shouldn't be special-cased - this is a problem (finding the smallest valid type for an operation) that applies in many cases to most operators, not just your patterns.
> 
> I think it gives the biggest impact in ARM/MIPS world though as i8 and i16 aren't legal scalar types for us.
[VJ] - So shall I replace the getWidestType in the code with the getSmallestType so that it can try all the vectorization factors? Anyways cost modeling already handles everything properly so I feel there is no harm in  using getSmallestType.

http://reviews.llvm.org/D8136

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/