[PATCH] Generation of PSAD in LoopVectorizer
Vijender Reddy
vj.muthyala at amd.com
Sun Apr 5 23:06:03 PDT 2015
Thank you James. Comments below.
REPOSITORY
rL LLVM
================
Comment at: include/llvm/CodeGen/ISDOpcodes.h:647
@@ +646,3 @@
+ /// SAD - This corresponds to a Sum of Absolute Difference(SAD) instruction.
+ /// The operands are vectors of char type with length of vector being power
+ /// of 2 and the result is a scalar integer.
----------------
jmolloy wrote:
> "of char type"? Do you mean i8? Why only i8? Why is the (scalar part of the) input type not the same as the output type?
[VJ] - The Sum of absolute difference instruction is only appropriate for char Types. It is a byte level reduction operation. If you have a look at PSAD in X86 or USAD in ARM, all happen at byte level.
[VJ] - The scalar type of output is not same as scalar type of input because sum of eight i8's may not fit properly in an i8 without data lose.
================
Comment at: include/llvm/IR/Intrinsics.td:596
@@ +595,3 @@
+// Calculate the Sum of Absolute Differences (SAD) of the two input vectors.
+// Only vectors of char Type are allowed.
+def int_sad : Intrinsic<[llvm_i32_ty],
----------------
jmolloy wrote:
> Char? Why i32? why not i64? why not i8/i16?
[VJ] - I accept your point. We can make it to return any type.
================
Comment at: include/llvm/IR/Intrinsics.td:597
@@ +596,3 @@
+// Only vectors of char Type are allowed.
+def int_sad : Intrinsic<[llvm_i32_ty],
+ [llvm_anyvector_ty, llvm_anyvector_ty],
----------------
jmolloy wrote:
> What is the signedness of this operation? Is there a floating point equivalent? If not, why not?
[VJ] - There is no floating point equivalent of SAD.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:642
@@ +641,3 @@
+ // this pattern. For example, for x86 the minimum VF for SAD is 8.
+ unsigned minVF;
+ // maxVF - Specifies the maximum VF for which the target supports
----------------
jmolloy wrote:
> I don't like the assumption here that the legality of an operation is a range [min,max]. I don't think we should assume continuity in the range. Perhaps this should be a bitset of supported types instead?
[VJ] - Can you please explain how to solve this issue. Are you telling to have supported types of the instruction in a list and check that instead of checking the VF factor?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:767
@@ +766,3 @@
+ /// which is part of special pattern in a basic block.
+ typedef DenseMap<Instruction *, SpecialPatternDescriptor> ActionList;
+
----------------
jmolloy wrote:
> Shouldn't this be a multimap? I can imagine that there might be complex patterns and simple patterns, and a complex pattern may subsume a simple pattern. So an instruction could be part of two patterns (until one is selected).
[VJ] - Ok. How about checking the instruction whether it belongs to any other pattern before inserting in the ActionList. Because I think it will not make sense to keep that instruction in both the patterns. Rather we can select which pattern to associate that instruction before inserting. What do you think?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1014
@@ -942,1 +1013,3 @@
+ /// If there is no special pattern then the widest type is returned.
+ unsigned selectBestTypeForPattern();
----------------
jmolloy wrote:
> Uh, surely this needs to be just the type of the phi? If smaller types are legal, the phi should have a smaller type, right?
>
> Your proposed algorithm falls down in this case:
>
> int32_t a;
> int8_t b;
> int32_t *c;
> for (...) {
> a += abs(c[i] - c[i+1]);
> b += (int8_t)c[i];
> }
>
> Oh dear, you'll select "i8" as your type but that is illegal (it's not part of this PHI).
[VJ] Actually this value is required to select the maximum VF factor to try. So lower the value the higher the vectorization factors I can try.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1050
@@ +1049,3 @@
+ /// for a given vector width.
+ unsigned SpecialPhiCost(PHINode *p, unsigned VF);
+
----------------
jmolloy wrote:
> I generally don't like the use of "Special". Can we not think of a more descriptive name for it, like "Pattern"?
[VJ] - This "Special" word was suggested in the previous review comments in RFC.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2556
@@ -2478,2 +2555,3 @@
case RK_IntegerOr:
+ case RK_IntegerSpecial:
// Adding, Xoring, Oring zero to a number does not change it.
----------------
jmolloy wrote:
> Well this isn't right - Who says the identity for all pattern based reductions is integer zero?
[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective identity. Will work on it.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2582
@@ -2503,1 +2581,3 @@
return Instruction::Add;
+ case LoopVectorizationLegality::RK_IntegerSpecial:
+ return Instruction::Add;
----------------
jmolloy wrote:
> If it assumed that all patterns can be added together, that isn't documented anywhere.
[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective instruction. Will work on it.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3014
@@ -2876,3 +3013,3 @@
- if (VF > 1) {
+ if (VF > 1 && !isSAD) {
// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
----------------
jmolloy wrote:
> I don't like this at all. You've introduced a new generic pattern type then binned that and special-cased for SAD. Why is SAD special? why is it different from ADD for this case?
[VJ] the code inside the IF block reduces the vector to a single scalar value. Here my SAD intrinsic output is already a scalar. I need to skip this merger block for SAD.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4862
@@ +4861,3 @@
+ switch(SPD.Kind) {
+ case LoopVectorizationLegality::SPK_Sad:
+ specialPatternWidth = std::min(specialPatternWidth,(unsigned)8);
----------------
jmolloy wrote:
> I think here you're special casing the SpecialPattern stuff when really it shouldn't be special-cased - this is a problem (finding the smallest valid type for an operation) that applies in many cases to most operators, not just your patterns.
>
> I think it gives the biggest impact in ARM/MIPS world though as i8 and i16 aren't legal scalar types for us.
[VJ] - So shall I replace the getWidestType in the code with the getSmallestType so that it can try all the vectorization factors? Anyways cost modeling already handles everything properly so I feel there is no harm in using getSmallestType.
http://reviews.llvm.org/D8136
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list