[PATCH] Generation of PSAD in LoopVectorizer

Mon Mar 9 10:36:59 PDT 2015

This patch needs regression tests, for for the vectorizer itself (in test/Transforms/LoopVectorize/X86) and CodeGen tests for the new intrinsic in test/CodeGen/X86.

Also, you need generic lowering support in LegalizeDAG for targets that don't support directly lowering the ISD node (and you should make sure that it is set by default to Expand in TargetLoweringBase::initActions).


================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:453
@@ +452,3 @@
+  /// \returns the cost of SAD instruction.
+  unsigned getSADInstrCost(unsigned ISD, Type *RetTy, Type *op1,
+                        Type *op2) const;
----------------
If this function applies only to ISDs, then it should not be here. This header only contains functions directly useful to IR-level passes. You can add this as a callback in the BasicTTI implementation, specialized by the targets, in order to share code.


================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:454
@@ +453,3 @@
+  unsigned getSADInstrCost(unsigned ISD, Type *RetTy, Type *op1,
+                        Type *op2) const;
+
----------------
Indenting is off.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:573
@@ -568,1 +572,3 @@
+  virtual unsigned getSADInstrCost(unsigned ISD, Type *RetTy, Type *op1,
+                                Type *op2) = 0;
   virtual unsigned getNumberOfParts(Type *Tp) = 0;
----------------
Indenting is off.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:729
@@ +728,3 @@
+  unsigned getSADInstrCost(unsigned ISD, Type *RetTy, Type *op1,
+                        Type *op2) override {
+    return Impl.getSADInstrCost(ISD, RetTy, op1, op2);
----------------
Indenting is off.

================
Comment at: include/llvm/CodeGen/ISDOpcodes.h:647
@@ +646,3 @@
+    /// SAD - This corresponds to a SAD instruction. The operands are vectors
+    /// and the result is a scalar type.
+    SAD,
----------------
Please write out what SAD stands for. Please also say something more about the semantics, and any constraints on the relationship between the scalar type and the input vector element type?


================
Comment at: include/llvm/IR/Intrinsics.td:584
@@ +583,3 @@
+// Instrinsic takes any vector as argument and generate the 
+// Sum of Absolute difference(SAD) Instrunction according to target.
+def int_sad  : Intrinsic<[llvm_i32_ty], 
----------------
Replace this comment with:

  // Calculate the Sum of Absolute Differences (SAD) of the two input vectors.

Also, I assume only integer vectors are supported, in which case, please say so.

(note that our IR-level intrinsics are defined only by their semantics, and we never guarantee to produce any specific instruction)

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1291
@@ +1290,3 @@
+  case ISD::SAD:
+    // Sad operation with illegal type will be custom-lowered.
+    Action = TLI.getOperationAction(Node->getOpcode(),
----------------
I don't understand this comment. Doesn't the beginning of this function assert the type legality of all node operands and return values (except for TargetConstants)?


================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4423
@@ +4422,3 @@
+    // if the type of SAD is legal i.e. v16i8, then the output of SAD 
+    // instruction will be v2i64. We need to extract  result from
+    // upper and lower half and add manually.
----------------
Why v2i64 for v16i8?

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4426
@@ +4425,3 @@
+    if(LHS.getValueType() == MVT::v16i8 && RHS.getValueType() == MVT::v16i8)
+    {
+      SDValue sad = DAG.getNode(ISD::SAD, DL, MVT::v2i64, LHS, RHS);
----------------
{ should be on the previous line.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4438
@@ +4437,3 @@
+    }
+    else
+      // If the type is not legal then just generare SAD node which will
----------------
else should be on the line with the } (per our coding conventions)

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4441
@@ +4440,3 @@
+      // be handled in custom lowering.
+      result = DAG.getNode(ISD::SAD, DL, MVT::i32, LHS, RHS);      
+  return result;
----------------
Shouldn't be i32, should the the type from the input intrinsic.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:630
@@ +629,3 @@
+    SpecialActionKind Action;
+    // minVF - tells about the minimum VF for which this pattern is possible
+    // For Instance, in X86 the minVF for SAD is 8. We can not think of pattern
----------------
How about:

  // minVF - Specifies the minimum VF for which the target supports this pattern. For example, for x86 the minimum VF for SAD is 8.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2683
@@ +2682,3 @@
+    case LoopVectorizationLegality::SAK_Arg: {
+      Instruction *cv = dyn_cast<CastInst>(I);
+      if (cv) {
----------------
cv -> CV

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2808
@@ +2807,3 @@
+          && SPD.Kind == LoopVectorizationLegality::SPK_Sad) {
+        //  assert(RdxDesc.SPKind == LoopVectorizationLegality::SPK_Sad &&
+        //    "Doesnot support other special kinds");
----------------
Remove commented-out code.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4251
@@ +4250,3 @@
+
+    //  two subs and one icmp instruction are matched
+    Instruction *Sel = dyn_cast<Instruction>(UI->getOperand(0));
----------------
two -> Two

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4358
@@ -4097,1 +4357,3 @@
                                             ReductionInstDesc &Prev) {
+  // This is to bypass this function for specialPatterns. We dont know which instruction 
+  // corresponds to reduction pattern.
----------------
Line too long.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4548
@@ -4281,3 +4547,3 @@
 
-  unsigned WidestType = getWidestType();
+  unsigned WidestType = getSmallestType();
   unsigned WidestRegister = TTI.getRegisterBitWidth(true);
----------------
Widest == Smallest?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5023
@@ +5022,3 @@
+             PhiCost= SpecialPhiCost(p, VF);
+             DEBUG(dbgs() << "LV: Found an estimated cost of " << PhiCost << " for VF " <<
+                     VF << " For special Phi instruction: " << *it << '\n');
----------------
Line too long.

http://reviews.llvm.org/D8136

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/