RFC: Enable vectorization of call instructions in the loop vectorizer

Thu Feb 26 13:49:45 PST 2015

Hi Michael,

thanks for following up on this. I think it is important to move forward with this.

> On Feb 26, 2015, at 12:01 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> 
> Hi,
> 
> Some time ago James posted a patch for vectorization of calls [1]. Though it seems to be a missing feature in the vectorizer, it wasn’t committed, so I decided to revive the discussion.
> 
> To begin with, I rebased the patch and split it to several semantically separated parts. The approach used here is pretty natural: TargetLibraryInfo has a mapping for a set of scalar functions to their vector counterparts, TargetTransformInfo consults with it when cost of call instruction is calculated, if the vector-version is cheaper, vectorizer emits vector-function.
> 
> There are several question on this patch:
> 1. Is this approach still viable in general and worth pursuing?

I believe this is still important. We want to be able to vectorize function calls if vectorized versions of math library functions are available.

> 2. The cost model is changed. In theory it should become more accurate, but I haven’t measured performance impact of this change yet. Does these changes look good in general? If yes, do we need to tune some target-cost-model parameters, or those are the values that no one has been actually using yet?

I think we will have to test performance impact of this change since we are most likely the first clients. We should test with/without the target tuple that enables the vectorized function calls. I think if we don’t see egregious regressions in the run that uses the modified cost model but is a run without vectorized versions of the math library being available we can commit this patch set such that it gets testing by the community.

> 3. What is the best way to populate the scalar to vector functions map? For now, I populate it when the target-triple contains ‘Accelerate’. If we have a lot of other cases, we might think of something different here.

We use the environment substring of the target tuple (in this case quadruple). This is in spirit to what we are already to conditionally turn on/off features in TargetLibraryInfo. I think we should address the issue of scalability when we have a client where this gets to be an issue.

> 
> Any feedback or ideas are appreciated!
> 
> Thanks,
> Michael
> 
> [1]: http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736 <http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736>
> 
> 
> <0001-Make-ToVectorTy-static.patch>
> <0002-Fix-a-copy-paste-bug.patch>
> <0003-TLI-Use-lambda.-NFC.patch>
> <0004-LoopVectorizer-Add-TargetTransformInfo.patch>
> <0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch>
> <0006-TLI-Add-interface-for-querying-whether-a-function-is.patch>
> <0007-TTI-Add-getCallInstrCost.patch>
> <0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch>
> <0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch>
> <0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch>

Some comments on the code:

This function could use some documentation. What are Tys? (Argument types).

+  unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,
+                            const TargetLibraryInfo *TLI) {
+
+    // Scalar function calls are always expensive.
+    if (!RetTy->isVectorTy())
+      return 10;
+
+    // Functions with a vector form are no more expensive than a scalar call.
+    if (TLI &&
+        TLI->isFunctionVectorizable(F->getName(),
+                                    RetTy->getVectorNumElements()))
+      return 10;
+
+    // We have to scalarize this function call. Estimate the cost.
+    unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true, false);
+    unsigned ScalarCalls = RetTy->getVectorNumElements();
+    for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
+      if (Tys[i]->isVectorTy()) {
+        ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);

This is not obvious. Why could we have more elements in the vector type of a function arguments than the vector return type when vectorizing a function such that we need more function calls when scalarizing?

+        ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());
+      }
+    }
+
+    return ScalarCalls * 10 + ScalarizationCost;
+  }

There seems to be some repetition in the code below. Can we refactor that?

+        default:
+          bool HasScalarOpd = hasVectorInstrinsicScalarOpd(ID, 1);
+          for (unsigned Part = 0; Part < UF; ++Part) {
+            SmallVector<Value *, 4> Args;
+            for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
+              if (HasScalarOpd && i == 1) {
+                Args.push_back(CI->getArgOperand(i));
+                continue;
+              }
+              VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
+              Args.push_back(Arg[Part]);
+            }
+            Type *Tys[] = {CI->getType()};
+            if (VF > 1)
+              Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
+
+            Function *F = Intrinsic::getDeclaration(M, ID, Tys);
+            Entry[Part] = Builder.CreateCall(F, Args);
+          }
+
+          propagateMetadata(Entry, it);
+          break;
+        }
+        // Otherwise, perform the lib call.
+      } else if (TLI && TLI->isFunctionVectorizable(FnName, VF)) {
+        // This is a function with a vector form.
+        StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);
+        assert(!VFnName.empty());
+
+        Function *VectorF = M->getFunction(VFnName);
+        if (!VectorF) {
+          // Generate a declaration
+          FunctionType *FTy = FunctionType::get(RetTy, Tys, false);
+          VectorF =
+              Function::Create(FTy, Function::ExternalLinkage, VFnName, M);
+          assert(VectorF);
+        }
+
         for (unsigned Part = 0; Part < UF; ++Part) {
           SmallVector<Value *, 4> Args;
           for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
-            if (HasScalarOpd && i == 1) {
-              Args.push_back(CI->getArgOperand(i));
-              continue;
-            }
             VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
             Args.push_back(Arg[Part]);
           }
-          Type *Tys[] = {CI->getType()};
-          if (VF > 1)
-            Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
-
-          Function *F = Intrinsic::getDeclaration(M, ID, Tys);
-          Entry[Part] = Builder.CreateCall(F, Args);
+          Entry[Part] = Builder.CreateCall(VectorF, Args);
         }

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150226/5b9481bb/attachment.html>