RFC: Enable vectorization of call instructions in the loop vectorizer

Thu Jan 16 14:51:48 PST 2014

Hi James,

Overall I like this patch. Thanks for working on this! There are three issues I would like to address:

First, we are going to have situations where an intrinsic ID exists for a library function (many math library functions have an intrinsic version, for example expf -> llvm.exp.f32). As a consequence, “getIntrinsicIDForCall” will return it. In this case we can have both a vectorized library function and an intrinsic, and the intrinsic may be slower or faster. The cost model has to decide which one to pick. This means we have to ask the cost model which one is cheaper in two places: when we compute the instruction cost and when we vectorize the call.
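To keep those two places from disagreeing, both could call one helper that makes the choice. A minimal, standalone sketch, assuming the three candidate costs have already been computed elsewhere; `CallLowering`, `CallDecision` and `decideCallLowering` are hypothetical names, not existing LLVM APIs:

```cpp
// Hypothetical: the ways the vectorizer could lower a widened call.
enum class CallLowering { Scalarize, VectorIntrinsic, VectorLibCall };

struct CallDecision {
  CallLowering Kind;
  unsigned Cost;
};

// Hypothetical helper shared by the cost query and the code generator, so
// the lowering that was costed is the lowering that gets emitted.
// Scalarizing is always possible and is the starting candidate. On equal
// cost the earlier candidate is kept: scalarize, then intrinsic, then the
// vector library call.
CallDecision decideCallLowering(bool HasVectorLibFn, unsigned LibCallCost,
                                bool HasIntrinsic, unsigned IntrinsicCost,
                                unsigned ScalarizedCost) {
  CallDecision Best{CallLowering::Scalarize, ScalarizedCost};
  if (HasIntrinsic && IntrinsicCost < Best.Cost)
    Best = {CallLowering::VectorIntrinsic, IntrinsicCost};
  if (HasVectorLibFn && LibCallCost < Best.Cost)
    Best = {CallLowering::VectorLibCall, LibCallCost};
  return Best;
}
```

For expf at VF 4, if a vexpf call is costed at 10 and llvm.exp.f32 at 40 (illustrative numbers only), the helper picks the library call, and the vectorizer would then emit exactly that call.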

Second, the way we test this. I understand that currently nothing in tree adds vectorized versions of function calls. However, I would really like to avoid having to use unit tests to test this feature. How about we use the environment component (the 4th component of the target triple) to specify the available library? For example, on Mac OS X you have the Accelerate library:

     TLI.setUnavailable(LibFunc::statvfs64);
     TLI.setUnavailable(LibFunc::tmpfile64);
   }
+
+  // Make the vectorized versions available.
+  if (T.getEnvironmentName() == "Accelerate") {
+    const TargetLibraryInfo::VecDesc VecFuncs[] = {
+      { "exp", "vexp", 2},
+      { "expf", "vexpf", 4}
+    };
+    TLI.addVectorizableFunctions(VecFuncs);
+  }
 }
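A standalone sketch of the mechanism the hunk relies on: reading the fourth triple component and registering a table of (scalar name, vector name, vectorization factor) entries, which is presumably what the fields of VecDesc mean. This is a mock, not TargetLibraryInfo; `VecFuncTable`, `environmentOf`, `buildTable` and `getVectorizedFunction` are hypothetical, and only the names `addVectorizableFunctions` and `isFunctionVectorizable` come from this thread:

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the VecDesc entries in the hunk above.
struct VecDesc {
  std::string ScalarFnName;
  std::string VectorFnName;
  unsigned VectorizationFactor;
};

// Returns everything after the third '-' of an "arch-vendor-os-environment"
// triple, or "" if the triple has fewer than four components.
std::string environmentOf(const std::string &Triple) {
  std::string::size_type Pos = 0;
  for (int Dashes = 0; Dashes < 3; ++Dashes) {
    Pos = Triple.find('-', Pos);
    if (Pos == std::string::npos)
      return "";
    ++Pos; // skip past the dash
  }
  return Triple.substr(Pos);
}

class VecFuncTable {
public:
  void addVectorizableFunctions(const std::vector<VecDesc> &Fns) {
    Descs.insert(Descs.end(), Fns.begin(), Fns.end());
  }

  // Returns the vector routine for ScalarName at factor VF, or "" if none.
  std::string getVectorizedFunction(const std::string &ScalarName,
                                    unsigned VF) const {
    for (const VecDesc &D : Descs)
      if (D.ScalarFnName == ScalarName && D.VectorizationFactor == VF)
        return D.VectorFnName;
    return "";
  }

  bool isFunctionVectorizable(const std::string &ScalarName,
                              unsigned VF) const {
    return !getVectorizedFunction(ScalarName, VF).empty();
  }

private:
  std::vector<VecDesc> Descs;
};

// Registers the vector routines only when the triple names Accelerate.
VecFuncTable buildTable(const std::string &Triple) {
  VecFuncTable T;
  if (environmentOf(Triple) == "Accelerate")
    T.addVectorizableFunctions({{"exp", "vexp", 2}, {"expf", "vexpf", 4}});
  return T;
}
```

A linear scan is fine for a handful of entries; a table sorted by scalar name would scale better if a library registers many functions.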

Then we can test this feature with:

target triple = "x86_64-apple-macos-Accelerate"

define void @test(double* %d, double %t) {
  ...
  %1 = tail call double @llvm.exp.f64(double %0)

In this example, we will also have to make the cost model assign a lower cost to calls of “vexp” than to the intrinsic version. This brings me to the third issue. You are currently using TTI->getCallCost(), which is not meant to be used by the vectorizer. We should create a getCallInstrCost() function similar to the “getIntrinsicInstrCost” function we already have.

BasicTTI::getCallInstrCost should query TLI->isFunctionVectorizable() and, when the function is vectorizable, return a sensible value: one that is lower than the cost of a scalarized intrinsic lowered to library calls.
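A standalone sketch of that cost shape, not the BasicTTI code. It assumes (an illustrative model, not any target's real numbers) that one call to the vector routine costs about as much as one scalar call, while scalarizing costs VF scalar calls plus one lane insert/extract per argument and per result for every lane. `CallCostParams` and this free-function `getCallInstrCost` are hypothetical:

```cpp
// Hypothetical cost inputs; the values are placeholders.
struct CallCostParams {
  unsigned ScalarCallCost;    // cost of one scalar library call
  unsigned InsertExtractCost; // cost of moving one lane in or out of a vector
};

// Cost of a call widened to VF lanes. IsVectorizable stands in for the
// answer from TLI->isFunctionVectorizable().
unsigned getCallInstrCost(bool IsVectorizable, unsigned VF, unsigned NumArgs,
                          const CallCostParams &P) {
  if (VF <= 1)
    return P.ScalarCallCost; // not widened: one scalar call
  if (IsVectorizable)
    return P.ScalarCallCost; // assumed: one call to the vector routine
  // Scalarized: VF scalar calls, plus extracting every argument lane and
  // inserting every result lane.
  unsigned Overhead = VF * (NumArgs + 1) * P.InsertExtractCost;
  return VF * P.ScalarCallCost + Overhead;
}
```

With a scalar call cost of 10 and a lane cost of 1, a one-argument call at VF 4 costs 10 when vectorizable and 48 when scalarized, so the vector routine wins whenever the scalar call cost is nonzero.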

Thanks,
Arnold

On Jan 15, 2014, at 11:22 AM, James Molloy <james at jamesmolloy.co.uk> wrote:

> <vectorizer-tli.diff>