[PATCH] D19544: Pass for translating math intrinsics to math library calls.
Masten, Matt via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 28 17:31:19 PDT 2016
Thanks for the feedback, Hal. We need the clang support because we actually have several variants of each SVML function, each with a different level of precision. To pick the right variant, we need not only the variant name and vector length but also the precision requirements the user specifies via the -imf flags on the command line. If we implement the translation through addVectorizableFunctionsFromVecLib(), it seems the VecDesc struct would need to carry some additional information about the precision requirements.
One concern I have with this approach is that if the math calls are translated to library calls immediately, any subsequent optimizations are probably in a worse position to optimize further. For that reason, one of the design goals of this project was to keep the calls as vector intrinsics for as long as possible and translate them as late as possible. What are your thoughts on this?
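For illustration, one possible shape for that extra precision information is an additional field on VecDesc in include/llvm/Analysis/TargetLibraryInfo.h. This is only a sketch: the Accuracy enum and the Precision member below are hypothetical, not existing LLVM API.

enum class Accuracy {
  HighAccuracy,  // e.g. the _ha SVML variants
  LowAccuracy,   // e.g. the _la SVML variants
  EnhancedPerf   // e.g. the _ep SVML variants
};

struct VecDesc {
  const char *ScalarFnName;     // existing field
  const char *VectorFnName;     // existing field
  unsigned VectorizationFactor; // existing field
  Accuracy Precision;           // new: required accuracy for this mapping
};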
I apologize for throwing such a large patch at you guys. I was afraid that might be the case. I'll try and keep the changes to a minimum going forward.
Thanks,
Matt
-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Tuesday, April 26, 2016 7:39 PM
To: Masten, Matt; mzolotukhin at apple.com; spatel at rotateright.com; hfinkel at anl.gov
Cc: mehdi.amini at apple.com; llvm-commits at lists.llvm.org
Subject: Re: [PATCH] D19544: Pass for translating math intrinsics to math library calls.
hfinkel added a comment.
Thanks for contributing this! I think we're going to need to work on splitting this into a smaller set of changes that can be integrated with our current infrastructure. I recommend that we start by adding support for the vector math library functions themselves.
> In addition to the -ffast-math flag, support is needed from clang to allow the user to be able to specify the desired precision requirements.
Why? Is this just because the vector math calls don't set errno, or are even the "high accuracy" versions not as precise as the libm implementations?
Here's a simple recipe:
1. Add a new enum value for IMF to VectorLibrary in include/llvm/Analysis/TargetLibraryInfo.h
2. Add the required code to TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib -- it will look something like this:
(at first, only one variant -- maybe you want to pick the _ha versions; it would be reasonable to add a FastMath flag to addVectorizableFunctionsFromVecLib so that other variants can be selected)
  case IMF: {
    const VecDesc VecFuncs[] = {
        {"sinf", "__svml_sinf4", 4},
        {"llvm.sin.f32", "__svml_sinf4", 4},
        {"cosf", "__svml_cosf4", 4},
        {"llvm.cos.f32", "__svml_cosf4", 4},
        {"powf", "__svml_powf4", 4},
        {"llvm.pow.f32", "__svml_powf4", 4},
        {"__powf_finite", "__svml_powf4", 4},
        {"logf", "__svml_logf4", 4},
        {"llvm.log.f32", "__svml_logf4", 4},
        {"__logf_finite", "__svml_logf4", 4},
        {"expf", "__svml_expf4", 4},
        {"llvm.exp.f32", "__svml_expf4", 4},
        {"__expf_finite", "__svml_expf4", 4},
        // ...
    };
    addVectorizableFunctions(VecFuncs);
    break;
  }
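For step 1 above, and for the optional FastMath flag just mentioned, the header side might look roughly like this. This is only a sketch: the IMF enumerator and the extra parameter are suggestions from this thread, not existing API.

// include/llvm/Analysis/TargetLibraryInfo.h (sketch)
enum VectorLibrary {
  NoLibrary,  // Don't use any vector library.
  Accelerate, // Use the Accelerate framework.
  IMF         // New: use the SVML entry points shown above.
};

// Possible way to let callers select the faster, lower-precision variants
// instead of the _ha ones (hypothetical signature):
void addVectorizableFunctionsFromVecLib(enum VectorLibrary VecLib,
                                        bool FastMath = false);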
Then you can make the minimal Clang change (in a follow-up patch), and we'll be able to start:
1. In include/clang/Frontend/CodeGenOptions.def, change `ENUM_CODEGENOPT(VecLib, VectorLibrary, 1, NoLibrary)` to `ENUM_CODEGENOPT(VecLib, VectorLibrary, 2, NoLibrary)` (i.e. increase the number of bits).
2. In include/clang/Frontend/CodeGenOptions.h, add a corresponding enumerator to the VectorLibrary enum.
3. In lib/CodeGen/BackendUtil.cpp, add corresponding code to call addVectorizableFunctionsFromVecLib in createTLII.
4. In lib/Frontend/CompilerInvocation.cpp, add corresponding code to ParseCodeGenArgs.
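As a rough sketch, steps 3 and 4 would mirror the existing Accelerate handling. The IMF enumerator and the -fveclib=IMF spelling below are assumptions carried over from the naming above, not existing Clang behavior.

// lib/CodeGen/BackendUtil.cpp, in createTLII (sketch)
switch (CodeGenOpts.getVecLib()) {
case CodeGenOptions::Accelerate:
  TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::Accelerate);
  break;
case CodeGenOptions::IMF: // new
  TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::IMF);
  break;
default:
  break;
}

// lib/Frontend/CompilerInvocation.cpp, in ParseCodeGenArgs (sketch)
if (Arg *A = Args.getLastArg(OPT_fveclib)) {
  StringRef Name = A->getValue();
  if (Name == "Accelerate")
    Opts.setVecLib(CodeGenOptions::Accelerate);
  else if (Name == "IMF") // new
    Opts.setVecLib(CodeGenOptions::IMF);
  else if (Name == "none")
    Opts.setVecLib(CodeGenOptions::NoLibrary);
  else
    Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Name;
}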
Once we have that done, we can start working on improving the infrastructure to understand more variants, generate better sincos calls, add system-independent constant folding for the various functions, etc. Some of these things should probably be integrated directly into the vectorizer(s) (via reusable utilities), so that the cost model can reason about them directly.
Once the implementations of the functions themselves have been contributed, we can look at setting a default vector math function library as well.
In practice, we're going to end up replaying all of your design decisions in slow motion as part of this process. Breaking this into smaller pieces and steps is the only way to do it, and the only way we'll be able to give sufficient attention to all of the details. Nevertheless, thanks for posting the entire patch; it does help give context on where you'd like to go with this.
http://reviews.llvm.org/D19544