[PATCH] Add a Scalarize pass

Nadav Rotem nrotem at apple.com
Sun Nov 10 10:06:23 PST 2013


Hi Pekka and Renato, 

The proposed "scalarizer" pass is only useful for domain-specific languages, as part of a non-traditional optimization pipeline.  For example, scalarization of LLVM IR allows re-vectorization in OpenCL. However, the current implementation of the pass is not very useful for OpenCL because it does not scalarize function calls (such as DOT), though maybe someone can add this missing functionality one day. I am in favor of adding it to LLVM because LLVM is a collection of modular and reusable compiler libraries. I am worried about the increase in code size for people who don’t care about this functionality, but that’s a different story. 
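To make the call-scalarization limitation above concrete, here is a rough hand-written sketch in LLVM IR (not actual pass output, and @vdot is a made-up external function standing in for something like DOT):

  ; Before: a vector add followed by a vector call.
  declare <2 x float> @vdot(<2 x float>, <2 x float>)   ; hypothetical helper

  define <2 x float> @f(<2 x float> %a, <2 x float> %b) {
    %sum = fadd <2 x float> %a, %b
    %d = call <2 x float> @vdot(<2 x float> %sum, <2 x float> %b)
    ret <2 x float> %d
  }

  ; After scalarization (roughly): the fadd is split per lane, but the
  ; call keeps its vector operands because calls are not scalarized.
  define <2 x float> @f(<2 x float> %a, <2 x float> %b) {
    %a0 = extractelement <2 x float> %a, i32 0
    %b0 = extractelement <2 x float> %b, i32 0
    %s0 = fadd float %a0, %b0
    %a1 = extractelement <2 x float> %a, i32 1
    %b1 = extractelement <2 x float> %b, i32 1
    %s1 = fadd float %a1, %b1
    %t = insertelement <2 x float> undef, float %s0, i32 0
    %sum = insertelement <2 x float> %t, float %s1, i32 1
    %d = call <2 x float> @vdot(<2 x float> %sum, <2 x float> %b)
    ret <2 x float> %d
  }

If I remember correctly the pass is exposed through opt as -scalarizer, so running "opt -scalarizer -S" on the first snippet should produce something along the lines of the second, with the call left untouched.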

The scalarizer pass is not useful in the traditional optimization pipeline because LLVM codegen can already scalarize vectors; this happens automatically for targets that don’t support vectors. The vectorizer will not generate new vector instructions for processors that have none. People who use vector intrinsics or inline assembly are responsible for their own optimizations, and the compiler is not expected to save them when they port their code to targets that don’t support SIMD. 

Some people use LLVM as a portable bitcode format (SPIR, PNaCl).  Vectorization is not something that should be done without target information, just like other lowering phases in the compiler (LSR, CGP, legalization).

Thanks,
Nadav 



On Nov 10, 2013, at 4:28 AM, Pekka Jääskeläinen <pekka.jaaskelainen at gmail.com> wrote:

> 
>  Hi,
> 
>  It doesn't have to be a "domain specific language" for one to want to use vector datatypes/intrinsics.
>  One point of view is that if the programmer has used them, he/she has made a target-specific optimization on the assumption that they are profitable -- and they may well be for the original target. Or the vector datatypes might be natural for the algorithm/problem at hand but not map optimally to the platform.
> 
>  To port the performance of that code to a machine with different SIMD or other fine-grained parallel hardware (VLIW/superscalar), it might be profitable to first undo the user's vectorization so the code maps better to the parallel resources of the target at hand. In that case, the idea is to scalarize the explicit vector intrinsics and then re-vectorize using the target-specific properties.
> 
>  In the current set of LLVM passes, I think this could be done selectively from the vectorizers: if they see that the programmer's vectorization decisions render the code less vectorizable as a whole (e.g. if the loop vectorizer cannot horizontally vectorize the loops just because of it, or if the local vectorizers could find better pairings from scalar code), they can try to scalarize the code first and then vectorize more efficiently for the target at hand.
> 
>  I agree this should not be on by default. I see it most beneficial when done selectively based on both the code and the target at hand. But it should not be left as a "backend pass" either because it might help the vectorizers.
> 
>  BR,
>  Pekka
> 
> http://llvm-reviews.chandlerc.com/D2112
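
For what it's worth, here is a rough hand-written LLVM IR sketch of the "undo the user's vectorization, then re-vectorize for the target" idea Pekka describes above. It is purely illustrative -- neither snippet is the output of any existing pass, and @user_code is a made-up function:

  ; The programmer hand-vectorized with <2 x float>:
  define <4 x float> @user_code(<2 x float> %a, <2 x float> %b, <2 x float> %c, <2 x float> %d) {
    %lo = fmul <2 x float> %a, %b
    %hi = fmul <2 x float> %c, %d
    %r = shufflevector <2 x float> %lo, <2 x float> %hi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    ret <4 x float> %r
  }

  ; What a target-aware re-vectorization could turn it into on a machine
  ; with native 4-wide (or wider) SIMD -- one full-width multiply instead
  ; of two half-width ones:
  define <4 x float> @user_code(<2 x float> %a, <2 x float> %b, <2 x float> %c, <2 x float> %d) {
    %ac = shufflevector <2 x float> %a, <2 x float> %c, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %bd = shufflevector <2 x float> %b, <2 x float> %d, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %r = fmul <4 x float> %ac, %bd
    ret <4 x float> %r
  }

Getting from the first form to the second is exactly the scalarize-then-re-vectorize path: the scalarizer flattens the <2 x float> operations, and a target-aware vectorizer (loop or SLP) can then pick the width that actually fits the machine.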




