[PATCH] Add a Scalarize pass

Wed Nov 13 08:56:21 PST 2013

Hi Richard, 

Thanks for working on this. We should probably move this discussion to llvm-dev because it is no longer strictly related to the patch review.

The code below is not representative of general C/C++ code. Usually only domain-specific languages (such as OpenCL) contain vector instructions.  The LLVM pass manager configuration (pass manager builder) is designed for C/C++ compilers, not for DSLs.  People who use LLVM for other compilation flows (such as GPU compilers or other languages) create their own optimization pipe.

I am in favor of adding the scalarizer pass so that people who build LLVM-based JITs and compilers can use it.  However, I am against adding this pass by default to the pass manager builder.  I understand that there are cases where scalarizing early in the pipeline is better, but I don’t think that it’s worth the added complexity. Every target has a different set of quirks, and we try very hard to avoid adding target-specific passes at the IR level.

SelectionDAG is not going away soon, and the SD replacement will also have a scalarizing pass; the overall architecture is not going to change. There are always optimization phase-ordering problems in the compiler, and at the end of the day we need to come up with an optimization pipe that works for most programs that we care about. I still think that scalarizing in SD is a reasonable solution for C/C++.

Thanks,
Nadav

On Nov 13, 2013, at 2:03 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> Nadav Rotem <nrotem at apple.com> writes:
>> I think that it is a good idea to have a scalarizer pass for people who
>> want to build llvm-based compilers, but I don’t think that this pass
>> should be a part of the default pass manager.  Targets that want to
>> scalarize the code should do it as part of instruction-selection (just
>> declare the types as illegal).  Why do you want to control scalarization
>> from the target?  IMHO scalarization is only useful in undoing domain
>> specific input IR.
> 
> The problem is that instruction selection is so late that the scalar
> operations don't get optimised very much.  The only pass that runs after
> type legalisation and still understands the function at an operational
> level is DAGCombiner, which is only block-local.
> 
> Take for example something like:
> 
>  typedef unsigned int V4SI __attribute__ ((vector_size (16)));
>  void foo (V4SI *vec, unsigned int n, unsigned int x)
>  {
>    V4SI factor = { x, 2, 4, 8 };
>    for (unsigned i = 0; i < n; ++i)
>      vec[i] *= factor;
>  }
> 
> Without the Scalarizer pass, this multiplication remains a vector
> multiplication between variables until after type legalisation.
> It is then split into four scalar multiplications between variables,
> which we select as multiplications rather than shifts.  With the
> Scalarizer pass, we get one multiplication and three shifts.
> 
> You could argue that in this case, the target-specific CodeGen code
> should be prepared to rewrite multiplications as shifts as a result
> of later (CodeGen) constant propagation, but that isn't as easy for
> more complicated chains of operations.
> 
> This wasn't a motivation, but: I believe there's a long-term plan
> to move away from SelectionDAG-based instruction selection.  I was
> hoping that doing scalarisation at the IR level would help with that.
>
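
For reference, the effect described in the quoted example can be sketched by hand in C. This is only an illustration of the "one multiplication and three shifts" result, not output from the Scalarizer pass; foo_scalarized and its flat-array parameter are hypothetical names introduced here, and the code assumes a compiler that supports the GCC vector_size extension (GCC or Clang).

```c
typedef unsigned int V4SI __attribute__ ((vector_size (16)));

/* Vector form, as in the quoted example: the multiplication stays a
   vector multiplication between variables.  */
static void foo_vector (V4SI *vec, unsigned int n, unsigned int x)
{
  V4SI factor = { x, 2, 4, 8 };
  for (unsigned i = 0; i < n; ++i)
    vec[i] *= factor;
}

/* Hand-scalarized form (hypothetical, for illustration only).  Once each
   lane is a separate scalar operation, the constant factors 2, 4 and 8
   are visible per lane and can be rewritten as shifts, leaving a single
   real multiplication by x.  ELTS holds N groups of four lanes.  */
static void foo_scalarized (unsigned int *elts, unsigned int n, unsigned int x)
{
  for (unsigned i = 0; i < n; ++i)
    {
      unsigned int *v = elts + 4 * i;
      v[0] *= x;   /* lane 0: multiplication by a variable */
      v[1] <<= 1;  /* lane 1: * 2 */
      v[2] <<= 2;  /* lane 2: * 4 */
      v[3] <<= 3;  /* lane 3: * 8 */
    }
}
```

Both functions compute the same values; the disagreement in the thread is about where in the pipeline the second form should become visible to the optimizers.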