[PATCH] D26066: [PowerPC] Improvements for BUILD_VECTOR Vol. 4

Nemanja Ivanovic via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 28 14:16:42 PDT 2016


nemanjai added a comment.

In https://reviews.llvm.org/D26066#582413, @amehsan wrote:

> > Yeah, when I was implementing some of the P9 instructions that naturally fit the BUILD_VECTOR nodes, I realized that we produce poor code for some patterns. I wanted to see how bad the situation was, so I wrote the C test case that is added as a comment to `build-vector-tests.ll` and examined the assembly. Then I picked off the patterns that produce poor code and fixed them one by one.
>
> Some thoughts about this.
>  (1) How do we know that these pieces of code have any significance? Some are clearly contrived: `(vector unsigned long long) 4.74`. For others, I do not think they are going to be common patterns in user code. It would be more interesting if we knew that the compiler is likely to generate a pattern of that kind. How did you make that judgement?


As I mentioned, some patterns are fairly common and some likely are not. The example you mentioned is certainly contrived, but not unheard of. I've seen inline functions that splat scalars into vectors; passing a double literal to such a function produces exactly that form. While I certainly agree that it would be interesting to know what the compiler frequently produces in terms of SD nodes, I don't know that such insight is always worth the time investment required to gain it. I think that keeping a clearly inferior code generation choice for a pattern, when it can easily be improved, is fairly hard to justify. An easy improvement to code gen is, in my view, a better time investment than a much more time-consuming investigation into what the vectorizer or another optimization will produce.
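To make that concrete, here is a minimal sketch of the kind of inline splat helper I mean. It is not taken from the patch or from `build-vector-tests.ll`, the helper name is invented, and it uses the generic GCC/Clang vector extension rather than `<altivec.h>` so it stands on its own:

```
typedef unsigned long long v2u64 __attribute__((vector_size(16)));

// Hypothetical inline helper that splats a scalar into both lanes of a
// two-element vector. Once the call is inlined, a double literal argument
// such as 4.74 leaves the compiler with the same fp-to-int-then-splat shape
// as the (vector unsigned long long) 4.74 example above.
static inline v2u64 splat_u64(double d) {
  unsigned long long x = (unsigned long long)d;  /* fp-to-int conversion */
  return (v2u64){ x, x };                        /* splat into both lanes */
}

v2u64 caller(void) {
  return splat_u64(4.74);  /* literal splat after inlining */
}
```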

> (2) How did you conclude that the fixes you are making are in the right place? For example, maybe for some of these patterns we need to change something in InstCombine. Is there an explanation for the approach that you have taken?

This is an interesting question. There are two transformations here that happen fairly early (i.e. in DAG combine): converting a build vector of fp-to-int conversions into an fp-to-int conversion of a build vector, and converting a build vector of consecutive loads into a wide load. The first is something that the target-independent DAG combine purposely omits. There is a combine of a build vector of int-to-fp nodes into an int-to-fp of a build vector node, but not with the conversions going the other way, so there is likely a good reason not to do this on some targets. My attempt to implement it there also caused failures in the ARM target, which means that ARM somehow depends on this not happening. Again, rather than investigating whether this is a win on all targets (and whether ARM has a bug exposed by this), I felt that doing it for PPC, where it is clearly a win, is a better use of my time. The second combine is not something that can happen all that early, as proving that the loads are consecutive is only possible later in the pipeline.
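For reference, here is a minimal sketch of the two source shapes being discussed, again using the generic vector extension and invented function names rather than anything from the test case:

```
typedef long long v2i64 __attribute__((vector_size(16)));
typedef int       v4i32 __attribute__((vector_size(16)));

// A BUILD_VECTOR whose elements are fp-to-int conversions; the combine
// described above turns the scalar conversions plus BUILD_VECTOR into a
// single fp-to-int conversion of a BUILD_VECTOR of the doubles.
v2i64 from_doubles(double a, double b) {
  return (v2i64){ (long long)a, (long long)b };
}

// A BUILD_VECTOR of consecutive loads; once the loads are proven to be
// adjacent, the whole node can become a single wide vector load.
v4i32 from_memory(const int *p) {
  return (v4i32){ p[0], p[1], p[2], p[3] };
}
```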
The remaining changes all happen quite late in the pipeline, in lowering. So the argument that these changes might eliminate opportunities for some later optimization, since we no longer expand the BUILD_VECTOR, is not a very strong one: there simply aren't many optimizations that run after this point. All of these changes are meant to allow the SDAG lowering to match patterns in the .td files, which are demonstrably optimal for the node that is lowered into them.


Repository:
  rL LLVM

https://reviews.llvm.org/D26066




