[PATCH] D14261: [X86][SSE] Recursive search for zeroable shuffle elements

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 15 10:04:07 PST 2015


RKSimon added a comment.

In http://reviews.llvm.org/D14261#289722, @chandlerc wrote:

> In http://reviews.llvm.org/D14261#289721, @RKSimon wrote:
>
> > insertps is a curious instruction in that it really takes 3 inputs (well, 2 inputs + zero), which makes combining rather tricky with our existing methods - as another approach I could disable recursion in computeZeroableShuffleElements by default and just have the insertps lowering use it?
>
>
> Ah, I think this is really the critical thing to realize.
>
> Because of the simplification provided by it, we should essentially re-associate shuffles to cause all constant inputs to come from a single build_vector, and that a build_vector of constants should be the last input shuffled:
>
> For example:
>
>   S1 = (shuffle A, (build constant, constant, ...))
>   S2 = (shuffle B, S1)
>   
>
> should combine to:
>
>   S1 = (shuffle A, B)
>   S2 = (shuffle S1, (build constant, constant, ...))
>   
>
> So that we can always identify constant inputs at the bottom of the shuffle tree. We can still indicate *unused* inputs in S1 with undef shuffle indices.
>
> Does this make sense to you? Is there some *other* thing broken by this style of reassociation?


The approach you propose is the canonicalization step I mentioned. My main concern with it is making sure our existing shuffle lowering can be adapted to 'peek' through a bottom shuffle with constants (in this case zeroable vectors) - this will affect insertps and the perm2f128 instructions and possibly others - but its certainly doable, but may involve reordering some of the lowerings so that they are attempted earlier. This will probably affect build_vector lowering as well - but I won't know the scope until I investigate this more.

Do you think this should be limited to the X86? Large parts of the code, such as combining multiple constant build_vector shuffle inputs into one, would almost certainly be useful in the DAGCombiner and I can't think of reasons why pushing down the shuffles with constants would cause any notable regressions in other targets' shuffle codegen.

That would leave this patch without the recursion element - I think reducing it to just allowing it to check for zeroable build_vectors of different element counts would still be useful, I'll see if I can update this patch.

Cheers, Simon.


Repository:
  rL LLVM

http://reviews.llvm.org/D14261





More information about the llvm-commits mailing list