patch: make instcombine remove shuffles by reordering vector elements

Sun May 5 06:53:43 PDT 2013

----- Original Message -----
> From: "Duncan Sands" <duncan.sands at gmail.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Sunday, May 5, 2013 3:47:29 AM
> Subject: Re: patch: make instcombine remove shuffles by reordering vector elements
> 
> Hi Anton,
> 
> On 05/05/13 10:22, Anton Korobeynikov wrote:
> >>> We lower x86 shuffles with 1000 lines of c++ code.
> >>
> >> Maybe that's not so bad ;) The PPC has a whole perfect-shuffle
> >> generation framework to handle these kinds of things for Altivec.
> >> Have you ever looked at PPCPerfectShuffle.h and
> >> utils/PerfectShuffle/PerfectShuffle.cpp?
> > Same on ARM. But everything is only for 4-element shuffle. Doing
> > same for 8 element shuffles looks like an impractical task (both in
> > time and memory requirement for shuffle table).
> >
> > We can "cheat" with some clever "8 el shuffle to 4 el shuffle"
> > lowering pass, but I'm not aware of any.
> >
> > And on x86 we have much wider regs...
> 
> how are the perfect shuffle tables generated?  I'm assuming it is
> done by, for each shuffle, solving offline an optimization problem
> where the objective function is based on known characteristics of
> the processor.  What are those characteristics?

Actually, I think it works the other way: the table generator runs through all combinations of the available permutation instructions, so it solves every forward problem rather than the inverse problem, and for each output shuffle it stores the lowest-cost sequence that produces it.
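
A toy sketch of that forward enumeration, under assumptions that are not taken from the real generator: 4-lane shuffles of two inputs A and B, three made-up two-operand instructions (`mergeLow`, `mergeHigh`, `shiftByOne`, loosely merge/shift-like, not the actual Altivec or NEON set), and a unit cost per instruction. It repeatedly applies every instruction to every pair of results reached so far and keeps the cheapest cost per output mask until nothing improves. It is not the code in utils/PerfectShuffle/PerfectShuffle.cpp; for instance, it ignores undef lanes.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Toy model (assumption): a 4-lane shuffle of two inputs A and B.
// Lane I of a value holds element M[I] of the concatenation A:B,
// so 0-3 come from A and 4-7 come from B. Undef lanes are ignored.
using Mask = std::array<int, 4>;
constexpr int kTableSize = 1 << 12; // 8 choices per lane, 4 lanes
constexpr int kUnreached = INT_MAX;

// Pack a mask into a 12-bit table index (3 bits per lane).
int encode(const Mask &M) {
  int Code = 0;
  for (int I = 0; I < 4; ++I) {
    assert(M[I] >= 0 && M[I] < 8 && "lane index out of range");
    Code |= M[I] << (3 * I);
  }
  return Code;
}

// Inverse of encode().
Mask decode(int Code) {
  Mask M{};
  for (int I = 0; I < 4; ++I)
    M[I] = (Code >> (3 * I)) & 7;
  return M;
}

// Hypothetical two-operand instructions, each assumed to cost 1.
Mask mergeLow(const Mask &X, const Mask &Y) { return {X[0], Y[0], X[1], Y[1]}; }
Mask mergeHigh(const Mask &X, const Mask &Y) { return {X[2], Y[2], X[3], Y[3]}; }
Mask shiftByOne(const Mask &X, const Mask &Y) { return {X[1], X[2], X[3], Y[0]}; }

// Forward enumeration: apply every instruction to every pair of
// already-reached results, keep the lowest cost seen for each output
// mask, and repeat until no entry improves.
std::vector<int> buildCostTable() {
  using Op = Mask (*)(const Mask &, const Mask &);
  const Op Ops[] = {mergeLow, mergeHigh, shiftByOne};

  std::vector<int> Cost(kTableSize, kUnreached);
  Cost[encode({0, 1, 2, 3})] = 0; // input A is available for free
  Cost[encode({4, 5, 6, 7})] = 0; // input B is available for free

  bool Changed = true;
  while (Changed) {
    Changed = false;
    // Snapshot the reached masks (decoded once) before extending them.
    std::vector<int> Known;
    std::vector<Mask> KnownMasks;
    for (int C = 0; C < kTableSize; ++C)
      if (Cost[C] != kUnreached) {
        Known.push_back(C);
        KnownMasks.push_back(decode(C));
      }
    for (Op F : Ops)
      for (std::size_t I = 0; I < Known.size(); ++I)
        for (std::size_t J = 0; J < Known.size(); ++J) {
          int X = Known[I], Y = Known[J];
          // A value used for both operands is paid for once; distinct
          // operands are summed, which over-counts shared subexpressions.
          int C = (X == Y ? Cost[X] : Cost[X] + Cost[Y]) + 1;
          int R = encode(F(KnownMasks[I], KnownMasks[J]));
          if (C < Cost[R]) {
            Cost[R] = C;
            Changed = true;
          }
        }
  }
  return Cost;
}
```

Indexing the result of `buildCostTable()` with `encode(mask)` gives the cheapest cost found in this toy model, e.g. 1 for `{0, 4, 1, 5}` (one `mergeLow` of A and B) and 2 for `{0, 0, 4, 1}`; masks never reached stay at `kUnreached`. Summing the costs of distinct operands keeps the sketch simple at the price of over-counting when two operands share a subexpression.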

On PPC, however, the situation is somewhat friendlier than on x86, because we still have a general-purpose permutation instruction (vperm). The alternative sequences are just faster, and they lower register pressure.

> Maybe it is possible to solve the optimization problem, or get a
> near-to-optimal solution, on the fly with a sufficiently clever
> algorithm.

It seems like it should be possible. If we restrict the problem so that multiple simultaneous intermediate results are not allowed, it becomes a group-theory question. Do you know anyone who is an expert in computational group theory?
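
One way to state that restricted problem, as a hedged sketch rather than an established result: let S = {σ_1, ..., σ_k} be the single-input permutation instructions available, each with a cost c(σ), and let π be the target lane permutation. The cheapest lowering is then a minimum-cost word over S (σ_{i_1} is applied first):

```latex
\operatorname{cost}(\pi) \;=\; \min\Big\{ \sum_{j=1}^{m} c(\sigma_{i_j}) \;:\;
  \sigma_{i_m} \circ \cdots \circ \sigma_{i_1} = \pi,\;\; \sigma_{i_j} \in S \Big\}
```

With unit costs this is just the word length of π with respect to S, i.e. its distance from the identity in the Cayley graph of the group generated by S, and π can be lowered this way only if it lies in that group.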

 -Hal

> 
> Ciao, Duncan.