patch: make instcombine remove shuffles by reordering vector elements

Sun May 5 21:43:17 PDT 2013

On Sun, May 5, 2013 at 7:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Duncan Sands" <duncan.sands at gmail.com>
> > To: llvm-commits at cs.uiuc.edu
> > Sent: Sunday, May 5, 2013 3:47:29 AM
> > Subject: Re: patch: make instcombine remove shuffles by reordering
> > vector elements
> >
> > Hi Anton,
> >
> > On 05/05/13 10:22, Anton Korobeynikov wrote:
> > >>> We lower x86
> > >>> shuffles with 1000 lines of c++ code.
> > >>
> > >> Maybe that's not so bad ;) The PPC has a whole perfect-shuffle
> > >> generation framework to handle these kinds of things for Altivec.
> > >> Have you ever looked at PPCPerfectShuffle.h and
> > >> utils/PerfectShuffle/PerfectShuffle.cpp?
> > > Same on ARM. But everything is only for 4-element shuffle. Doing
> > > same
> > > for 8 element shuffles looks like an impractical task (both in time
> > > and memory requirement for shuffle table).
> > >
> > > We can "cheat" with some clever "8 el shuffle to 4 el shuffle"
> > > lowering pass, but I'm not aware of any.
> > >
> > > And on x86 we have much wider regs...
> >
> > how are the perfect shuffle tables generated?  I'm assuming it is
> > done by,
> > for each shuffle, solving offline an optimization problem where the
> > objective
> > function is based on known characteristics of the processor.  What
> > are those
> > characteristics?
>
> Actually, I think that it works the other way: It runs through all
> combinations of the input permutation instructions (so the table generator
> is just solving all of the forward problems, not the inverse problem, and
> it stores the lowest-cost result for each computed output).
>
> On PPC, however, the situation seems slightly more friendly than on x86
> because we still do have a general-purpose permutation instruction, it is
> just faster (and lowers register pressure) to use an alternative sequence.
>
> > Maybe it is possible to solve the optimization
> > problem, or
> > get a near-to-optimal solution, on the fly with a sufficiently clever
> > algorithm.
>
> It seems like it should be possible. If we restrict the problem so that
> multiple simultaneous intermediate results are not allowed, then this
> certainly seems like a group-theory question. Do you know anyone who is a
> computational group theory expert?
>

Since shuffles that repeat indices are permitted (e.g. [0,1,2,3] ->
[1,1,2,3]), some shuffles don't have inverses, so the set of shuffles under
composition is not a group. That makes it unlikely that the problem will be
very amenable to group-theoretic inquiry (unless there is some clever way to
represent the operations that isn't simply how they rearrange the inputs).
Also, the fact that e.g. PSHUFB (SSSE3; VPSHUFB under AVX) can write zeros
into result lanes regardless of what is in the source seems like a difficult
obstacle to formalize group-theoretically.
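
To make the non-invertibility point concrete, here is a minimal Python
sketch. It models a single-input shuffle as a list of source indices, and
uses a made-up ZERO marker to stand in for a lane that is forced to zero
(loosely modelled on the PSHUFB control byte with its high bit set). The
names apply_shuffle, compose, has_inverse and ZERO are all hypothetical and
only for illustration; none of this is LLVM code.

```python
from itertools import product

ZERO = None  # hypothetical marker: "write 0 into this lane"

def apply_shuffle(mask, vec):
    """Build the result vector by picking vec[i] for each index in mask."""
    return [0 if i is ZERO else vec[i] for i in mask]

def compose(outer, inner):
    """Mask equivalent to applying `inner` first and `outer` second."""
    return [ZERO if i is ZERO else inner[i] for i in outer]

def has_inverse(mask):
    """Brute force: does any follow-up shuffle restore the identity?"""
    n = len(mask)
    identity = list(range(n))
    return any(compose(list(cand), mask) == identity
               for cand in product(range(n), repeat=n))

print(has_inverse([1, 0, 3, 2]))     # a true permutation
print(has_inverse([1, 1, 2, 3]))     # lane 0 is lost, nothing can recover it
print(has_inverse([0, ZERO, 2, 3]))  # lane 1 is overwritten with zero
```

Only masks that are permutations of the lanes can be undone; once a source
lane is dropped or zeroed, no later shuffle can bring it back.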

From what I can glean about the problem description from this thread and
Chris's slides, it does sound like it might be amenable to an optimal
dynamic programming solution.
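
As a rough illustration of the "solve all the forward problems and keep the
lowest-cost result per output" idea from the quoted text, here is a small
Python sketch. It is a plain shortest-path search over single-input 4-lane
masks, not the actual PerfectShuffle generator and not necessarily the
dynamic programming formulation that would be needed in practice. The
primitive shuffles, their names, and their costs are invented for the
example.

```python
import heapq

# Hypothetical primitive single-input shuffles on 4 lanes: name -> (mask, cost).
PRIMITIVES = {
    "rotate_left": ((1, 2, 3, 0), 1),
    "swap_pairs": ((1, 0, 3, 2), 1),
    "splat_lane0": ((0, 0, 0, 0), 1),
}

def compose(outer, inner):
    """Mask equivalent to applying `inner` first and `outer` second."""
    return tuple(inner[i] for i in outer)

def cheapest_sequences(primitives, width=4):
    """Map every reachable mask to (lowest cost, primitive names in order)."""
    start = tuple(range(width))
    best = {start: (0, [])}
    heap = [(0, start)]
    while heap:
        cost, mask = heapq.heappop(heap)
        if cost > best[mask][0]:
            continue  # stale heap entry, a cheaper route was already found
        for name, (prim, prim_cost) in primitives.items():
            nxt = compose(prim, mask)
            new_cost = cost + prim_cost
            if nxt not in best or new_cost < best[nxt][0]:
                best[nxt] = (new_cost, best[mask][1] + [name])
                heapq.heappush(heap, (new_cost, nxt))
    return best

table = cheapest_sequences(PRIMITIVES)
print(len(table), "masks reachable")
print(table[(2, 3, 0, 1)])
```

With only these three primitives many masks are unreachable (for example
[1,1,2,3]), which is the situation where a lowering would have to fall back
to something more general.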

-- Sean Silva