patch: make instcombine remove shuffles by reordering vector elements

Mon May 6 16:32:07 PDT 2013

I did a bit of brainstorming on this today, and here's a sketch of some
ideas:

Define a "shuffle" as a function S mapping the set {0, ..., #input
elements - 1} to the set {0, ..., #output elements - 1}. Then we may define
a "shuffle class" P to be a set of shuffles. A shuffle class roughly
corresponds to a single kind of target instruction.

Then shuffle classes have a partial order P <= Q established by the subset
relation.

From this, we can then assign a cost c(P) = n > 0 to each shuffle class.
Then we make the following assumption:

Assumption 1: Let P, Q be shuffle classes. Then P < Q (that is, P is a
proper subset of Q) implies that c(P) < c(Q). Intuitively, less general
shuffle instructions are cheaper. If this is not the case, then in any
practical algorithm we can immediately choose the cheaper, more general
one anywhere that we would choose the more expensive one and prune that
part of the search space. So we assume that this is the case.
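
As a rough illustration of Assumption 1, here is a minimal Python sketch
that models shuffle classes as sets and checks the strict form of the
assumption (proper subset implies strictly lower cost). The class names
"rotate" and "permute" and all cost values are made up for the example;
they do not correspond to any real target's cost model.

```python
from itertools import permutations

# A shuffle is a tuple s where s[i] is the image of index i.
# A shuffle class is a frozenset of shuffles. Names and costs are hypothetical.
ROTATIONS = frozenset(tuple((i + k) % 4 for i in range(4)) for k in range(4))
ALL_PERMUTES = frozenset(permutations(range(4)))

CLASSES = {"rotate": ROTATIONS, "permute": ALL_PERMUTES}


def assumption_violations(classes, costs):
    """Return pairs (p, q) where p is a proper subset of q but c(p) >= c(q)."""
    return [
        (p, q)
        for p in classes
        for q in classes
        # frozenset's "<" is the proper-subset test.
        if classes[p] < classes[q] and not costs[p] < costs[q]
    ]
```

Any pair this reports is a place where the more general class is at least
as cheap, so the less general one could be dropped from the search.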

It is natural to extend the cost function to compositions of shuffles so
that if S_i is an element of a shuffle class P_i, then
c(S1 o ... o Sn) = c(P1) + ... + c(Pn)
For left to right readability, we use the "backwards composition"
convention, so that (f o g)(x) == g(f(x))
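
To make the composition convention and the additive cost concrete, here
is a small Python sketch. It assumes a shuffle is stored as a tuple whose
i-th entry is the image of index i; the class names and costs in the
usage lines are invented for illustration only.

```python
def compose(f, g):
    """Backwards composition: (f o g)(x) == g(f(x))."""
    return tuple(g[f[x]] for x in range(len(f)))


def sequence_cost(class_names, costs):
    """Cost of a composition: the sum of the costs of the classes used."""
    return sum(costs[name] for name in class_names)


# Two example shuffles on 4 elements.
REVERSE = (3, 2, 1, 0)
ROT1 = (1, 2, 3, 0)

# Hypothetical costs per shuffle class.
COSTS = {"rotate": 1, "permute": 3}

reverse_then_rot = compose(REVERSE, ROT1)
rot_then_reverse = compose(ROT1, REVERSE)
total = sequence_cost(["rotate", "rotate", "permute"], COSTS)
```

The two compositions differ, so the order of a sequence matters even
though its cost under this definition does not depend on the order.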

Now, from the assumption and the additive cost, it is easy to see that if
P < Q (P a proper subset of Q) and S_P, S_Q are shuffles in their
respective shuffle classes, then
c(S1 o ... o S_{n-1} o S_P) < c(S1 o ... o S_{n-1} o S_Q)

There is a nice lattice structure on the set of shuffle compositions
(roughly, "instruction sequences") imposed by the prefix relation. Also,
the above cost function should be monotonic on this lattice since shuffle
costs are strictly positive. At the very least, the lattice structure
together with a cost function monotonic on the lattice should allow some
amount of pruning over a brute force search. A cost function like this also
would be amenable to A* search (although I have no clue what heuristic
function would be appropriate).
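
One way to picture the search is uniform-cost search (Dijkstra), which is
A* with a zero heuristic and is valid here because costs are strictly
positive. The Python sketch below is only an illustration under
assumptions not in the text above: each available instruction is a single
fixed shuffle with a made-up name and cost, and sequences are additionally
merged whenever they compose to the same shuffle.

```python
import heapq


def cheapest_sequence(target, instructions):
    """Find the cheapest instruction sequence whose composition is `target`.

    `instructions` is a list of (name, shuffle, cost) with cost > 0.
    Returns (total_cost, [names]) or None if the target is unreachable.
    """
    n = len(target)
    identity = tuple(range(n))
    best = {identity: 0}            # cheapest known cost per composed shuffle
    heap = [(0, identity, ())]      # (cost so far, composed shuffle, names)
    while heap:
        cost, state, seq = heapq.heappop(heap)
        if state == target:
            return cost, list(seq)
        if cost > best.get(state, float("inf")):
            continue                # stale entry, a cheaper prefix exists
        for name, shuf, c in instructions:
            # Backwards composition: apply `state` first, then `shuf`.
            nxt = tuple(shuf[state[x]] for x in range(n))
            new_cost = cost + c
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, seq + (name,)))
    return None


# Hypothetical instruction set on 4 elements.
INSTRUCTIONS = [
    ("rot1", (1, 2, 3, 0), 1),
    ("swap01", (1, 0, 2, 3), 1),
    ("reverse", (3, 2, 1, 0), 2),
]
result = cheapest_sequence((2, 3, 0, 1), INSTRUCTIONS)
```

A nonzero admissible heuristic would slot in by adding it to the priority
in the heap; what that heuristic should be is left open here as well.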

That's all I was able to come up with today (while waiting at the DMV...).
I still need to think some more about how to impose more structure (than
just "black box" functions) on the set of all possible shuffles (and hence
on the shuffle classes). Primarily, this structure would enable reasoning
about "dead ends" when composing shuffles. For example with AVX, if the
desired shuffle does not move elements across 128-bit lanes, then all
PSHUFB shuffles that cross lanes are "dead ends" in the search space (this
applies more to the problem of computing perfect shuffles online, rather
than building tables).
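
A minimal Python sketch of this kind of dead-end check, assuming 8
elements of 32 bits each so that a 128-bit lane holds 4 elements. The
function names are hypothetical, and the pruning rule is the heuristic
described above rather than a proven property: it simply discards
lane-crossing candidates when the target stays within lanes.

```python
def crosses_lanes(shuffle, lane_elems):
    """True if any index is mapped to an index in a different lane."""
    return any(i // lane_elems != s // lane_elems for i, s in enumerate(shuffle))


def prune_dead_ends(target, candidates, lane_elems):
    """Drop lane-crossing candidates when the target never crosses lanes."""
    if crosses_lanes(target, lane_elems):
        return list(candidates)     # nothing can be ruled out this way
    return [c for c in candidates if not crosses_lanes(c, lane_elems)]


# 8 x 32-bit elements, 4 elements per 128-bit lane.
IN_LANE = (1, 0, 3, 2, 5, 4, 7, 6)   # swaps neighbours within each lane
CROSS = (4, 5, 6, 7, 0, 1, 2, 3)     # swaps the two lanes

kept = prune_dead_ends(IN_LANE, [IN_LANE, CROSS], 4)
```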

Also, a simple integer cost function like I presented above might be
inadequate for expressing how "good" a shuffle sequence is.

-- Sean Silva