<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I did a bit of brainstorming on this today, and here's a sketch of some ideas:<br></div></div></div><div class="gmail_extra" style><br></div><div class="gmail_extra" style>

Define a "shuffle" as a function S mapping the set {0, ..., (number of input elements) - 1} to the set {0, ..., (number of output elements) - 1}. A "shuffle class" P is then a set of shuffles; it roughly corresponds to a single kind of target instruction.</div>
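<div class="gmail_extra" style><br></div><div class="gmail_extra" style>As a minimal sketch of these definitions (assuming 4 elements, and with ROTATE and PERMUTE as purely hypothetical stand-ins for target instructions), a shuffle can be stored as a tuple and a shuffle class as a set of such tuples:</div>

```python
from itertools import permutations

N = 4  # assumed number of input and output elements


def is_shuffle(s, n_out):
    """Check that s maps {0..len(s)-1} into {0..n_out-1}.

    Following the definition above, s[i] is the output position that
    input element i is sent to.
    """
    return all(0 <= j < n_out for j in s)


IDENTITY = tuple(range(N))

# Hypothetical shuffle classes (sets of shuffles); names are illustrative.
ROTATE = frozenset(tuple((i + k) % N for i in range(N)) for k in range(N))
PERMUTE = frozenset(permutations(range(N)))
```

<div class="gmail_extra" style>Using frozenset keeps the classes hashable, so they can later serve as keys in a cost table.</div>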

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>Then shuffle classes have a partial order P <= Q established by the subset relation.</div><div class="gmail_extra" style><br></div><div class="gmail_extra" style>

Next, we assign a positive integer cost c(P) > 0 to each shuffle class, and make the following assumption:</div><div class="gmail_extra" style><br></div><div class="gmail_extra" style>Assumption 1: Let P, Q be shuffle classes. Then P < Q (that is, P is a proper subset of Q) implies that c(P) < c(Q). Intuitively, less general shuffle instructions are cheaper. If this fails for some pair, so that the more general class Q costs no more than P, then any practical algorithm can immediately use Q wherever it would have used P and prune that part of the search space. So we assume that the assumption holds.</div>
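<div class="gmail_extra" style><br></div><div class="gmail_extra" style>Assumption 1 can be checked mechanically on a cost table, reading the order as proper inclusion. The sketch below is only illustrative: the classes ROTATE and PERMUTE and their costs are made-up values, and violations is a hypothetical helper name.</div>

```python
from itertools import permutations

N = 4
# Hypothetical shuffle classes and costs (illustrative values only).
ROTATE = frozenset(tuple((i + k) % N for i in range(N)) for k in range(N))
PERMUTE = frozenset(permutations(range(N)))
COST = {ROTATE: 1, PERMUTE: 3}


def violations(cost):
    """Return pairs (P, Q) with P a proper subset of Q but c(P) >= c(Q)."""
    return [(p, q) for p in cost for q in cost
            if p < q and cost[p] >= cost[q]]
```

<div class="gmail_extra" style>Any pair returned by violations(COST) marks a class P that could be dropped from the search, since the more general Q covers it at no higher cost.</div>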

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>It is natural to extend the cost function to compositions of shuffles so that if S_i is an element of a shuffle class P_i, then</div><div class="gmail_extra" style>

c(S_1 o ... o S_n) = c(P_1) + ... + c(P_n)</div><div class="gmail_extra" style>For left-to-right readability, we use the "backwards composition" convention, so that (f o g)(x) == g(f(x)); that is, f is applied first.</div><div class="gmail_extra" style>

<br></div><div class="gmail_extra" style>Now, from the assumption, it is easy to see that if P is a proper subset of Q and S_P, S_Q are shuffles drawn from (and costed as members of) P and Q respectively, then</div><div class="gmail_extra" style>c(S_1 o ... o S_{n-1} o S_P) < c(S_1 o ... o S_{n-1} o S_Q)</div>
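<div class="gmail_extra" style><br></div><div class="gmail_extra" style>To make the composition convention and the additive cost concrete, here is a small sketch. The shuffles rot1 and swap and the helper names compose and sequence_cost are assumptions for illustration, using tuples where s[i] is the output position of input i.</div>

```python
def compose(f, g):
    """Backwards composition: (f o g)(x) == g(f(x)), i.e. apply f first."""
    return tuple(g[f[x]] for x in range(len(f)))


def sequence_cost(class_costs):
    """Cost of S_1 o ... o S_n given the costs c(P_1), ..., c(P_n)."""
    return sum(class_costs)


rot1 = (1, 2, 3, 0)  # input i goes to position (i + 1) % 4
swap = (1, 0, 3, 2)  # swaps adjacent pairs

print(compose(rot1, swap))  # rot1 first, then swap
print(compose(swap, rot1))  # swap first, then rot1
```

<div class="gmail_extra" style>The two orders give different shuffles, which is why the convention matters. Replacing the last class in a sequence by a strictly cheaper one lowers the total, e.g. sequence_cost([1, 1, 1]) versus sequence_cost([1, 1, 3]).</div>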

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>There is a nice order structure on the set of shuffle compositions (roughly, "instruction sequences") imposed by the prefix relation: strictly speaking a tree, i.e. a meet-semilattice in which the meet of two sequences is their longest common prefix, rather than a full lattice. The above cost function is strictly monotonic in this order, since shuffle costs are strictly positive and so extending a sequence always increases its cost. At the very least, this structure together with a monotonic cost function should allow some amount of pruning over a brute-force search. A cost function like this would also be amenable to A* search (although I have no clue what heuristic function would be appropriate).</div>
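<div class="gmail_extra" style><br></div><div class="gmail_extra" style>One way the pruning could look is uniform-cost search over prefixes, which is A* with a zero heuristic (so it sidesteps the open question of a good heuristic rather than answering it). This is a sketch under assumptions: instructions are modelled as single fixed shuffles with made-up names and costs, and cheapest_sequence is a hypothetical helper.</div>

```python
import heapq


def compose(f, g):
    """Backwards composition: apply f first, then g."""
    return tuple(g[f[x]] for x in range(len(f)))


def cheapest_sequence(target, instructions, max_cost):
    """Uniform-cost search over instruction-sequence prefixes.

    instructions: list of (name, cost, shuffle) with cost > 0.
    Returns (total_cost, [names]) or None if nothing fits in max_cost.
    """
    start = tuple(range(len(target)))
    best = {start: 0}  # cheapest known cost for each reached shuffle
    heap = [(0, start, [])]
    while heap:
        cost, cur, names = heapq.heappop(heap)
        if cur == target:
            return cost, names
        if cost > best.get(cur, cost):
            continue  # stale entry: a cheaper prefix reached cur
        for name, c, s in instructions:
            new_cost = cost + c
            nxt = compose(cur, s)
            # Costs only grow along a prefix chain, so anything past
            # max_cost, or no cheaper than a known route, is pruned.
            if new_cost > max_cost or new_cost >= best.get(nxt, new_cost + 1):
                continue
            best[nxt] = new_cost
            heapq.heappush(heap, (new_cost, nxt, names + [name]))
    return None


INSTRUCTIONS = [("rot1", 1, (1, 2, 3, 0)), ("swap", 2, (1, 0, 3, 2))]
```

<div class="gmail_extra" style>For example, cheapest_sequence((2, 3, 0, 1), INSTRUCTIONS, 10) finds the two-rotation sequence. Because every cost is strictly positive, the first time the target is popped from the heap its cost is minimal.</div>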

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>That's all I was able to come up with today (while waiting at the DMV...). I still need to think some more about how to impose more structure (than just "black box" functions) on the set of all possible shuffles (and hence on the shuffle classes). Primarily, this structure would enable reasoning about "dead ends" when composing shuffles. For example, with AVX, if the desired shuffle does not move elements across 128-bit lanes, then all shuffles that do cross lanes (e.g. VPERM2F128) are "dead ends" in the search space (this applies more to the problem of computing perfect shuffles online than to building tables).</div>
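<div class="gmail_extra" style><br></div><div class="gmail_extra" style>The lane-based "dead end" idea could be sketched as a simple filter. The helper names below are hypothetical, lanes are modelled as consecutive groups of lane_size elements (e.g. lane_size = 4 for 32-bit elements in 128-bit lanes), and the rule itself is taken as stated above rather than proven.</div>

```python
def crosses_lanes(shuffle, lane_size):
    """True if some input element is sent to a position in another lane."""
    return any(i // lane_size != j // lane_size
               for i, j in enumerate(shuffle))


def prune_dead_ends(target, candidates, lane_size):
    """Drop lane-crossing candidates when the target itself stays in-lane.

    This encodes the heuristic from the paragraph above as an assumption,
    not as a proven property of optimal sequences.
    """
    if crosses_lanes(target, lane_size):
        return list(candidates)
    return [s for s in candidates if not crosses_lanes(s, lane_size)]


in_lane = (1, 0, 3, 2)  # stays within lanes of size 2
cross = (2, 3, 0, 1)    # swaps the two lanes
```

<div class="gmail_extra" style>With these values, prune_dead_ends(in_lane, [in_lane, cross], 2) keeps only the in-lane candidate, while a lane-crossing target leaves the candidate list untouched.</div>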

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>Also, a simple integer cost function like the one presented above might be inadequate for expressing how "good" a shuffle sequence is.</div><div class="gmail_extra" style>

<br></div><div class="gmail_extra" style>-- Sean Silva</div></div>