[llvm] r211888 - [x86] Begin a significant overhaul of how vector lowering is done in the

Tue Jul 22 10:04:38 PDT 2014

On Mon, Jul 21, 2014 at 2:18 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Mon, Jul 7, 2014 at 10:41 AM, Evan Cheng <evan.cheng at apple.com> wrote:
>
>> Have you considered using / extending perfect shuffle to replace some of
>> the logic?
>>
>
> Sorry for not replying promptly here.
>
> I did think about perfect shuffles, but I don't think they're going to
> help much. Fundamentally, perfect shuffle tables don't scale well enough to
> be usable. For just 8 lanes, we're talking about nearly 7 billion entries.
> Even assuming a *bunch* of folding through symmetry and other tricks, we're
> not going to get it to a reasonable size.
>
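
A quick sanity check of the "nearly 7 billion" figure quoted above. This is
only a sketch: it assumes the table is indexed by one choice per result lane,
where each lane picks any element of two input vectors or "undef". The
function name is made up for illustration and is not anything in LLVM.

```python
def perfect_shuffle_table_entries(lanes: int, inputs: int = 2) -> int:
    """Count shuffle masks under the assumed model (hypothetical helper).

    Assumption: each result lane independently selects one of
    lanes * inputs source elements, or "undef".
    """
    choices_per_lane = lanes * inputs + 1  # +1 for the undef choice
    return choices_per_lane ** lanes


if __name__ == "__main__":
    for lanes in (4, 8):
        print(lanes, perfect_shuffle_table_entries(lanes))
```

Under that model, 4 lanes give 9^4 entries and 8 lanes give 17^8, which is
consistent with the number quoted.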

This may be something to punt over to the "academia" side of the fence (I
think Chris may have already done this in one of his talks?). I would be
pretty surprised if there wasn't some fairly straightforward formal model
for the problem with an efficient solution. It's basically an optimization
problem on compositions of permutations, and permutations have a fairly
nice structure which seems like it could be easily exploited.
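
To make "optimization problem on compositions of permutations" concrete, here
is a toy sketch: a brute-force breadth-first search for the shortest sequence
of primitive shuffles that composes to a target permutation. The primitive set
below is invented for illustration and does not correspond to real x86
instructions or costs, and the search is exhaustive rather than the efficient
solution speculated about above.

```python
from collections import deque


def apply_mask(vec, mask):
    """Shuffle vec so that result[i] = vec[mask[i]]."""
    return tuple(vec[i] for i in mask)


def shortest_sequence(target, primitives):
    """Return the shortest list of primitive names reaching target, or None.

    Hypothetical helper: every primitive is treated as having unit cost.
    """
    start = tuple(range(len(target)))
    if start == target:
        return []
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        for name, mask in primitives.items():
            nxt = apply_mask(state, mask)
            if nxt in seen:
                continue
            if nxt == target:
                return path + [name]
            seen.add(nxt)
            queue.append((nxt, path + [name]))
    return None  # target is not reachable from these primitives


# Made-up 4-lane primitives, purely illustrative.
PRIMITIVES = {
    "swap_pairs": (1, 0, 3, 2),
    "swap_halves": (2, 3, 0, 1),
    "rotate": (1, 2, 3, 0),
}

if __name__ == "__main__":
    print(shortest_sequence((3, 2, 1, 0), PRIMITIVES))
```

The visited set grows with the number of reachable permutations, so this only
illustrates the problem statement for small lane counts.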

> So we can only use perfect shuffle tables for 4 lanes and smaller. But for
> 4 lanes and smaller x86 has essentially perfect shuffle *instructions* and
> all the tricky parts are balancing blend vs. shuffle operations and the
> potential for domain crossing penalties.
>

Can you clarify what you mean by "perfect shuffle *instructions*"? I
thought the whole point of PerfectShuffle was precisely to choose "cheaper",
less-general shuffles whenever they are possible and better than using a
full permute instruction.

-- Sean Silva

> Those seem reasonably handled by basic code logic and DAG combines rather
> than table-driven approaches.
>
> The other thing that I realized that led me down this path is that there
> is a very fundamental logic to the shuffle instructions on the
> architecture, and the best way to lower shuffle operations is to actually
> follow that logic itself. That leads to the decomposition structure of the
> code here.
>
> -Chandler
>