[llvm] r211888 - [x86] Begin a significant overhaul of how vector lowering is done in the
chandlerc at gmail.com
Mon Jul 21 01:18:57 PDT 2014
On Mon, Jul 7, 2014 at 10:41 AM, Evan Cheng <evan.cheng at apple.com> wrote:
> Have you considered using / extending perfect shuffle to replace some of
> the logic?
Sorry for not replying promptly here.
I did think about perfect shuffles, but I don't think they're going to help
much. Fundamentally, perfect shuffle tables don't scale well enough to be
usable. For just 8 lanes, we're talking about nearly 7 billion entries.
Even assuming a *bunch* of folding through symmetry and other tricks, we're
not going to get it down to a reasonable size. So we can only use perfect
shuffle tables for 4 lanes and smaller. But for 4 lanes and smaller, x86
essentially has perfect shuffle *instructions*, and the tricky parts are
balancing blend vs. shuffle operations and the potential for domain-crossing
penalties. Those seem reasonably well handled by basic code logic and
DAG combines rather than by table-driven approaches.
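
The "nearly 7 billion" figure can be checked with a quick count. This sketch assumes a two-input shuffle in which every result lane independently selects one of the 2N elements of the two input vectors or is left undef, which gives 2N + 1 choices per lane. The function name `shuffle_mask_count` is hypothetical and only illustrates the arithmetic.

```python
def shuffle_mask_count(lanes: int) -> int:
    """Count distinct two-input shuffle masks for a vector of `lanes` lanes.

    Assumption: each result lane may pick any of the 2 * lanes elements
    from the two inputs, or be undef, so there are 2 * lanes + 1 choices
    per lane and the lanes are chosen independently.
    """
    choices_per_lane = 2 * lanes + 1
    return choices_per_lane ** lanes


for lanes in (4, 8):
    print(f"{lanes} lanes: {shuffle_mask_count(lanes):,} masks")
```

Under that assumption, 4 lanes give 9^4 = 6,561 masks, which is small enough for a table, while 8 lanes give 17^8 = 6,975,757,441. Symmetry folding would have to remove several orders of magnitude to make the 8-lane table practical.
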
The other realization that led me down this path is that the shuffle
instructions on the architecture follow a very fundamental logic, and the
best way to lower shuffle operations is to follow that logic directly.
That is where the decomposition structure of the code here comes from.
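
As a rough illustration of "following the instructions' logic", a 4-lane, two-input mask can be sorted by the simplest kind of x86 operation that could implement it. This is a hypothetical sketch, not the code from this commit. It assumes mask entries 0-3 pick from the first input, 4-7 from the second, and -1 marks an undef lane. It also assumes a blend (in the style of SSE4.1 `BLENDPS`) can only choose each lane's source without moving elements, and that a single-input shuffle (in the style of `PSHUFD`) can permute one input arbitrarily.

```python
from typing import Sequence

UNDEF = -1  # hypothetical sentinel for an undef mask lane


def classify_v4_shuffle(mask: Sequence[int]) -> str:
    """Hypothetical sketch: classify a 4-lane, two-input shuffle mask by the
    cheapest kind of operation that could implement it."""
    assert len(mask) == 4
    defined = [(i, m) for i, m in enumerate(mask) if m != UNDEF]
    if not defined:
        return "undef"
    # A blend keeps every element in its own lane and only chooses its source.
    if all(m == i or m == i + 4 for i, m in defined):
        return "blend"
    # If every defined element comes from one input, one shuffle suffices.
    if all(m < 4 for _, m in defined) or all(m >= 4 for _, m in defined):
        return "single-input shuffle"
    # Otherwise elements must move *and* come from both inputs.
    return "two-input shuffle"


print(classify_v4_shuffle([0, 5, 2, 7]))  # blend
print(classify_v4_shuffle([3, 2, 1, 0]))  # single-input shuffle
print(classify_v4_shuffle([0, 4, 1, 5]))  # two-input shuffle
```

The checks run from cheapest to most general. That ordering is one way to read the "blend vs. shuffle" balancing mentioned above: a real lowering would then decompose the two-input case further, for example into single-input shuffles followed by a blend.
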
More information about the llvm-commits