[llvm] r180802 - InstCombine: Fold more shuffles of shuffles.

Nadav Rotem nrotem at apple.com
Tue Apr 30 21:29:55 PDT 2013


> 
> I don't disagree with reverting the change in this case, as there is disagreement regarding the correct approach. Nevertheless, I think that something about the objections to the change seems misguided: it seems that we've traded a demonstrated performance improvement on at least one platform for a theoretical lack-of-regression on another. If there are cases on x86 where we need to preserve carefully-constructed shuffle combinations, then we need to have examples of these as regression tests, and we need examples in the test suite. Otherwise, we have no real way to insure that we won't regress on those kernels (which likely rely on other factors besides the shuffle selections). Maybe MI-level CSE will be sufficient to cleanup after a DAGCombine-level shuffle combination, but I suspect we'll get better results from doing this optimization earlier.

Hi Hal, 

Reverting this patch was the right thing to do.  At the end of the day, we can solve this with a simple DAGCombine optimization that merges concat_vector and extract_subvector. Notice that all of the shuffles that come out of SROA are legal and they are converted into the insert/extract/concat_vector SDNodes.  We can continue the discussion on the shuffle optimization, but it is irrelevant to this PR. Regarding the changes to InstCombine, the regressions on x86 are not theoretical and I mentioned a specific example[1] (AOS to SOA transpose where merging shuffles will hurt the performance due to cross-lane shuffles. In theory the codegen could recover, but there are a number of problems. One problem is that the search space is huge. Another problem is that we lower bottom-up and we don't have a good way to take into account the number of users that different shuffle nodes have.  InstCombine needs to canonicalize (not to optimize) only in cases where it makes sense. 

Thanks,
Nadav



[1] - http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130430/59f90f19/attachment.html>


More information about the llvm-commits mailing list