[llvm] r180802 - InstCombine: Fold more shuffles of shuffles.

Tue Apr 30 16:07:02 PDT 2013

----- Original Message -----
> From: "Jim Grosbach" <grosbach at apple.com>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: llvm-commits at cs.uiuc.edu, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Tuesday, April 30, 2013 5:01:26 PM
> Subject: Re: [llvm] r180802 - InstCombine: Fold more shuffles of shuffles.
> 
> 
> 
> 
> On Apr 30, 2013, at 2:38 PM, Nadav Rotem < nrotem at apple.com > wrote:
> 
> 
> 
> 
> On Apr 30, 2013, at 2:23 PM, Jim Grosbach < grosbach at apple.com >
> wrote:
> 
> 
> 
> That used to be true. These days, we see lots of shuffles coming from
> SROA and we have to mess with them or things go very badly.
> 
> 
> You can change SROA to reduce the number of shuffle vectors. I
> scanned SROA and I did not see anything too bad. I only saw
> insert/extract and blend, which should map nicely to any vector ISA.
> But you can make things better by changing SROA.cpp::insertVector to
> generate a select instruction, instead of a vector shuffle.
> 
> It’s more complicated than that. Have a look at the example on the PR
> that motives this patch to see what I mean. If you have an
> alternative suggestion, I’d love to hear it.
> 
> 
> 
> 
> 
> 
> 
> The more I think about it, the more I disagree with the current
> design. It creates a very deep tight coupling between what
> InstCombine does and the least-common-denominator of what target
> codegen can handle. That’s untenable in the long term and
> artificially restrictive.
> 
> 
> The motivation for the current design (which I am happy with) is that
> it is very difficult to undo a canonicalization of shuffles because
> the search space is huge. Also, the clang shuffle builtins are used
> by people who optimize their performance sensitive applications.
> 
> 
> 
> 
> This code is mostly pretty ancient. I wouldn’t read very much into
> the current implementation with regards to motivations. A lot has
> changed in the last few years. In any case, because something is
> difficult doesn’t mean it’s not the right thing to do. Don’t get me
> wrong, I’m not un-sympathetic to being pragmatic here. We have to
> be. There’s always a balance between the theoretically best approach
> and what we do in practice out of expedience. In this case, however,
> we’re aggressively pessimizing one target to help out another.

It is more than one target being hurt ;) -- but the current pessimizing, from what I can tell, is designed to protect against two things:

 1. Creating shuffles that are expensive to materialize (either because they are not naively supported, or because they are just slower (because of this AVX issue, for example). We have a TTI interface for shuffle costs, maybe we should use it?

 2. Creating too many different shuffles. On architectures that support arbitrary shuffles with relatively low cost, arbitrary shuffles generally require a shuffle-specification input in a register, and too many unique shuffles therefore increases register pressure. It is not clear how much of a problem this is in practice, but I recall a comment in InstCombine somewhere on this point.

 -Hal

> This
> patch flips that around for which target gets pessimized. My
> contention is that we can do better than the current extremely
> conservative heuristic and fix that.
> 
> 
> We obviously shouldn’t immediately remove all the “be careful here or
> targets will probably freak out” safeguards. That would have nasty
> results, to put it mildly. We should, however, be actively working
> to find ways to allow this (and other) code to be as aggressive as
> possible. The question for the moment is what additional conditions
> can we add that will allow the transformation for the situation we
> want to enable and still avoid it when it’s troublesome.
> Suggestions?
> 
> 
> -Jim
> 
> 
> 
> 
> 
> 
> 
> We use this approach in other places in the compiler. We don't always
> canonicalize all the way and in many cases we prefer not to touch
> the program unless we are certain that the change is beneficial. The
> shuffle ISA is so sparse that merging shuffles are likely to cause
> regressions.
> 
> 
> 
> We do. And it’s a compromise every single time. It’s technical debt,
> and we should be actively working to reduce it, not add to it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> When I looked at the test case I noticed that some of your shuffles
> were actually vector blends. It would make sense to convert
> blend-shuffles into select instructions with a constant mask.
> 
> 
> 
> 
> Sounds like a good idea. I don’t know much about the provenance of
> the original IR of these test cases.
>