[llvm] r180802 - InstCombine: Fold more shuffles of shuffles.

Jim Grosbach grosbach at apple.com
Tue Apr 30 17:26:33 PDT 2013


Note: I reverted the original change in r180834, since we’re actively discussing the right approach.

-j
On Apr 30, 2013, at 4:48 PM, Jim Grosbach <grosbach at apple.com> wrote:

> 
> On Apr 30, 2013, at 4:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
>> ----- Original Message -----
>>> From: "Jim Grosbach" <grosbach at apple.com>
>>> To: "Nadav Rotem" <nrotem at apple.com>
>>> Cc: llvm-commits at cs.uiuc.edu, "Hal Finkel" <hfinkel at anl.gov>
>>> Sent: Tuesday, April 30, 2013 5:01:26 PM
>>> Subject: Re: [llvm] r180802 - InstCombine: Fold more shuffles of shuffles.
>>> 
>>> On Apr 30, 2013, at 2:38 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>> 
>>> On Apr 30, 2013, at 2:23 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>> 
>>> That used to be true. These days, we see lots of shuffles coming from
>>> SROA and we have to mess with them or things go very badly.
>>> 
>>> 
>>> You can change SROA to reduce the number of shufflevectors it emits.
>>> I scanned SROA and did not see anything too bad; I only saw
>>> insert/extract and blends, which should map nicely to any vector ISA.
>>> But you can make things better by changing SROA.cpp::insertVector to
>>> generate a select instruction instead of a vector shuffle.
>>> 
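As a rough sketch of the insertVector suggestion above (illustrative IR with made-up value names, not code from SROA), inserting a small vector into lanes of a larger one can be written either as a pair of shufflevectors or as a shufflevector plus a select on a constant mask:

  ; Insert the two lanes of %sub into lanes 2-3 of the <4 x float> %vec.
  ; Widen %sub so its elements land in lanes 2-3; lanes 0-1 are undef.
  %wide = shufflevector <2 x float> %sub, <2 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
  ; Variant 1: blend %vec and %wide with a second shufflevector.
  %ins.shuf = shufflevector <4 x float> %vec, <4 x float> %wide, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
  ; Variant 2: the same blend expressed as a select on a constant mask.
  %ins.sel = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> %wide, <4 x float> %vec

The select form makes the blend structure explicit, which is presumably why it maps more directly onto targets with dedicated blend instructions.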
>>> It’s more complicated than that. Have a look at the example on the PR
>>> that motivates this patch to see what I mean. If you have an
>>> alternative suggestion, I’d love to hear it.
>>> 
>>> The more I think about it, the more I disagree with the current
>>> design. It creates a very tight coupling between what InstCombine
>>> does and the least common denominator of what target codegen can
>>> handle. That’s untenable in the long term and artificially
>>> restrictive.
>>> 
>>> 
>>> The motivation for the current design (which I am happy with) is that
>>> it is very difficult to undo a canonicalization of shuffles, because
>>> the search space is huge. Also, the clang shuffle builtins are used
>>> by people who optimize their performance-sensitive applications.
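For concreteness, the canonicalization being discussed composes shuffle masks, along these lines (illustrative IR, not the specific case from the PR):

  ; A shuffle of a shuffle...
  %t = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = shufflevector <4 x i32> %t, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
  ; ...can be folded into a single shuffle with the composed mask.
  %r.folded = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>

Once the masks have been composed, recovering a particular two-step form that a target could have matched cheaply means searching over the many possible factorizations of the mask, which is what makes the canonicalization hard to undo.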
>>> 
>>> This code is mostly pretty ancient, so I wouldn’t read very much into
>>> the current implementation with regard to motivations; a lot has
>>> changed in the last few years. In any case, just because something is
>>> difficult doesn’t mean it’s not the right thing to do. Don’t get me
>>> wrong, I’m not unsympathetic to being pragmatic here. We have to be.
>>> There’s always a balance between the theoretically best approach and
>>> what we do in practice out of expedience. In this case, however,
>>> we’re aggressively pessimizing one target to help out another.
>> 
>> It is more than one target being hurt ;) -- but the current pessimization, from what I can tell, is designed to protect against two things:
>> 
>> 1. Creating shuffles that are expensive to materialize, either because they are not natively supported or because they are just slower (because of this AVX issue, for example). We have a TTI interface for shuffle costs; maybe we should use it?
> 
> Right. It’s effectively taking a least-common-denominator approach and only doing a very few things that it knows are always safe, for some definitions of “always.” The current change is obviously too liberal and should be revisited. The question is whether there is a sufficiently conservative transformation we can/should make here that addresses the original problem, or whether we should fold these shuffles away somewhere else. I’m on the fence. I like the idea, in principle, of doing it here; I’m unsure whether that’s best in practice. For example, we would want checks that we’re really doing subvector operations here rather than a generic shuffle. That’s complicated by not having done type legalization yet, so the whole idea of which shuffles will be “safe” is really hard to pin down.
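To make the subvector-versus-generic distinction concrete, a small illustrative example (made-up IR, not from the patch): the first shuffle below extracts a contiguous run of lanes from one source, which targets generally handle cheaply, while the second is a generic permutation that some vector ISAs can only lower with a mask register or multiple instructions:

  ; A subvector-style shuffle: a contiguous slice of one source.
  %lo = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ; A generic permutation of the same source.
  %perm = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 6, i32 1, i32 7, i32 2>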
> 
> TTI is a possibility, though despite the FIXME I added, I’m somewhat loath to do that. I want to keep target-dependent logic out of the IR-level passes as much as possible, particularly in passes, like InstCombine, that are typically run as part of ‘opt’ rather than ‘llc’.
> 
> Nadav and I looked at this a bit more and are considering a DAGCombine instead. While we don’t have as much of the existing logic to leverage there, we do have a lot more information about the target available to us. Tradeoffs abound.
> 
> 
>> 
>> 2. Creating too many different shuffles. On architectures that support arbitrary shuffles at relatively low cost, those shuffles generally require a shuffle-specification input in a register, so too many unique shuffles increase register pressure. It is not clear how much of a problem this is in practice, but I recall a comment in InstCombine somewhere on this point.
> 
> That makes sense. In a perfect world, I’d like to see a target transform to clean that sort of thing up. We’re pretty far from there right now, though.
> 
>> 
>> -Hal
>> 
>>> This patch flips around which target gets pessimized. My contention
>>> is that we can do better than the current, extremely conservative
>>> heuristic and fix that.
>>> 
>>> 
>>> We obviously shouldn’t immediately remove all the “be careful here or
>>> targets will probably freak out” safeguards. That would have nasty
>>> results, to put it mildly. We should, however, be actively working to
>>> find ways to allow this (and other) code to be as aggressive as
>>> possible. The question for the moment is what additional conditions
>>> we can add that allow the transformation in the situations we want to
>>> enable while still avoiding it when it’s troublesome. Suggestions?
>>> 
>>> 
>>> -Jim
>>> 
>>> We use this approach in other places in the compiler. We don’t always
>>> canonicalize all the way, and in many cases we prefer not to touch
>>> the program unless we are certain that the change is beneficial. The
>>> shuffle ISA is so sparse that merging shuffles is likely to cause
>>> regressions.
>>> 
>>> We do. And it’s a compromise every single time. It’s technical debt,
>>> and we should be actively working to reduce it, not add to it.
>>> 
>>> 
>>> When I looked at the test case I noticed that some of your shuffles
>>> were actually vector blends. It would make sense to convert
>>> blend-shuffles into select instructions with a constant mask.
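A minimal sketch of the blend-shuffle-to-select rewrite Nadav describes (made-up values, not from the test case in question): any two-source shufflevector whose mask takes lane i from lane i of one source or the other is equivalent to a select on a constant mask.

  ; A blend: lanes <a0, b1, a2, b3>.
  %blend = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ; The same value as a select with a constant i1 mask.
  %blend.sel = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x i32> %a, <4 x i32> %b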
>>> 
>>> Sounds like a good idea. I don’t know much about the provenance of
>>> the original IR of these test cases.


