[PATCH] Fold selects when the non-selected elements are undef from shufflevectors
Eric Christopher
echristo at gmail.com
Thu May 8 10:51:59 PDT 2014
On Thu, May 8, 2014 at 10:45 AM, Philip Reames
<listmail at philipreames.com> wrote:
>
> On 05/07/2014 12:14 AM, Nick Lewycky wrote:
>>
>> Philip Reames wrote:
>>>
>>> Nadav,
>>>
>>> Could you spell out what you believe to be the right approach w.r.t.
>>> this topic? I'd also appreciate any links to previous discussion on this
>>> topic; a quick google search didn't find much.
>>>
>>> Reading through Filipe's patch, it seems entirely reasonable to me. It's
>>> a strict reduction in the number of shufflevectors in the IR. Even given
>>> the fact we can't lower shuffle vector perfectly in all cases, I don't
>>> see the harm in this patch. I'm not trying to say you're wrong; I'm
>>> merely trying to understand what it is I'm not seeing here.
>>
>>
>> I think the situation is that perfect lowering of arbitrary IR
>> shufflevectors is hard, because the instructions offered by hardware are
>> very particular, because the search space is very large, and because we
>> don't have a good algorithm yet.
>>
>> One previous thread is here:
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173544.html
>> and it gets into group theory here:
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173601.html
>> . Even this thread mention another thread the previous week.
>>
>> The correct future design for llvm is to let the IR optimizer optimize the
>> shufflevectors however it wants, and leave it to the backend to solve the
>> problem.
>
> Thanks for the background on the general problem, that was quite helpful.
>
> After thinking about it a bit, I don't have a good general solution, but I
> find myself really bothered by the fact that we're avoiding canonicalization
> at the IR level (which might expose other non-trivial optimizations),
> because we're not sure how to lower something for some targets. This seems
> less than desirable.
>
> Do we have a means to determine whether a particular shufflevector is
> "trivially" (for some definition thereof) lowerable? If so, we could use
> that as a profitability check on optimizations like the one Filipe
> originally proposed. (e.g. if the new sufflevector maps to a single
> instruction, it's clearly better than two instructions -- hopefully) Given
> this would be using target information, the optimization might have to be
> moved later in the optimizer.
>
> I'm not sure this particular idea is the right approach, but my thought
> process here is that maybe we can identify some subset of the original
> optimization which clearly *is* profitable, even given our lowering
> restrictions. That seems like a logical path forward if we can easily
> achieve it.
>
It would be nice, this patch by itself gave me about a percent (above
the noise) on a benchmark.
-eric
> Philip
>
>
>
>
>>
>> However, this isn't what we do *today* and that's acceptable today people
>> mostly write vectors in tight kernel loops themselves, or else the
>> autovectorizer picked the exact right sequence for the hardware at hand.
>>
>> In the future, those excuses aren't going to work. Code tuned for one
>> microarch will find itself inlined all over a codebase where it will
>> intersect code that was autovectorized to a different vector width, for
>> instance. We need the optimizer to be able to optimize and let the code
>> generator do the instruction selection.
>>
>> Nick
>>
>>> On 05/06/2014 09:22 AM, Nadav Rotem wrote:
>>>>
>>>> Hi Filipe,
>>>>
>>>> Thank you for working on this. I appreciate the fact that you are
>>>> eager to improve LLVM and that you went through the trouble of testing
>>>> your code on multiple targets and on considering different
>>>> alternatives to the problem. I think that the patch below is unrelated
>>>> to the blend->select lowering so I will only address this patch. We
>>>> discussed on the possibility of creating new shuffles on the llvm
>>>> mailing lists more than once. We decided not to create new shuffles
>>>> because we may create new shuffle patterns that we can’t lower
>>>> efficiently. I ran into this problem a few times myself in different
>>>> LLVM-based projects, mainly in the domain of graphics where people
>>>> write swizzles themselves. Please do not commit this patch because it
>>>> creates new shuffles. It is not the correct approach. You may get
>>>> lucky in you selection of testcase, but I am sure that there are lots
>>>> of other shuffle inputs on X86 and other targets that this patch makes
>>>> worse.
>>>>
>>>> Thanks,
>>>> Nadav
>>>>
>>>>
>>>>> Hi Nadav,
>>>>>
>>>>> I'd really like this patch to get in, since it improves code
>>>>> generation on many targets (at the very least, I've tried arm and
>>>>> arm64 (mcpu=cyclone), ppc32 and ppc64 (mcpu=g5), and x86 and x86-64
>>>>> (mcpu=nehalem) and they all improve (they end up with no loads vs. 1
>>>>> or 2 loads, and code size is about half of what it was).
>>>>>
>>>>> Even if we'd like to make the select instruction be the proper
>>>>> canonicalization of this operation, it seems that, for now, a
>>>>> shufflevector is better when it comes to lowering it.
>>>>>
>>>>> Several targets have the action for the vselect as Expand, which will
>>>>> generate a bunch of code which would then have to be pattern-matched,
>>>>> instead of lowering the vselect as custom and falling-through to
>>>>> Expand if there's no special-case instruction/set of instructions
>>>>> that we can emit.
>>>>>
>>>>> I ran opt+llc in this way, for the variants I mentioned above
>>>>> (./bin/opt is opt with this optimization):
>>>>> ./bin/opt -O3 select-from-shuffles.ll | ./bin/llc -march=arm64
>>>>> -mcpu=cyclone > arm64-cylone-opt
>>>>> opt -O3 select-from-shuffles.ll | ./bin/llc -march=arm64
>>>>> -mcpu=cyclone > arm64-cylone-opt
>>>>>
>>>>> I can understand if we eventually drop this after we make the backend
>>>>> treat the select as the canonical form of this operation and have
>>>>> llvm always use that form, but for now it seems that the
>>>>> shufflevector is better lowered in some important targets.
>>>>>
>>>>>
>>>>> Filipe
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 2, 2014 at 11:55 PM, Nadav Rotem <nrotem at apple.com
>>>>> <mailto:nrotem at apple.com>> wrote:
>>>>>
>>>>> Hi Filipe,
>>>>>
>>>>> We don’t generate new shufflevector instructions during
>>>>> optimizations because the lowering of shuffle instructions is
>>>>> really complicated and we don’t want to generate a shuffle that
>>>>> we don’t lower by accident.
>>>>>
>>>>> Thanks,
>>>>> Nadav
>>>>>
>>>>> On May 2, 2014, at 4:34 PM, Filipe Cabecinhas
>>>>> <filcab+llvm.phabricator at gmail.com
>>>>> <mailto:filcab%2Bllvm.phabricator at gmail.com>> wrote:
>>>>>
>>>>> > Remove unused argument
>>>>> >
>>>>> > http://reviews.llvm.org/D3561
>>>>> >
>>>>> > Files:
>>>>> > lib/Transforms/InstCombine/InstCombineSelect.cpp
>>>>> > test/Transforms/InstCombine/select.ll
>>>>> > <D3561.9050.patch>_______________________________________________
>>>>> > llvm-commits mailing list
>>>>> > llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list