[PATCH] Fold selects when the non-selected elements are undef from shufflevectors

Wed May 7 00:14:35 PDT 2014

Philip Reames wrote:
> Nadav,
>
> Could you spell out what you believe to be the right approach w.r.t.
> this topic? I'd also appreciate any links to previous discussion on this
> topic; a quick google search didn't find much.
>
> Reading through Filipe's patch, it seems entirely reasonable to me. It's
> a strict reduction in the number of shufflevectors in the IR. Even given
> the fact we can't lower shuffle vector perfectly in all cases, I don't
> see the harm in this patch. I'm not trying to say you're wrong; I'm
> merely trying to understand what it is I'm not seeing here.

I think the situation is that perfect lowering of arbitrary IR 
shufflevectors is hard, because the instructions offered by hardware are 
very particular, because the search space is very large, and because we 
don't have a good algorithm yet.

One previous thread is here: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173544.html 
and it gets into group theory here: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130429/173601.html 
. Even this thread mention another thread the previous week.

The correct future design for llvm is to let the IR optimizer optimize 
the shufflevectors however it wants, and leave it to the backend to 
solve the problem.

However, this isn't what we do *today* and that's acceptable today 
people mostly write vectors in tight kernel loops themselves, or else 
the autovectorizer picked the exact right sequence for the hardware at hand.

In the future, those excuses aren't going to work. Code tuned for one 
microarch will find itself inlined all over a codebase where it will 
intersect code that was autovectorized to a different vector width, for 
instance. We need the optimizer to be able to optimize and let the code 
generator do the instruction selection.

Nick

> On 05/06/2014 09:22 AM, Nadav Rotem wrote:
>> Hi Filipe,
>>
>> Thank you for working on this. I appreciate the fact that you are
>> eager to improve LLVM and that you went through the trouble of testing
>> your code on multiple targets and on considering different
>> alternatives to the problem. I think that the patch below is unrelated
>> to the blend->select lowering so I will only address this patch. We
>> discussed on the possibility of creating new shuffles on the llvm
>> mailing lists more than once. We decided not to create new shuffles
>> because we may create new shuffle patterns that we can’t lower
>> efficiently. I ran into this problem a few times myself in different
>> LLVM-based projects, mainly in the domain of graphics where people
>> write swizzles themselves. Please do not commit this patch because it
>> creates new shuffles. It is not the correct approach. You may get
>> lucky in you selection of testcase, but I am sure that there are lots
>> of other shuffle inputs on X86 and other targets that this patch makes
>> worse.
>>
>> Thanks,
>> Nadav
>>
>>
>>> Hi Nadav,
>>>
>>> I'd really like this patch to get in, since it improves code
>>> generation on many targets (at the very least, I've tried arm and
>>> arm64 (mcpu=cyclone), ppc32 and ppc64 (mcpu=g5), and x86 and x86-64
>>> (mcpu=nehalem) and they all improve (they end up with no loads vs. 1
>>> or 2 loads, and code size is about half of what it was).
>>>
>>> Even if we'd like to make the select instruction be the proper
>>> canonicalization of this operation, it seems that, for now, a
>>> shufflevector is better when it comes to lowering it.
>>>
>>> Several targets have the action for the vselect as Expand, which will
>>> generate a bunch of code which would then have to be pattern-matched,
>>> instead of lowering the vselect as custom and falling-through to
>>> Expand if there's no special-case instruction/set of instructions
>>> that we can emit.
>>>
>>> I ran opt+llc in this way, for the variants I mentioned above
>>> (./bin/opt is opt with this optimization):
>>> ./bin/opt -O3 select-from-shuffles.ll | ./bin/llc -march=arm64
>>> -mcpu=cyclone > arm64-cylone-opt
>>> opt -O3 select-from-shuffles.ll | ./bin/llc -march=arm64
>>> -mcpu=cyclone > arm64-cylone-opt
>>>
>>> I can understand if we eventually drop this after we make the backend
>>> treat the select as the canonical form of this operation and have
>>> llvm always use that form, but for now it seems that the
>>> shufflevector is better lowered in some important targets.
>>>
>>>
>>> Filipe
>>>
>>>
>>>
>>>
>>> On Fri, May 2, 2014 at 11:55 PM, Nadav Rotem <nrotem at apple.com
>>> <mailto:nrotem at apple.com>> wrote:
>>>
>>>     Hi Filipe,
>>>
>>>     We don’t generate new shufflevector instructions during
>>>     optimizations because the lowering of shuffle instructions is
>>>     really complicated and we don’t want to generate a shuffle that
>>>     we don’t lower by accident.
>>>
>>>     Thanks,
>>>     Nadav
>>>
>>>     On May 2, 2014, at 4:34 PM, Filipe Cabecinhas
>>>     <filcab+llvm.phabricator at gmail.com
>>>     <mailto:filcab%2Bllvm.phabricator at gmail.com>> wrote:
>>>
>>>     > Remove unused argument
>>>     >
>>>     > http://reviews.llvm.org/D3561
>>>     >
>>>     > Files:
>>>     > lib/Transforms/InstCombine/InstCombineSelect.cpp
>>>     > test/Transforms/InstCombine/select.ll
>>>     > <D3561.9050.patch>_______________________________________________
>>>     > llvm-commits mailing list
>>>     > llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu  <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits