[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Thu Oct 2 13:41:47 PDT 2014

Hi Chandler,

I’ve filed a few PRs regarding the latest regressions I found.

Here are the links if you want the details.
http://llvm.org/bugs/show_bug.cgi?id=21137
http://llvm.org/bugs/show_bug.cgi?id=21138
http://llvm.org/bugs/show_bug.cgi?id=21139
http://llvm.org/bugs/show_bug.cgi?id=21140

I've already reported the first one a while back.

This is just FYI, I do not expect you to handle all the work :).

Cheers,
-Quentin

> On Oct 1, 2014, at 11:24 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> Hi Chandler,
> 
> I'm not sure how important this is, but I found a minor regression
> with the new shuffle lowering.
> Here is a reproducer:
> 
> ;;
> define <4 x i32> @test(<4 x i32> %V) {
>  %1 = shufflevector <4 x i32> %V, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
>  ret <4 x i32> %1
> }
> ;;
> 
> $ llc -mcpu=corei7-avx -o -
> 
>  vmovq %xmm0, %xmm0
>  retq
> 
> $ llc -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering -o -
>  vpxor  %xmm1, %xmm1, %xmm1
>  vpunpcklqdq  %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[0]
>  retq
> 
> If we know that the upper 64 bits of the result are zero, we can
> emit a single vmovq instead of the vpxor+vpunpcklqdq pair.
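
The equivalence claimed above can be checked with a small lane-level model. This is only a sketch in Python, not LLVM code: `shufflevector`, `vmovq` and `vpunpcklqdq` are hypothetical helper names that imitate the IR instruction and the two x86 instructions on a 4 x i32 value, under the assumption that each register is modelled as a list of four integers.

```python
def shufflevector(a, b, mask):
    """Model of LLVM's shufflevector: mask indices below len(a) pick
    lanes of a, the remaining indices pick lanes of b."""
    lanes = a + b
    return [lanes[i] for i in mask]


def vmovq(src):
    """Model of the register form of vmovq on 4 x i32 lanes:
    keep the low 64 bits (lanes 0-1) and zero the upper 64 bits."""
    return src[:2] + [0, 0]


def vpunpcklqdq(a, b):
    """Model of vpunpcklqdq on 4 x i32 lanes: low 64 bits of a followed
    by low 64 bits of b (a = xmm0, b = xmm1 in the listing above)."""
    return a[:2] + b[:2]


V = [11, 22, 33, 44]   # arbitrary illustrative input
zero = [0, 0, 0, 0]    # the all-zero second shuffle operand

from_ir = shufflevector(V, zero, [0, 1, 4, 5])
old_lowering = vmovq(V)                  # single instruction
new_lowering = vpunpcklqdq(V, zero)      # needs vpxor to build `zero` first
```

In this model all three results are the same vector, which is why the one-instruction form is preferable whenever the second operand is known to be zero.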
> 
> As I said, this is a minor issue.
> I just wanted to post this finding so that we don't forget about it.
> 
> Cheers,
> Andrea
> 
> On Wed, Oct 1, 2014 at 9:23 AM, Andrea Di Biagio
> <andrea.dibiagio at gmail.com> wrote:
>> On Wed, Oct 1, 2014 at 1:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>> This has been added in r218724.
>> Thanks Chandler!
>> 
>>> Based on the feedback here and from Quentin, I'm going to email the list
>>> shortly with a heads-up, and then flip the default over to the new shuffle
>>> lowering.
>> 
>> Nice.
>> Again, thanks for working on this!
>> 
>> -Andrea
>> 
>>> 
>>> On Mon, Sep 29, 2014 at 10:48 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>>> 
>>>> Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about
>>>> that. I'll fix those.
>>>> 
>>>> On Fri, Sep 26, 2014 at 3:39 AM, Andrea Di Biagio
>>>> <andrea.dibiagio at gmail.com> wrote:
>>>>> 
>>>>> Hi Chandler,
>>>>> 
>>>>> Here is another test.
>>>>> 
>>>>> When looking at the AVX codegen, I noticed that, when using the new
>>>>> shuffle lowering, we no longer emit a single vbroadcastss in the case
>>>>> where the shuffle performs a splat of a scalar float loaded from
>>>>> memory.
>>>>> 
>>>>> For example:
>>>>> (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
>>>>>   vmovss (%rdi), %xmm0
>>>>>   vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>>>>> 
>>>>> Instead of:
>>>>> (with -mcpu=corei7-avx)
>>>>>  vbroadcastss (%rdi), %xmm0
>>>>> 
>>>>> I have attached a small reproducible for it.
>>>>> 
>>>>> Basically, the old shuffle lowering logic calls function
>>>>> 'NormalizeVectorShuffle' to handle shuffles that perform a splat
>>>>> operation.
>>>>> On AVX, function 'NormalizeVectorShuffle' tries to lower a splat where
>>>>> the splat value comes from a load into an X86ISD::VBROADCAST DAG node.
>>>>> Later on, during instruction selection, we emit a single avx_broadcast
>>>>> for the load+splat sequence (basically, we end up folding the load in
>>>>> the operand of the vbroadcastss).
>>>>> 
>>>>> What happens is that the new shuffle lowering doesn't emit a
>>>>> vbroadcast node in this case and eventually we end up selecting the
>>>>> sequence of vmovss+vpermilps.
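
The two instruction sequences can be compared with a small lane-level model. This is only a sketch in Python, not the logic of 'NormalizeVectorShuffle': all helper names are hypothetical, memory is assumed to be a plain dictionary from address to float, and the vpermilps immediate is assumed to be already decoded into per-lane indices.

```python
def is_splat_mask(mask):
    """True when every mask element selects the same source lane."""
    return len(set(mask)) == 1


def vmovss_load(memory, addr):
    """Model of vmovss (mem), %xmm: the scalar lands in lane 0 and the
    remaining lanes are zeroed."""
    return [memory[addr], 0.0, 0.0, 0.0]


def vpermilps(src, lane_indices):
    """Model of vpermilps with the immediate decoded into lane indices."""
    return [src[i] for i in lane_indices]


def vbroadcastss_load(memory, addr):
    """Model of vbroadcastss (mem), %xmm: one load, replicated to all lanes."""
    return [memory[addr]] * 4


memory = {0x1000: 1.5}   # illustrative address and value
splat = [0, 0, 0, 0]     # what "$0" means for vpermilps

two_instructions = vpermilps(vmovss_load(memory, 0x1000), splat)
one_instruction = vbroadcastss_load(memory, 0x1000)
```

Both paths produce the same vector in this model, so recognising a splat mask whose source is a load is enough to justify the single broadcast.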
>>>>> 
>>>>> I hope this helps.
>>>>> Andrea
>>>>> 
>>>>> On Tue, Sep 23, 2014 at 10:53 PM, Chandler Carruth <chandlerc at google.com>
>>>>> wrote:
>>>>>> 
>>>>>> On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
>>>>>> wrote:
>>>>>>> 
>>>>>>> If you don’t want to spend time on this, I’d be happy to create a
>>>>>>> candidate patch for review? I’ve been unclear if you were taking
>>>>>>> patches for your shuffle work prior to it becoming the default.
>>>>>> 
>>>>>> 
>>>>>> While I'm happy to work on it, I'm even more happy to have patches. =D
>>>>>> 
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>> 
>>>> 
>>>> 
>>> 
> <test.ll>