[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Ahmed Bougacha ahmed.bougacha at gmail.com
Fri Jan 30 11:15:32 PST 2015


I filed a couple more, in case they're actually different issues:
- http://llvm.org/bugs/show_bug.cgi?id=22412
- http://llvm.org/bugs/show_bug.cgi?id=22413

And that's pretty much it for internal changes.  I'm fine with flipping the
switch; Quentin, are you?
Also, just to have an idea, do you (or someone else!) plan to tackle these
in the near future?

-Ahmed

On Thu, Jan 29, 2015 at 11:50 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
wrote:

>
> On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>>
>> On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
>> wrote:
>>
>>> Hi Chandler,
>>>
>>> I've been looking at the regressions Quentin mentioned, and filed a PR
>>> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
>>>
>>> As for the others, I'm working on reducing them, but for now, here are
>>> some raw observations, in case any of it rings a bell:
>>>
>>
>> Very cool, and thanks for the analysis!
>>
>>
>>>
>>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>>     vpermilps     $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>     vblendps      $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>>     vmovaps       -0xXX(%rdx), %xmm2
>>>     vshufps       $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>>     vshufps       $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>>>
>>>
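The `## xmmN = ...` comments in these listings can be reproduced by decoding the immediates: vpermilps and shufps both take four 2-bit lane selectors, lowest lane first. The Python model below is a sketch written for illustration, not LLVM code. It assumes AT&T operand order (`vshufps $imm, %src2, %src1, %dst`) and treats registers as lists of lane labels. It only decodes the selectors and makes no claim that the two excerpts take identical inputs.

```python
# Decode the imm8 operands of vpermilps and (v)shufps: four 2-bit lane
# selectors, lowest lane first. AT&T order: vshufps $imm, %src2, %src1, %dst.

def lane_selectors(imm):
    imm &= 0xFF  # disassembly prints imm8 signed, e.g. $-0x6d == 0x93
    return [(imm >> (2 * i)) & 0x3 for i in range(4)]


def vpermilps(src, imm):
    # Single input: dst[i] = src[sel[i]]
    return [src[s] for s in lane_selectors(imm)]


def shufps(src1, src2, imm):
    # Lanes 0-1 are picked from src1, lanes 2-3 from src2
    s = lane_selectors(imm)
    return [src1[s[0]], src1[s[1]], src2[s[2]], src2[s[3]]]


mem = ["m0", "m1", "m2", "m3"]
xmm0 = ["a0", "a1", "a2", "a3"]

print(lane_selectors(-0x6D))      # [3, 0, 1, 2]
print(vpermilps(mem, -0x6D))      # ['m3', 'm0', 'm1', 'm2']
xmm3 = shufps(mem, xmm0, 0x3)     # vshufps $0x3, %xmm0, %xmm2, %xmm3
print(xmm3)                       # ['m3', 'm0', 'a0', 'a0']
print(shufps(xmm3, xmm0, -0x68))  # ['m3', 'a0', 'a1', 'a2']
```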
>>> Also, I see some differences around shuffled loads that I'm a bit
>>> conflicted about:
>>>     vmovaps       -0xXX(%rbp), %xmm3
>>>     ...
>>>     vinsertps     $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]
>>> becomes:
>>>     vpermilps     $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>     ...
>>>     vinsertps     $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>>>
>>> Note that the second version does the shuffle in place, in xmm2.
>>>
>>>
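The `$0xc0` immediate of vinsertps packs three fields: bits 7:6 pick the source lane, bits 5:4 the destination lane, and bits 3:0 are a zero mask applied afterwards. A minimal Python model of the register-source form, using the same lane-label convention, is sketched below (illustrative only, not LLVM or Intel code; the memory-source form, which loads a scalar and ignores the source-lane field, is not modelled).

```python
# Model of (v)insertps with a register source.
# AT&T order: vinsertps $imm, %src2, %src1, %dst.

def insertps(src1, src2, imm):
    imm &= 0xFF
    count_s = (imm >> 6) & 0x3  # which lane of src2 to read
    count_d = (imm >> 4) & 0x3  # which lane of the result to write
    zmask = imm & 0xF           # result lanes forced to zero after the insert
    dst = list(src1)
    dst[count_d] = src2[count_s]
    return [0.0 if (zmask >> i) & 1 else v for i, v in enumerate(dst)]


xmm3 = ["b0", "b1", "b2", "b3"]
xmm4 = ["c0", "c1", "c2", "c3"]
print(insertps(xmm3, xmm4, 0xC0))  # ['c3', 'b1', 'b2', 'b3']
```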
>>> Some are blends (har har) of those two:
>>>     vpermilps     $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]
>>>     vpermilps     $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]
>>>     vblendps      $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]
>>> becomes:
>>>     vmovaps       -0xXX(%rax), %xmm0 ## xmm0 = mem_2[0,1,2,3]
>>>     vpermilps     $-0x6d, %xmm0, %xmm1 ## xmm1 = xmm0[3,0,1,2]
>>>     vshufps       $0x3, %xmm_mem_1, %xmm0, %xmm0 ## xmm0 = xmm0[3,0],xmm_mem_1[0,0]
>>>     vshufps       $-0x68, %xmm_mem_1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]
>>>
>>>
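For the blend example, a lane-level model suggests both sequences compute the same xmm0, assuming the excerpts show the whole data flow into xmm0 and that `xmm_mem_1` holds the value called mem_1. If so, the difference is only in instruction selection: two load-folding vpermilps plus one vblendps become an unfolded vmovaps plus two vshufps. The Python helpers below are hypothetical and follow AT&T operand order.

```python
# Lane-level check of the "blend of two permutes" listing.

def sel(imm):
    imm &= 0xFF
    return [(imm >> (2 * i)) & 0x3 for i in range(4)]


def vpermilps(src, imm):
    return [src[s] for s in sel(imm)]


def shufps(src1, src2, imm):   # AT&T: vshufps $imm, %src2, %src1, %dst
    s = sel(imm)
    return [src1[s[0]], src1[s[1]], src2[s[2]], src2[s[3]]]


def blendps(src1, src2, imm):  # AT&T: vblendps $imm, %src2, %src1, %dst
    # Bit i of imm set -> lane i comes from src2, else from src1
    return [src2[i] if (imm >> i) & 1 else src1[i] for i in range(4)]


def old_sequence(mem_1, mem_2):
    xmm6 = vpermilps(mem_1, -0x6D)
    xmm1 = vpermilps(mem_2, -0x6D)
    return blendps(xmm6, xmm1, 0x1)


def new_sequence(mem_1, mem_2):
    xmm0 = list(mem_2)  # vmovaps: the load is no longer folded
    # The listing also computes xmm1 = xmm0[3,0,1,2]; it does not feed xmm0.
    xmm0 = shufps(xmm0, mem_1, 0x3)
    return shufps(xmm0, mem_1, -0x68)


mem_1 = ["p0", "p1", "p2", "p3"]
mem_2 = ["q0", "q1", "q2", "q3"]
print(old_sequence(mem_1, mem_2))  # ['q3', 'p0', 'p1', 'p2']
print(new_sequence(mem_1, mem_2))  # ['q3', 'p0', 'p1', 'p2']
```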
>>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>>> domain changes such as (xmm5 and xmm0 initially hold integers, and both
>>> are dead after the store):
>>>     vpshufd       $-0x5c, %xmm0, %xmm0    ## xmm0 = xmm0[0,1,2,2]
>>>     vpalignr      $0xc, %xmm0, %xmm5, %xmm0 ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>     vmovdqu       %xmm0, 0x20(%rax)
>>> turning into:
>>>     vshufps       $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xmm5[0,0]
>>>     vshufps       $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]
>>>     vmovups       %xmm0, 0x20(%rax)
>>>
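The integer-domain pshufd+palignr pair and the float-domain shufps+shufps pair in this last example can be checked to produce the same 16 bytes. The model below uses hypothetical helpers, AT&T operand order, and registers as lane labels, and it expands dwords into bytes so that palignr's byte rotation is modelled exactly. It says nothing about the bypass-delay cost of the domain change itself.

```python
# Byte-level check of the domain-change example.

def sel(imm):
    imm &= 0xFF
    return [(imm >> (2 * i)) & 0x3 for i in range(4)]


def to_bytes(dwords):
    # Expand 4 dword lanes into 16 byte lanes, lowest byte first
    return [f"{d}.b{k}" for d in dwords for k in range(4)]


def pshufd(src, imm):  # same selector encoding as vpermilps
    return [src[s] for s in sel(imm)]


def palignr(src1, src2, imm):
    # AT&T: vpalignr $imm, %src2, %src1, %dst -> bytes of (src1:src2) >> imm
    concat = to_bytes(src2) + to_bytes(src1)  # src2 is the low half
    return concat[imm:imm + 16]


def shufps(src1, src2, imm):  # AT&T: vshufps $imm, %src2, %src1, %dst
    s = sel(imm)
    return [src1[s[0]], src1[s[1]], src2[s[2]], src2[s[3]]]


def integer_domain(xmm0, xmm5):
    xmm0 = pshufd(xmm0, -0x5C)       # xmm0[0,1,2,2]
    return palignr(xmm5, xmm0, 0xC)  # xmm0[12..15], xmm5[0..11]


def float_domain(xmm0, xmm5):
    xmm0 = shufps(xmm0, xmm5, 0x2)    # xmm0[2,0], xmm5[0,0]
    xmm0 = shufps(xmm0, xmm5, -0x68)  # xmm0[0,2], xmm5[1,2]
    return to_bytes(xmm0)


x = ["x0", "x1", "x2", "x3"]
y = ["y0", "y1", "y2", "y3"]
print(integer_domain(x, y) == float_domain(x, y))  # True
```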
>>
>> All of these stem from what I think is the same core weakness of the
>> current algorithm: we prefer the fully general shufps+shufps 4-way
>> shuffle/blend far too often. Here is how I would more precisely classify
>> the two things missing here:
>>
>> - Check if either input is "in place" and we can do a fast single-input
>> shuffle with a fixed blend.
>>
>
> I believe this would be http://llvm.org/bugs/show_bug.cgi?id=22390
>
>
>> - Check if we can form a rotation and use palignr to finish a
>> shuffle/blend
>>
>
> ... and this would be http://llvm.org/bugs/show_bug.cgi?id=22391
>
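A minimal sketch of the first check Chandler describes: one input already "in place", the other fixed up by a single-input shuffle, then a fixed blend. The function name, return shape, and restriction to 4 lanes are assumptions for illustration; this is not LLVM's actual lowering code. Masks follow the shufflevector convention: 0-3 select from V1, 4-7 from V2, and -1 is undef.

```python
# Hypothetical matcher for "in-place input + single-input shuffle + blend".

def match_in_place_blend(mask):
    """Return (in_place_input, permute_for_other, blend_imm), or None."""
    n = len(mask)
    for in_place in (0, 1):
        permute = [-1] * n  # single-input shuffle to apply to the other input
        blend_imm = 0       # bit i set -> lane i comes from the permuted input
        ok = True
        for i, m in enumerate(mask):
            if m < 0:
                continue    # undef lane: either input will do
            src, idx = divmod(m, n)
            if src == in_place:
                if idx != i:  # lane must already sit in its final position
                    ok = False
                    break
            else:
                permute[i] = idx
                blend_imm |= 1 << i
        if ok:
            return in_place, permute, blend_imm
    return None


# [mem[3], xmm0[1], xmm0[2], xmm0[3]] with V1 = xmm0, V2 = mem
print(match_in_place_blend([7, 1, 2, 3]))  # (0, [3, -1, -1, -1], 1)
# [mem_2[3], mem_1[0], mem_1[1], mem_1[2]]: neither input is in place
print(match_in_place_blend([7, 0, 1, 2]))  # None
```

Under this model the first listing's folded vpermilps + vblendps $0x1 form is exactly the matched shape, while the blend-of-two-permutes example falls through to the other pattern.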
> I think this about covers the Haswell regressions I'm seeing.  Now for
> some pre-AVX fun!
>
> -Ahmed
>
>
>> There may be other patterns we're missing, but these two seem to jump out
>> based on your analysis, and may be fairly easy to tackle.
>>
>
>
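The second check, forming a rotation and letting palignr finish the shuffle/blend, could be sketched the same way. The hypothetical `match_palignr` below allows an optional single-input pre-shuffle of the low operand, which is what the old pshufd + vpalignr sequence did. A match maps to `vpalignr $imm, %lo, %hi, %dst`. Domain-crossing costs (palignr is an integer instruction) are ignored, and the mask convention is the same as above (0-3 from V1, 4-7 from V2, -1 undef).

```python
# Hypothetical matcher: optional pre-shuffle of `lo`, then palignr(hi:lo).

def match_palignr(mask):
    """Return (lo, hi, imm_bytes, pre_shuffle) or None.

    Lane i comes from lo'[i + k] for i < 4 - k (lo' = lo after the
    pre-shuffle) and from hi[i - (4 - k)] otherwise.
    """
    for lo, hi in ((0, 1), (1, 0)):
        for k in (1, 2, 3):         # rotate by k lanes = 4*k bytes
            pre = [-1] * 4          # pre-shuffle for lo; -1 = don't care
            ok = True
            for i, m in enumerate(mask):
                if m < 0:
                    continue        # undef lane matches anything
                src, idx = divmod(m, 4)
                if i < 4 - k:
                    if src != lo:
                        ok = False
                        break
                    pre[i + k] = idx  # lo' must hold lo[idx] in lane i + k
                elif src != hi or idx != i - (4 - k):
                    ok = False      # hi lanes must arrive in order
                    break
            if ok:
                return lo, hi, 4 * k, pre
    return None


def needs_pre_shuffle(pre):
    return any(p not in (-1, i) for i, p in enumerate(pre))


# [mem_2[3], mem_1[0], mem_1[1], mem_1[2]] with V1 = mem_1, V2 = mem_2
r = match_palignr([7, 0, 1, 2])
print(r, needs_pre_shuffle(r[3]))  # (1, 0, 12, [-1, -1, -1, 3]) False
# [xmm0[2], xmm5[0], xmm5[1], xmm5[2]] with V1 = xmm0, V2 = xmm5
r = match_palignr([2, 4, 5, 6])
print(r, needs_pre_shuffle(r[3]))  # (0, 1, 12, [-1, -1, -1, 2]) True
```

In this model the blend-of-two-permutes mask is a plain 12-byte rotation of mem_1:mem_2, while the domain-change mask needs the pre-shuffle that pshufd provided.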


More information about the llvm-dev mailing list