[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Robert Lougher
rob.lougher at gmail.com
Fri Sep 5 11:09:12 PDT 2014
Hi Chandler,
On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
> wrote:
>>
>> Unfortunately, another team, while doing internal testing has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>>
>> We'll continue to look into this and do additional testing.
>
>
> Interesting. Let me know if you get a test case. The insertps code path was
> added recently though and has been much less well tested. I'll start fuzz
> testing it and should hopefully uncover the bug.
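Incidentally, the reason those masks are illegal is that insertps only takes an 8-bit immediate, so values such as $256 (0x100) and $416 (0x1A0) cannot be encoded at all. Here is a minimal illustrative sketch of the decoding (not LLVM code; the function name is mine), assuming the bit layout from the Intel SDM: bits 7:6 select the source element (COUNT_S), bits 5:4 the destination slot (COUNT_D), and bits 3:0 the zero mask (ZMASK):

#include <cassert>
#include <cstdio>

static void decodeInsertPS(unsigned Imm) {
  // The instruction only encodes an imm8, so anything above 0xFF is illegal.
  assert(Imm <= 0xFF && "INSERTPS immediate must fit in 8 bits");
  unsigned CountS = (Imm >> 6) & 0x3; // element taken from the source
  unsigned CountD = (Imm >> 4) & 0x3; // element replaced in the destination
  unsigned ZMask  = Imm & 0xF;        // destination elements zeroed afterwards
  std::printf("src[%u] -> dst[%u], zmask=0x%x\n", CountS, CountD, ZMask);
}

int main() {
  decodeInsertPS(48);  // $48 from the old lowering: dst[3] = src[0]
  decodeInsertPS(416); // $416 from the new lowering: trips the assert
  return 0;
}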
Here are two small test cases. I hope they are of use.
Thanks,
Rob.
------
define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
%1 = extractelement <4 x float> %xyzw, i32 0
%2 = insertelement <4 x float> undef, float %1, i32 0
%3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
%4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
%5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
ret <4 x float> %5
}
define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
%1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
%2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
ret <4 x float> %2
}
llc -march=x86-64 -mattr=+avx test.ll -o -
test: # @test
vxorps %xmm2, %xmm2, %xmm2
vmovss %xmm0, %xmm2, %xmm2
vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
retl
test2: # @test2
vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
vxorps %xmm1, %xmm1, %xmm1
vblendps $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
retl
llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -
test: # @test
vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
retl
test2: # @test2
vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
vxorps %xmm1, %xmm1, %xmm1
vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
retl
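One thing I noticed: the immediates emitted by the experimental path ($270, $304 and $336 here, plus $416 in the earlier sample) all have bit 8 set. Masking each down to its low byte gives $14, $48, $80 and $160, and decoding those with the layout above reproduces exactly the shuffle comments llc printed, so my guess (only a guess) is that the lowering is OR-ing a stray bit into an otherwise correct immediate.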