[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Robert Lougher
rob.lougher at gmail.com
Fri Sep 5 11:09:12 PDT 2014
Hi Chandler,
On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
> wrote:
>>
>> Unfortunately, another team, while doing internal testing has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>>
>> We'll continue to look into this and do additional testing.
>
>
> Interesting. Let me know if you get a test case. The insertps code path was
> added recently though and has been much less well tested. I'll start fuzz
> testing it and should hopefully uncover the bug.
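Incidentally, the reason those masks are illegal is that insertps only takes an 8-bit immediate, so values such as $256 (0x100) and $416 (0x1A0) cannot be encoded at all. Here is a minimal illustrative sketch of the decoding (not LLVM code; the function name is mine), assuming the bit layout from the Intel SDM: bits 7:6 select the source element (COUNT_S), bits 5:4 the destination slot (COUNT_D), and bits 3:0 the zero mask (ZMASK):

#include <cassert>
#include <cstdio>

static void decodeInsertPS(unsigned Imm) {
  // The instruction only encodes an imm8, so anything above 0xFF is illegal.
  assert(Imm <= 0xFF && "INSERTPS immediate must fit in 8 bits");
  unsigned CountS = (Imm >> 6) & 0x3; // element taken from the source
  unsigned CountD = (Imm >> 4) & 0x3; // element replaced in the destination
  unsigned ZMask  = Imm & 0xF;        // destination elements zeroed afterwards
  std::printf("src[%u] -> dst[%u], zmask=0x%x\n", CountS, CountD, ZMask);
}

int main() {
  decodeInsertPS(48);  // $48 from the old lowering: dst[3] = src[0]
  decodeInsertPS(416); // $416 from the new lowering: trips the assert
  return 0;
}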
Here are two small test cases. I hope they are of use.
Thanks,
Rob.
------
define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
%1 = extractelement <4 x float> %xyzw, i32 0
%2 = insertelement <4 x float> undef, float %1, i32 0
%3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
%4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
%5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
ret <4 x float> %5
}
define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
%1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
%2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
ret <4 x float> %2
}
llc -march=x86-64 -mattr=+avx test.ll -o -
test: # @test
vxorps %xmm2, %xmm2, %xmm2
vmovss %xmm0, %xmm2, %xmm2
vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
retl
test2: # @test2
vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
vxorps %xmm1, %xmm1, %xmm1
vblendps $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
retl
llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -
test: # @test
vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
retl
test2: # @test2
vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
vxorps %xmm1, %xmm1, %xmm1
vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
retl
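One thing I noticed: the immediates emitted by the experimental path ($270, $304 and $336 here, plus $416 in the earlier sample) all have bit 8 set. Masking each down to its low byte gives $14, $48, $80 and $160, and decoding those with the layout above reproduces exactly the shuffle comments llc printed, so my guess (only a guess) is that the lowering is OR-ing a stray bit into an otherwise correct immediate.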