[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Fri Sep 5 16:36:26 PDT 2014

Hi Chandler,

While measuring performance on an Ivy Bridge machine, I ran into compile-time errors.

I saw a bunch of "cannot select" errors in the LLVM test suite with -march=core-avx-i.
For example, SingleSource/UnitTests/Vector/SSE/sse.isamax.c fails at -O3 -march=core-avx-i with:
fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210, 0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]
    0x7f91b99a7210: v4i64 = undef [ID=15]
    0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2] [ID=23]
      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738 [ORD=2] [ID=20]
        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820, 0x7f91b99a3a10 [ORD=2] [ID=16]
          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]
    0x7f91b99ace70: i64 = Constant<0> [ID=3]
In function: isamax0
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (215249)
Target: x86_64-apple-darwin14.0.0

For some reason, I cannot reproduce the problem with the test case that clang gives me using -emit-llvm. Since the source is public, I guess you can try to reproduce on your side.
Indeed, if you run the test-suite with -march=core-avx-i you’ll likely see all those failures.

Let me know if you cannot and I’ll try harder to produce a test case.

Note: It is the same failure everywhere, i.e., the backend cannot select a bitcast from various types to v4i32 or v4i64.

Thanks,
-Quentin

> On Sep 5, 2014, at 11:09 AM, Robert Lougher <rob.lougher@gmail.com> wrote:
> 
> Hi Chandler,
> 
> On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
>> 
>> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
>>> 
>>> Unfortunately, another team, while doing internal testing, has seen the
>>> new path generating illegal insertps masks.  Here is a sample:
>>> 
>>>    vinsertps    $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>    vinsertps    $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>    vinsertps    $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>    vinsertps    $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>>>    vinsertps    $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>>>    vinsertps    $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>>> 
>>> We'll continue to look into this and do additional testing.
>> 
>> 
>> Interesting. Let me know if you get a test case. The insertps code path was
>> added recently though and has been much less well tested. I'll start fuzz
>> testing it and should hopefully uncover the bug.
> 
> Here are two small test cases.  I hope they are of use.
> 
> Thanks,
> Rob.
> 
> ------
> define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
>  %1 = extractelement <4 x float> %xyzw, i32 0
>  %2 = insertelement <4 x float> undef, float %1, i32 0
>  %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
>  %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
>  %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
>  ret <4 x float> %5
> }
> 
> define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
>  %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
>  %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
>  ret <4 x float> %2
> }
> 
> 
> llc -march=x86-64 -mattr=+avx test.ll -o -
> 
> test:                                   # @test
>    vxorps    %xmm2, %xmm2, %xmm2
>    vmovss    %xmm0, %xmm2, %xmm2
>    vblendps    $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>    vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    retl
> 
> test2:                                  # @test2
>    vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    vxorps    %xmm1, %xmm1, %xmm1
>    vblendps    $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
>    retl
> 
> llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -
> 
> test:                                   # @test
>    vinsertps    $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
>    vinsertps    $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>    vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    retl
> 
> test2:                                  # @test2
>    vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    vxorps    %xmm1, %xmm1, %xmm1
>    vinsertps    $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
>    retl
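
To make it easier to see why immediates such as $256, $304 or $416 are illegal, here is a small Python sketch that decodes an insertps immediate. It assumes the standard 8-bit layout of the instruction (bits 7:6 select the source element, bits 5:4 the destination element, bits 3:0 are the zero mask); the helper name decode_insertps_imm is made up for this illustration. It only does arithmetic on the numbers quoted in this thread and is not a diagnosis of the bug.

```python
def decode_insertps_imm(imm):
    """Split an insertps immediate into its fields.

    Returns (fits_in_imm8, src_elem, dst_elem, zero_mask); the three
    fields are read from the low 8 bits only.
    """
    fits = 0 <= imm <= 0xFF
    low = imm & 0xFF
    src = (low >> 6) & 0x3   # bits 7:6: element read from the source register
    dst = (low >> 4) & 0x3   # bits 5:4: destination element that is replaced
    zmask = low & 0xF        # bits 3:0: destination elements forced to zero
    return fits, src, dst, zmask

# Immediates taken from the listings in this thread.
for imm in (48, 256, 270, 304, 336, 416):
    fits, src, dst, zmask = decode_insertps_imm(imm)
    print(f"${imm}: fits_imm8={fits} low8=0x{imm & 0xFF:02x} "
          f"src={src} dst={dst} zmask={zmask:04b}")
```

Every immediate from the experimental path is above 255, so it does not fit in the 8-bit field. In each case the low 8 bits decode to exactly the shuffle shown in the printed comment (for instance, $304 has the same low byte as the legal $48 emitted by the old path), so the values look like a legal immediate with bit 8 additionally set.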