[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Tue May 18 10:28:10 PDT 2021

dyung added a comment.

In D98714#2766166 <https://reviews.llvm.org/D98714#2766166>, @ABataev wrote:

> In D98714#2764805 <https://reviews.llvm.org/D98714#2764805>, @dyung wrote:
>
>> Hi, this change has caused a regression in the codegen for one of our internal tests. Consider the following code:
>>
>>   __attribute__((noinline))
>>   __m256d add_pd_002(__m256d a, __m256d b) {
>>     __m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
>>     return __builtin_shufflevector(r, a, 0, -1, 2, 3);
>>   }
>>
>> If you compile this with "-g0 -O3 -march=btver2", prior to your commit the compiler would generate the following code for the function:
>>
>>   # %bb.0:                                # %entry
>>           vinsertf128     $1, %xmm1, %ymm0, %ymm0
>>           vhaddpd %ymm1, %ymm0, %ymm0
>>
>> But after your change it is now generating the following code:
>>
>>   # %bb.0:                                # %entry
>>           vextractf128    $1, %ymm1, %xmm2
>>           vhaddpd %xmm0, %xmm0, %xmm0
>>           vhaddpd %xmm2, %xmm1, %xmm1
>>           vperm2f128      $2, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0,1],ymm1[0,1]
>>
>> From your commit description, it sounds like this is expected and will be fixed in a follow-up commit. Is my understanding of this correct?
>
> I see a different result for btver2.
>
>   # %bb.0:
>           vextractf128    $1, %ymm1, %xmm2
>           vhaddpd %xmm1, %xmm0, %xmm0
>           vhaddpd %xmm2, %xmm1, %xmm1
>           vinsertf128     $1, %xmm1, %ymm0, %ymm0
>
> But I used llvm-11, most probably there is a difference with llvm-12.
>
> Currently, it is impossible to fix this issue. This problem will be fixed after non-power-2 vectorization support in SLP is landed since here we have a build vector of 3 elements (the second index in shuffle is -1 and thus the second sum is optimized out resulting in the build of <4 x double> vector using just 3 insertelement instructions). Looks like previously this patter was recognized by another transformation pass but currently SLP tries to vectorize it

The new result I posted in your quote was the result of a compiler built from this change. It is unfortunate to hear that we will have to take a regression for this, but I will update our internal test to expect it and file a bug so that it is not forgotten.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714