[llvm] r208342 - [X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics.

Sat May 10 09:49:08 PDT 2014

No problem.
I am trying to see if I can reproduce the slowdown on my machine.
In any case, I'll update this thread with my findings.

Andrea

On Sat, May 10, 2014 at 5:04 PM, Tobias Grosser <tobias at grosser.es> wrote:
> On 10/05/2014 17:57, Andrea Di Biagio wrote:
>>
>> mm.. That's really odd. Revision 208342 only affects how packed
>> SSE2/AVX2 arithmetic shift intrinsics are combined in the x86 backend.
>>
>> +  case Intrinsic::x86_sse2_psrai_w:
>> +  case Intrinsic::x86_sse2_psrai_d:
>> +  case Intrinsic::x86_avx2_psrai_w:
>> +  case Intrinsic::x86_avx2_psrai_d:
>> +  case Intrinsic::x86_sse2_psra_w:
>> +  case Intrinsic::x86_sse2_psra_d:
>> +  case Intrinsic::x86_avx2_psra_w:
>> +  case Intrinsic::x86_avx2_psra_d:
>>
>> I have now done a fresh checkout of the LLVM test-suite
>> % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
>>
>> I did a recursive search of '_mm_sr' (grep -r _mm_sr) from the
>> test-suite root directory and this was the only match found:
>> SingleSource/UnitTests/Vector/SSE/sse.shift.c:  zeroones =
>> _mm_srli_epi16(allones, 8);
>>
>> That intrinsic is for a logical packed shift (definitely not one of
>> the intrinsics optimized by my patch).
>> I couldn't find any occurrence of avx2 intrinsics in the entire test
>> suite.
>>
>> Honestly, I have no idea why this change could have caused any
>> regression in sphereflake.
>
>
> OK. Thanks for checking. The other changes look even more unrelated,
> so I will just bisect the regression.
>
> Cheers,
> Tobias
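
For readers following the distinction drawn in the quoted text between the arithmetic shifts (psra/psrai) touched by r208342 and the logical shift (_mm_srli_epi16) found in sse.shift.c, here is a minimal scalar sketch of what one 16-bit lane does in each case. It is only a model under stated assumptions: the function names model_psra_lane16, model_psrl_lane16 and check_shift_models are hypothetical, and nothing here is taken from the patch or from the real intrinsics headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of a packed ARITHMETIC
   right shift (psraw-style): the sign bit is replicated into the vacated
   bits, and counts above 15 behave like a shift by 15. */
static int16_t model_psra_lane16(int16_t x, unsigned count) {
    if (count > 15) count = 15;
    uint16_t shifted = (uint16_t)((uint16_t)x >> count);
    if (x < 0) {
        /* Set the top `count` bits to mimic sign extension. */
        shifted |= (uint16_t)~(0xFFFFu >> count);
    }
    return (int16_t)shifted;
}

/* Hypothetical scalar model of one 16-bit lane of a packed LOGICAL
   right shift (psrlw-style): zeros are shifted in, and counts above 15
   produce zero. */
static uint16_t model_psrl_lane16(uint16_t x, unsigned count) {
    if (count > 15) return 0;
    return (uint16_t)(x >> count);
}

/* Small self-check; the 0xFFFF >> 8 case mirrors the
   `_mm_srli_epi16(allones, 8)` call found in sse.shift.c. */
static void check_shift_models(void) {
    assert(model_psrl_lane16(0xFFFFu, 8) == 0x00FFu); /* logical: zero fill */
    assert(model_psra_lane16(-1, 8) == -1);           /* arithmetic: sign fill */
    assert(model_psra_lane16(-256, 4) == -16);
    assert(model_psra_lane16(256, 4) == 16);
}
```

The two models agree for non-negative inputs and differ only when the sign bit is set, which is why a combine rule that targets only the arithmetic variants would not be expected to touch code using _mm_srli_epi16.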