[PATCH] D101555: [SLP]Improve handling of compensate external uses cost.

Wed May 26 06:22:12 PDT 2021

ABataev added a comment.

In D101555#2781034 <https://reviews.llvm.org/D101555#2781034>, @rupprecht wrote:

> In D101555#2780152 <https://reviews.llvm.org/D101555#2780152>, @ABataev wrote:
>
>> In D101555#2780145 <https://reviews.llvm.org/D101555#2780145>, @rupprecht wrote:
>>
>>> We're seeing some test failures that bisected to this patch, possibly a miscompile. The test failure is in the unit test for this file: https://github.com/google/tink/blob/master/cc/subtle/aes_eax_aesni.cc. Are there already any known issues with this patch?
>>
>> No, there are not. It would help if you could provide the reproducer and exact compile command to check if the problem exists.
>
> I was unsuccessful in getting it to repro directly from the open source repo. However I reduced this which shows the issue:
>
>   $ cat repro.cc
>   #include <xmmintrin.h>
>   
>   #include <cstdint>
>   #include <cstdio>
>   #include <cstring>
>   
>   // https://github.com/google/tink/blob/a72c9d542cd1dd8b58b2620ab52585cf5544f212/cc/subtle/aes_eax_aesni.cc#L79
>   inline __m128i Add(__m128i x, uint64_t y) {
>     // Convert to a vector of two uint64_t.
>     uint64_t vec[2];
>     _mm_storeu_si128(reinterpret_cast<__m128i *>(vec), x);
>     // Perform the addition on the vector.
>     vec[0] += y;
>     if (y > vec[0]) {
>       vec[1]++;
>     }
>     // Convert back to xmm.
>     return _mm_loadu_si128(reinterpret_cast<__m128i *>(vec));
>   }
>   
>   void print128(__m128i var) {
>     uint64_t parts[2];
>     memcpy(parts, &var, sizeof(parts));
>     printf("%lu %lu\n", parts[0], parts[1]);
>   }
>   
>   template <class T>
>   void DoNotOptimize(const T &var) {
>     asm volatile("" : "+m"(const_cast<T &>(var)));
>   }
>   
>   int main() {
>     __m128i x = _mm_setzero_si128();
>     DoNotOptimize(x);
>     __m128i y = Add(x, 1);
>     print128(x);
>     print128(y);
>   }
>   $ clang++ repro.cc -o /tmp/miscompile -O2 -fno-slp-vectorize && /tmp/miscompile
>   0 0
>   1 0
>   $ clang++ repro.cc -o /tmp/miscompile -O2 && /tmp/miscompile
>   0 0
>   1 1
>
> Prior to this patch, there is no difference when enabling or disabling `-fslp-vectorize`. The issue seems to be how this optimizes `Add`:
>
>   vec[0] += y;
>   if (y > vec[0]) {  // This effectively evaluates to true
>     vec[1]++;
>   }

Here is a fix D103164 <https://reviews.llvm.org/D103164>

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101555/new/

https://reviews.llvm.org/D101555