[PATCH] D101555: [SLP]Improve handling of compensate external uses cost.

Tue May 25 19:16:13 PDT 2021

rupprecht added a comment.

In D101555#2780152 <https://reviews.llvm.org/D101555#2780152>, @ABataev wrote:

> In D101555#2780145 <https://reviews.llvm.org/D101555#2780145>, @rupprecht wrote:
>
>> We're seeing some test failures that bisected to this patch, possibly a miscompile. The test failure is in the unit test for this file: https://github.com/google/tink/blob/master/cc/subtle/aes_eax_aesni.cc. Are there already any known issues with this patch?
>
> No, there are not. It would help if you could provide the reproducer and exact compile command to check if the problem exists.

I was unsuccessful in getting it to repro directly from the open source repo. However I reduced this which shows the issue:

  $ cat repro.cc
  #include <xmmintrin.h>

  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  // https://github.com/google/tink/blob/a72c9d542cd1dd8b58b2620ab52585cf5544f212/cc/subtle/aes_eax_aesni.cc#L79
  inline __m128i Add(__m128i x, uint64_t y) {
    // Convert to a vector of two uint64_t.
    uint64_t vec[2];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(vec), x);
    // Perform the addition on the vector.
    vec[0] += y;
    if (y > vec[0]) {
      vec[1]++;
    }
    // Convert back to xmm.
    return _mm_loadu_si128(reinterpret_cast<__m128i *>(vec));
  }

  void print128(__m128i var) {
    uint64_t parts[2];
    memcpy(parts, &var, sizeof(parts));
    printf("%lu %lu\n", parts[0], parts[1]);
  }

  template <class T>
  void DoNotOptimize(const T &var) {
    asm volatile("" : "+m"(const_cast<T &>(var)));
  }

  int main() {
    __m128i x = _mm_setzero_si128();
    DoNotOptimize(x);
    __m128i y = Add(x, 1);
    print128(x);
    print128(y);
  }
  $ clang++ repro.cc -o /tmp/miscompile -O2 -fno-slp-vectorize && /tmp/miscompile
  0 0
  1 0
  $ clang++ repro.cc -o /tmp/miscompile -O2 && /tmp/miscompile
  0 0
  1 1

Prior to this patch, there is no difference when enabling or disabling `-fslp-vectorize`. The issue seems to be how this optimizes `Add`:

  vec[0] += y;
  if (y > vec[0]) {  // This effectively evaluates to true
    vec[1]++;
  }

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101555/new/

https://reviews.llvm.org/D101555