[PATCH] D101555: [SLP]Improve handling of compensate external uses cost.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 26 06:22:12 PDT 2021
ABataev added a comment.
In D101555#2781034 <https://reviews.llvm.org/D101555#2781034>, @rupprecht wrote:
> In D101555#2780152 <https://reviews.llvm.org/D101555#2780152>, @ABataev wrote:
>
>> In D101555#2780145 <https://reviews.llvm.org/D101555#2780145>, @rupprecht wrote:
>>
>>> We're seeing some test failures that bisected to this patch, possibly a miscompile. The test failure is in the unit test for this file: https://github.com/google/tink/blob/master/cc/subtle/aes_eax_aesni.cc. Are there already any known issues with this patch?
>>
>> No, there are not. It would help if you could provide the reproducer and exact compile command to check if the problem exists.
>
> I was unsuccessful in getting it to repro directly from the open source repo. However I reduced this which shows the issue:
>
> $ cat repro.cc
> #include <xmmintrin.h>
>
> #include <cstdint>
> #include <cstdio>
> #include <cstring>
>
> // https://github.com/google/tink/blob/a72c9d542cd1dd8b58b2620ab52585cf5544f212/cc/subtle/aes_eax_aesni.cc#L79
> inline __m128i Add(__m128i x, uint64_t y) {
> // Convert to a vector of two uint64_t.
> uint64_t vec[2];
> _mm_storeu_si128(reinterpret_cast<__m128i *>(vec), x);
> // Perform the addition on the vector.
> vec[0] += y;
> if (y > vec[0]) {
> vec[1]++;
> }
> // Convert back to xmm.
> return _mm_loadu_si128(reinterpret_cast<__m128i *>(vec));
> }
>
> void print128(__m128i var) {
> uint64_t parts[2];
> memcpy(parts, &var, sizeof(parts));
> printf("%lu %lu\n", parts[0], parts[1]);
> }
>
> template <class T>
> void DoNotOptimize(const T &var) {
> asm volatile("" : "+m"(const_cast<T &>(var)));
> }
>
> int main() {
> __m128i x = _mm_setzero_si128();
> DoNotOptimize(x);
> __m128i y = Add(x, 1);
> print128(x);
> print128(y);
> }
> $ clang++ repro.cc -o /tmp/miscompile -O2 -fno-slp-vectorize && /tmp/miscompile
> 0 0
> 1 0
> $ clang++ repro.cc -o /tmp/miscompile -O2 && /tmp/miscompile
> 0 0
> 1 1
>
> Prior to this patch, there is no difference when enabling or disabling `-fslp-vectorize`. The issue seems to be how this optimizes `Add`:
>
> vec[0] += y;
> if (y > vec[0]) { // This effectively evaluates to true
> vec[1]++;
> }
Here is a fix D103164 <https://reviews.llvm.org/D103164>
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101555/new/
https://reviews.llvm.org/D101555
More information about the llvm-commits
mailing list