[cfe-dev] Clang 4.0.1 C++ code generation issue (bug?)

Serge Preis via cfe-dev cfe-dev at lists.llvm.org
Tue Jul 11 22:26:50 PDT 2017


Hello,

My research shows that the issue seems to be fixed by this commit: https://reviews.llvm.org/rL303851
which, unfortunately, is not part of 4.0.1.

Regards,
Serge.

11.07.2017, 15:40, "Dimitry Andric via cfe-dev" <cfe-dev at lists.llvm.org>:
> Hi Chris,
>
> See https://bugs.llvm.org/show_bug.cgi?id=31928 for a similar story.
>
> Short answer: do not write to one member of a union and then read from another. In C++ that is undefined behavior, and code doing it broke going from clang 3.9.0 to 4.0.0 (or maybe it was always broken, and older versions simply did not optimize aggressively enough for it to show).
>
> If your code really must write one union member and read another, and you do not want to refactor these accesses using memcpy, you will have to use -fno-strict-aliasing.
>
> -Dimitry
>
>>  On 11 Jul 2017, at 10:10, via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>>  Greetings Clangers,
>>
>>  SUMMARY:
>>  I’m experiencing a C++ code generation difference between clang 3.9.0 & 4.0.1 that is resulting in unexpected behaviour in my code. The difference also exhibits between -O1 & -O2/3 with clang 4.0.1.
>>  (I’ve seen this issue for i386 & x86_64 on OSX, but it probably affects all OSes.)
>>
>>  DISCLAIMER:
>>  I’m not entirely sure if this is a bug in clang or some kind of undefined behaviour.
>>  (I can’t fully get my head around the class.union section of the C++ standard to determine its secret meaning.)
>>  I don’t know if the problem is located in clang’s code or LLVM’s code. (But I’m reporting it here.)
>>  Using -std=c++14 or -std=c++1z makes no difference.
>>
>>  HISTORY:
>>  I recently upgraded to clang 4.0.1 (using http://releases.llvm.org/4.0.1/clang+llvm-4.0.1-x86_64-apple-darwin.tar.xz). Most things seem to be fine (~2MLOC & ~150 binaries) apart from one problem that only exhibits itself in a release build. (The problem exhibited as corrupted documents & crashes … during testing.) The original code has been in use for > 15 years & was previously compiled, successfully, using clang 3.9.0 & 3.6.1, gcc 4.2.1 & CodeWarrior for PPC, PPC-64, i386 & x86-64 (in various combinations).
>>
>>  CODE:
>>  I have spent several days distilling, reducing & refining the code that exhibits the problem to the following:
>>
>>  // TestClang.cpp - Test clang 4.0.1 code generation issue
>>
>>  #include <cstdio>
>>  #include <cstring>
>>
>>  struct B {
>>     char t;
>>     union { char c; int x; void* p; };
>>  #ifndef FIX
>>     B& operator= (const B& rhs) { t = rhs.t; p = rhs.p; return *this; }
>>  #endif
>>  };
>>
>>  union U { union { int f; char s[8]; } n; B b; };
>>
>>  struct D {
>>     const int size;
>>     B e[8];
>>
>>     __attribute__((noinline)) D (int count, const U objs[]) : size(count)
>>     {
>>         U tmp{ .b.t = 1, .b.x = 0x123400 }; // b.x set to help see problem
>>         #pragma clang loop unroll(disable) // Shortens generated code
>>         for (auto* it = e; count--; ++it)
>>         {
>>             const U* val = objs++;
>>             if (val->n.s[0] > 32)
>>             {
>>                 tmp.b.x = val->n.f; // <<<<< PROBLEM IS AROUND HERE <<<<<
>>                 val = &tmp;
>>     // if (!size) std::puts(tmp.n.s); // Also fixes code
>>             }
>>             *it = val->b;
>>         }
>>     }
>>  };
>>
>>  int main (int argc, const char* argv[])
>>  {
>>     const char* args[] = { "one!", "two!" };
>>     int count = argc ? 2 : argc - 1; // Prevent over optimisation
>>     auto s = argc ? args : argv + 1;
>>
>>     U us[8];
>>     for (int i = 0; i < count; ++i)
>>         std::strncpy(us[i].n.s, *s++, 8);
>>
>>     const D dict(count, us);
>>     for (int i = 0; i < dict.size; ++i)
>>     {
>>         auto& n = dict.e[i];
>>         std::printf(" %u. '%.4s' [%u:$%08X]\n", i, &n.c, n.t, n.x);
>>     }
>>  }
>>
>>  // end
>>
>>  TESTS:
>>  When the following line is executed:
>>     clang -arch i386 -O2 -Wall -std=c++14 -stdlib=libc++ TestClang.cpp && ./a.out
>>  the output is:
>>   0. '' [1:$00123400]
>>   1. '' [1:$00123400]
>>  The output should be:
>>   0. 'one!' [1:$21656E6F]
>>   1. 'two!' [1:$216F7774]
>>
>>  Adding the option `-DFIX` generates the expected output.
>>  Removing the `-O2` option with clang 4.0.1 generates the expected output without `-DFIX`.
>>  Uncommenting line 31 (`// if (!size) std::puts(tmp.n.s);`) also generates the expected output with clang 4.0.1.
>>  (Line 31 never actually calls puts, since size is 2 in this test; it only changes what the optimiser does.)
>>
>>  Using `-arch x86_64` and/or `-std=c++1z` with clang 4.0.1 makes no difference.
>>  Using `-Weverything` provides no useful output, only the C++98-compatibility warnings one expects for C++14 code.
>>  Using clang 3.9.0 instead of 4.0.1 generates the expected output with all the above variations.
>>
>>  The problem appears to be related to the B::operator= code and to (possibly loop) optimisation.
>>
>>  ASSEMBLER:
>>  The relevant part of the code (the `for` loop in D's constructor, lines 24-34 of TestClang.cpp) generates the following with clang 4.0.1.
>>
>>  The GOOD code (-DFIX):
>>  LBB2_2: ## =>This Inner Loop Header: Depth=1
>>          decl %eax
>>          cmpb $33, (%edx)
>>          movl %edx, %edi
>>          jl LBB2_4
>>  ## BB#3: ## in Loop: Header=BB2_2 Depth=1
>>          movl (%edx), %edi
>>          movl %edi, -12(%ebp)
>>          movl %esi, %edi
>>  LBB2_4: ## in Loop: Header=BB2_2 Depth=1
>>          addl $8, %edx
>>          movsd (%edi), %xmm0 ## xmm0 = mem[0],zero
>>          movsd %xmm0, (%ecx)
>>          addl $8, %ecx
>>          testl %eax, %eax
>>          jne LBB2_2
>>
>>  The BAD code (no -DFIX):
>>  LBB2_2: ## =>This Inner Loop Header: Depth=1
>>          decl %eax
>>          movzbl (%edx), %ebx
>>          cmpb $33, %bl
>>          movl %edx, %edi
>>          jl LBB2_4
>>  ## BB#3: ## in Loop: Header=BB2_2 Depth=1
>>          movl (%edx), %esi
>>          movb $1, %bl
>>          leal -24(%ebp), %edi
>>  LBB2_4: ## in Loop: Header=BB2_2 Depth=1
>>          addl $8, %edx
>>          movb %bl, (%ecx)
>>          movl 4(%edi), %edi
>>          movl %edi, 4(%ecx)
>>          addl $8, %ecx
>>          testl %eax, %eax
>>          jne LBB2_2
>>  ## BB#5:
>>          movl %esi, -20(%ebp)
>>
>>  This last line looks problematic (if I’m reading it correctly).
>>  It stores %esi to -20(%ebp) (tmp.b.x) only once, after the loop, although %esi is reloaded on every pass through BB#3.
>>  It appears to come from `tmp.b.x = val->n.f;`, which has migrated out of the loop, past the point where tmp is read (`movl 4(%edi), %edi`).
>>
>>  QUESTIONS:
>>  So my questions are:
>>   Is this a bug in clang/LLVM?
>>   Does anyone else see this?
>>   Do clang 4.0.0 and/or clang 3.9.1 exhibit this problem?
>>   Is this caused by a rare code combination or is it going to silently break lots of code?
>>   Is this serious enough to warrant/require a clang 4.0.2?
>>   Do you need any further info to help fix this?
>>   Is someone willing to, please, fix it?
>>
>>  Thanks,
>>
>>  CHRIS
>>
>>  _______________________________________________
>>  cfe-dev mailing list
>>  cfe-dev at lists.llvm.org
>>  http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev