[cfe-dev] Clang 4.0.1 C++ code generation issue (bug?)

Tue Jul 11 01:10:18 PDT 2017

Greetings Clangers,

SUMMARY:
I’m experiencing a C++ code generation difference between clang 3.9.0 & 4.0.1 that results in unexpected behaviour in my code.  The difference also shows up between -O1 & -O2/-O3 with clang 4.0.1.
(I’ve seen this issue for i386 & x86_64 on OSX, but it probably affects all OSes.)

DISCLAIMER:
I’m not entirely sure if this is a bug in clang or some kind of undefined behaviour.
(I can’t fully get my head around the class.union section of the C++ standard to determine its secret meaning.)
I don’t know whether the problem lies in clang’s code or LLVM’s code (but I’m reporting it here).
Using -std=c++14 or -std=c++1z makes no difference.
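On the class.union question: the test case writes `n.s` (via strncpy) and later reads `n.f`, which is a member other than the one last written. A common way to do that kind of byte reinterpretation that is well-defined however class.union is read is to copy the bytes with std::memcpy. This is a minimal sketch only; `load_int` is a hypothetical helper that is not part of the original code, and I have not checked whether using it changes the generated code.

```cpp
#include <cstring>

// Hypothetical helper (not in the original test case): reads the first
// sizeof(int) bytes at `bytes` as an int. Copying with std::memcpy avoids
// reading a union member other than the one last written; compilers
// normally turn this into a single load.
inline int load_int(const char* bytes)
{
    int value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// In D's constructor the punned read could then be written as, e.g.:
//     tmp.b.x = load_int(val->n.s);
```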

HISTORY:
I recently upgraded to clang 4.0.1 (using http://releases.llvm.org/4.0.1/clang+llvm-4.0.1-x86_64-apple-darwin.tar.xz).  Most things seem to be fine (~2MLOC & ~150 binaries) apart from one problem that only shows up in release builds.  (During testing it manifested as corrupted documents & crashes …)  The original code has been in use for > 15 years & was previously compiled, successfully, using clang 3.9.0 & 3.6.1, gcc 4.2.1 & CodeWarrior for PPC, PPC-64, i386 & x86-64 (in various combinations).

CODE:
I have spent several days distilling, reducing & refining the code that exhibits the problem to the following:

// TestClang.cpp - Test clang 4.0.1 code generation issue

#include <cstdio>
#include <cstring>

struct B {
    char t;
    union { char c; int x; void* p; };
#ifndef FIX
    B& operator= (const B& rhs) { t = rhs.t; p = rhs.p; return *this; }
#endif
};

union U { union { int f; char s[8]; } n; B b; };

struct D {
    const int size;
    B e[8];

    __attribute__((noinline)) D (int count, const U objs[]) : size(count)
    {
        U tmp{ .b.t = 1, .b.x = 0x123400 }; // b.x set to help see problem
        #pragma clang loop unroll(disable)  // Shortens generated code
        for (auto* it = e; count--; ++it)
        {
            const U* val = objs++;
            if (val->n.s[0] > 32)
            {
                tmp.b.x = val->n.f;         // <<<<< PROBLEM IS AROUND HERE <<<<<
                val = &tmp;
    //          if (!size) std::puts(tmp.n.s); // Also fixes code
            }
            *it = val->b;
        }
    }
};

int main (int argc, const char* argv[])
{
    const char* args[] = { "one!", "two!" };
    int count = argc ? 2    : argc - 1;     // Prevent over optimisation
    auto    s = argc ? args : argv + 1;

    U us[8];
    for (int i = 0; i < count; ++i)
        std::strncpy(us[i].n.s, *s++, 8);

    const D dict(count, us);
    for (int i = 0; i < dict.size; ++i)
    {
        auto& n = dict.e[i];
        std::printf("  %u. '%.4s' [%u:$%08X]\n", i, &n.c, n.t, n.x);
    }
}

// end

TESTS:
When the following line is executed:
    clang -arch i386 -O2 -Wall -std=c++14 -stdlib=libc++ TestClang.cpp && ./a.out
the output is:
  0. '' [1:$00123400]
  1. '' [1:$00123400]
The output should be:
  0. 'one!' [1:$21656E6F]
  1. 'two!' [1:$216F7774]
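The expected hex values are simply the four characters of each string read as a 32-bit integer on a little-endian machine (i386 and x86_64 are both little-endian). A small sketch that derives them with shifts, so the check does not depend on the host's byte order; `little_endian_u32` is a hypothetical name used only for illustration:

```cpp
#include <cstdint>

// Assemble the first four bytes of `s` into a 32-bit value the way a
// little-endian load (as on i386/x86_64) sees them: s[0] is the least
// significant byte.
constexpr std::uint32_t little_endian_u32(const char* s)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(s[0]))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[3])) << 24);
}

static_assert(little_endian_u32("one!") == 0x21656E6Fu, "'o' 'n' 'e' '!'");
static_assert(little_endian_u32("two!") == 0x216F7774u, "'t' 'w' 'o' '!'");
```

By the same reasoning, the bad value $00123400 has a zero low byte, which is why '%.4s' prints an empty string in the bad output.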

Adding the option `-DFIX` generates the expected output.
Removing the `-O2` option with clang 4.0.1 generates the expected output without `-DFIX`.
Uncommenting the `if (!size) std::puts(tmp.n.s);` line (line 31 of TestClang.cpp) also generates the expected output with clang 4.0.1.
(That line does nothing, other than tricking the optimiser.)

Using `-arch x86_64` and/or `-std=c++1z` with clang 4.0.1 makes no difference.
Using `-Weverything` provides no useful output!  (I know C++14 != C++98.)
Using clang 3.9.0 instead of 4.0.1 generates the expected output with all the above variations.

The problem appears to be related to the user-defined B::operator= and to (possibly loop) optimisation.
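For comparison, the two variants of B differ in how the anonymous union is copied: the user-defined operator= copies it through `p`, even when `x` was the member last written (which is the class.union question above), whereas the implicit or defaulted operator used with -DFIX copies the union's object representation. A sketch of the two, using hypothetical renamed copies `B_user` and `B_defaulted` of the original B:

```cpp
#include <type_traits>

// Hypothetical copy of B as compiled without -DFIX.
struct B_user {
    char t;
    union { char c; int x; void* p; };
    B_user& operator= (const B_user& rhs) { t = rhs.t; p = rhs.p; return *this; }
};

// Hypothetical copy of B with the assignment explicitly defaulted,
// equivalent to the implicit one that -DFIX leaves in place.
struct B_defaulted {
    char t;
    union { char c; int x; void* p; };
    B_defaulted& operator= (const B_defaulted&) = default;
};

// The defaulted operator copies all bytes of the anonymous union, so it is
// trivial; the user-provided one reads `p` whichever member is active.
static_assert(std::is_trivially_copy_assignable<B_defaulted>::value, "");
static_assert(!std::is_trivially_copy_assignable<B_user>::value, "");
```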

ASSEMBLER:
The relevant parts of the code (the loop in D’s constructor, lines 24-34 of TestClang.cpp) generate the following with clang 4.0.1 for -arch i386.

The GOOD code (-DFIX):
LBB2_2:                                 ## =>This Inner Loop Header: Depth=1
	decl	%eax
	cmpb	$33, (%edx)
	movl	%edx, %edi
	jl	LBB2_4
## BB#3:                                ##   in Loop: Header=BB2_2 Depth=1
	movl	(%edx), %edi
	movl	%edi, -12(%ebp)
	movl	%esi, %edi
LBB2_4:                                 ##   in Loop: Header=BB2_2 Depth=1
	addl	$8, %edx
	movsd	(%edi), %xmm0           ## xmm0 = mem[0],zero
	movsd	%xmm0, (%ecx)
	addl	$8, %ecx
	testl	%eax, %eax
	jne	LBB2_2

The BAD code (no -DFIX):
LBB2_2:                                 ## =>This Inner Loop Header: Depth=1
	decl	%eax
	movzbl	(%edx), %ebx
	cmpb	$33, %bl
	movl	%edx, %edi
	jl	LBB2_4
## BB#3:                                ##   in Loop: Header=BB2_2 Depth=1
	movl	(%edx), %esi
	movb	$1, %bl
	leal	-24(%ebp), %edi
LBB2_4:                                 ##   in Loop: Header=BB2_2 Depth=1
	addl	$8, %edx
	movb	%bl, (%ecx)
	movl	4(%edi), %edi
	movl	%edi, 4(%ecx)
	addl	$8, %ecx
	testl	%eax, %eax
	jne	LBB2_2
## BB#5:
	movl	%esi, -20(%ebp)

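This last line looks problematic (if I’m reading it correctly).
It stores %esi to -20(%ebp), i.e. tmp.b.x (BB#3 points %edi at tmp with `leal -24(%ebp), %edi`), only once, after the loop, although %esi is reloaded each time round the loop at BB#3.
It appears to be generated by `tmp.b.x = val->n.f;`, which has migrated out of the loop to a point after tmp is read (`movl 4(%edi), %edi`).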
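Read back into C++ (my interpretation of the listing, not compiler output), the BAD loop behaves like the model below: the value is still loaded every iteration, but the store into tmp is held in a register and written once after the loop, so every copy sees the stale initial value. The types and function names are hypothetical simplifications, and only inputs whose first character is > 32 (as in the test) are modelled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for B: the tag byte and the int member.
struct Elem { char t; std::int32_t x; };

// Reads the first 4 bytes of `s` as an int32 (what `val->n.f` loads).
static std::int32_t first_int(const char* s)
{
    std::int32_t v;
    std::memcpy(&v, s, sizeof v);
    return v;
}

// Intended loop: the store into tmp.x happens before tmp is copied.
void intended_loop(const char* const* strs, int count, Elem* out)
{
    Elem tmp{1, 0x123400};
    for (int i = 0; i < count; ++i) {
        tmp.x = first_int(strs[i]);   // tmp.b.x = val->n.f;
        out[i] = tmp;                 // *it = val->b;  (val == &tmp)
    }
}

// Model of the BAD listing: `pending` plays the role of %esi, loaded each
// iteration, while the store into tmp.x is sunk below the loop.
void bad_loop_model(const char* const* strs, int count, Elem* out)
{
    Elem tmp{1, 0x123400};
    std::int32_t pending = tmp.x;
    for (int i = 0; i < count; ++i) {
        pending = first_int(strs[i]); // movl (%edx), %esi
        out[i] = tmp;                 // movl 4(%edi), %edi  (stale tmp.x)
    }
    tmp.x = pending;                  // movl %esi, -20(%ebp), too late
}
```

With {"one!", "two!"}, intended_loop produces two different x values, while bad_loop_model produces 0x123400 twice with t == 1, matching the bad output in TESTS.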

QUESTIONS:
So my questions are:
  Is this a bug in clang/LLVM?
  Does anyone else see this?
  Do clang 4.0.0 and/or clang 3.9.1 exhibit this problem?
  Is this caused by a rare code combination or is it going to silently break lots of code?
  Is this serious enough to warrant/require a clang 4.0.2?
  Do you need any further info to help fix this?
  Would someone be willing to fix it, please?

Thanks,

CHRIS