[llvm-dev] struct bitfield regression between 3.6 and 3.9 (using -O0)

Phil Tomson via llvm-dev llvm-dev at lists.llvm.org
Wed Dec 21 16:45:13 PST 2016


Here's our testcase:

#include <stdio.h>

struct flags {
    unsigned frog: 1;
    unsigned foo : 1;
    unsigned bar : 1;
    unsigned bat : 1;
    unsigned baz : 1;
    unsigned bam : 1;
};

int main() {
    struct flags flags;
    flags.bar = 1;
    flags.foo = 1;
    if (flags.foo == 1) {
        printf("Pass\n");
        return 0;
    } else {
        printf("FAIL\n");
        return 1;
    }
}

When we compile this with LLVM 3.9 we get the "FAIL" message, but when we
compile it with LLVM 3.6 it passes. (This is only an issue at -O0; higher
optimization levels work fine.)

After some investigation we found the problem. Here's the relevant part of
the assembly generated by LLVM 3.9:

    load          r0, r510, 24, 8
    slr           r0, r0, 1, 8
    cmpimm        r0, r0, 1, 0, 8, SNE
    bitop1        r0, r0, 1<<0, AND, 64
    jct          .LBB0_2, r0, 0, N
    jrel         .LBB0_1

Notice that the slr (shift logical right) instruction shifts right by 1
position in order to get flags.foo into bit 0 of r0. The problem is that
the compare (cmpimm) compares not just that single bit but the whole value
in r0 (an 8-bit value) against 1. If we insert a logical AND with '1' to
mask r0 just prior to the compare, it works fine.
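
To make the difference concrete, here is a minimal C sketch of the two
sequences. It is only a model, not our target's semantics: the function
names are made up for illustration, and it assumes the layout implied by
the shift above (frog in bit 0, foo in bit 1, bar in bit 2).

```c
#include <stdint.h>

/* Hypothetical model of the 3.9 sequence: shift foo down to bit 0, then
   compare the whole 8-bit value against 1 with no mask in between.
   Returns 1 when the sequence concludes "foo == 1". */
int foo_set_unmasked(uint8_t byte) {
    uint8_t shifted = (uint8_t)(byte >> 1);
    return shifted == 1;
}

/* Same sequence with the AND inserted just before the compare. */
int foo_set_masked(uint8_t byte) {
    uint8_t shifted = (uint8_t)((byte >> 1) & 1);
    return shifted == 1;
}
```

With foo and bar both set (byte 0x06), foo_set_unmasked returns 0 because
the shifted value is 3, while foo_set_masked returns 1.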

And as it turns out, the LLVM IR generated using -O0 and -emit-llvm does
include the *and*:
 ...
  %bf.lshr = lshr i8 %bf.load4, 1
*  %bf.clear5 = and i8 %bf.lshr, 1*
  %bf.cast = zext i8 %bf.clear5 to i32
  %cmp = icmp eq i32 %bf.cast, 1
  br i1 %cmp, label %if.then, label %if.else

(compiled with:  clang -O0 -emit-llvm -S failing.c -o failing.ll )

I reran passing -debug to llc to see what's happening at various stages of
DAG optimization:

clang -O0 -mllvm -debug -S failing.c -o failing.s

The initial selection DAG has the AND op node:

            t22: i8 = srl t19, Constant:i64<1>
*            t23: i8 = and t22, Constant:i8<1>*
          t24: i32 = zero_extend t23
        t27: i1 = setcc t24, Constant:i32<1>, seteq:ch
      t29: i1 = xor t27, Constant:i1<-1>
    t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The Optimized lowered selection DAG does not contain the *AND* node, but it
does have a truncate, which would seem to stand in for it, given that the
result is only 1 bit wide and the xor that follows operates on 1-bit
values:

         t22: i8 = srl t19, Constant:i64<1>
        t35: i1 = truncate t22
      t29: i1 = xor t35, Constant:i1<-1>
    t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>
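
As a sanity check that the truncate really can stand in for the AND at this
stage, here is a small C sketch comparing the two DAGs over every 8-bit
input. C has no i1 type, so the truncate to i1 is modeled as "& 1"; that
modeling and the function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Initial DAG: srl 1, and 1, zero_extend, seteq 1, then xor with true.
   Returns the (inverted) condition fed to brcond. */
int cond_initial(uint8_t t19) {
    uint32_t ext = (uint32_t)((t19 >> 1) & 1);
    int eq = (ext == 1);
    return eq ^ 1;
}

/* Optimized lowered DAG: srl 1, truncate to i1, then xor with true.
   The i1 truncate is modeled as keeping only bit 0. */
int cond_truncated(uint8_t t19) {
    int bit = (t19 >> 1) & 1;
    return bit ^ 1;
}

/* Counts the 8-bit inputs for which the two conditions differ. */
int count_mismatches_trunc(void) {
    int n = 0;
    for (int v = 0; v < 256; v++) {
        if (cond_initial((uint8_t)v) != cond_truncated((uint8_t)v)) {
            n++;
        }
    }
    return n;
}
```

Under this model the two forms agree for all inputs, so this first
optimization step looks sound as long as the truncate keeps its
"bit 0 only" meaning.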

Next we get to the Type-legalized selection DAG:

        t22: i8 = srl t19, Constant:i64<1>
      t40: i8 = xor t22, Constant:i8<1>
    t31: ch = brcond t18, t40, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

 The truncate is now gone.

Next we have the Optimized type-legalized DAG:

        t22: i8 = srl t19, Constant:i64<1>
      t43: i8 = setcc t22, Constant:i8<1>, setne:ch
    t31: ch = brcond t18, t43, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The *xor* has been replaced with a *setcc*. The legalized selection DAG is
essentially the same, as is the optimized legalized selection DAG.

So if t19 contains 0b00000110, then t22 contains 0b00000011. The setcc then
compares t22 with the constant 1, and since they're not equal (setne) it
sets bit 0 of t43. brcond then tests bit 0 of t43, and since it's set it
branches to the else branch (printing FAIL in this case).

If instead t22 contained 0b00000001 (as would be the case if the mask were
still there), the setcc would find the two values equal, and since setne
is specified, the branch in brcond would not be taken (the correct behavior).

Things seem to have gone wrong when the Type-legalized selection DAG was
optimized and the *xor* node was changed to a *setcc* (and actually, the
*xor* seems like it was more optimal than the *setcc* anyway).
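
The following C sketch models why that replacement changes behavior once
the mask is gone. It assumes, as described above, that brcond only looks at
bit 0 of its operand; the function names are hypothetical and this is a
model of the two DAGs, not of the DAG combiner itself.

```c
#include <stdint.h>

/* Type-legalized DAG: xor t22, 1 on i8, with brcond assumed to test
   only bit 0 of the result. Returns 1 if the branch to if.else is taken. */
int branch_xor(uint8_t t19) {
    uint8_t t22 = (uint8_t)(t19 >> 1);
    uint8_t t40 = (uint8_t)(t22 ^ 1);
    return t40 & 1;
}

/* Optimized type-legalized DAG: setcc t22, 1, setne over the whole i8. */
int branch_setne(uint8_t t19) {
    uint8_t t22 = (uint8_t)(t19 >> 1);
    return t22 != 1;
}

/* Counts the 8-bit inputs for which the two forms pick different branches. */
int count_disagreements(void) {
    int n = 0;
    for (int v = 0; v < 256; v++) {
        if (branch_xor((uint8_t)v) != branch_setne((uint8_t)v)) {
            n++;
        }
    }
    return n;
}
```

For t19 = 0b00000110, branch_xor returns 0 (falls through to if.then) while
branch_setne returns 1 (goes to if.else). The two forms are guaranteed to
agree when bits 1 to 7 of t22 are zero, which is exactly what the dropped
mask would have ensured.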

 Any ideas about why this is happening?

[In 3.6 we don't see this issue, but then again, in 3.6 the assembly is a
bit different: no srl is used to get at the foo field of the struct.]

Phil