[LLVMbugs] [Bug 8171] New: Illegal bit field manipulation code generated for big endian architectures

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Fri Sep 17 08:22:16 PDT 2010


http://llvm.org/bugs/show_bug.cgi?id=8171

           Summary: Illegal bit field manipulation code generated for big
                    endian architectures
           Product: clang
           Version: 2.7
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: -New Bugs
        AssignedTo: unassignedclangbugs at nondot.org
        ReportedBy: pekka.jaaskelainen at tut.fi
                CC: llvmbugs at cs.uiuc.edu


Created an attachment (id=5502)
 --> (http://llvm.org/bugs/attachment.cgi?id=5502)
x86 clang result

For this simple C program using bit fields, Clang generates wrong bit-level
manipulation code for our big-endian architecture (TCE), while llvm-gcc
generates working code:

#include <stdio.h>

struct {
    unsigned short final  :  4; // 1
    unsigned short hlen   :  4; // 3
    unsigned short x      :  4;
    unsigned short y      :  4;
} val = {1, 2, 3, 4};


int main()
{

    printf("INITIALIZED: val.final = %u, val.hlen = %u, val.x = %u,
val.y=%u\n",
           val.final, val.hlen, val.x, val.y);
    val.final = 1;
    val.hlen = 2;
    val.x = 3;
    val.y = 4;
    printf("ASSIGNED: val.final = %u, val.hlen = %u, val.x = %u, val.y=%u\n",
           val.final, val.hlen, val.x, val.y);
    return (0);
}

With llvm-gcc targeting our architecture (expected result):
INITIALIZED: val.final = 1, val.hlen = 2, val.x = 3, val.y=4
ASSIGNED: val.final = 1, val.hlen = 2, val.x = 3, val.y=4

Clang-generated code for native x86 works OK too:
INITIALIZED: val.final = 1, val.hlen = 2, val.x = 3, val.y=4
ASSIGNED: val.final = 1, val.hlen = 2, val.x = 3, val.y=4

The result when compiling with Clang for our architecture:
INITIALIZED: val.final = 4, val.hlen = 3, val.x = 2, val.y=1
ASSIGNED: val.final = 1, val.hlen = 2, val.x = 3, val.y=4

What seems suspicious is that the bit manipulation code generated for the
little-endian x86 and for our big-endian TTA is identical; only the
initializer value differs:

-@val = global %struct.anon { i8 33, i8 67 }, align 2 ; <%struct.anon*> [#uses=1]
+@val = global %struct.anon { i8 18, i8 52 }, align 2 ; <%struct.anon*> [#uses=1]
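(Not part of the original report, just a sanity check of those constants: a
minimal C sketch showing how the two byte pairs encode the {1, 2, 3, 4}
initializer under the two possible field-allocation orders; the variable names
are only illustrative.)

#include <stdio.h>

int main(void)
{
    /* First declared field in the low bits (the x86/little-endian order). */
    unsigned le = 1u | (2u << 4) | (3u << 8) | (4u << 12);  /* 0x4321 */
    /* First declared field in the high bits (the usual big-endian order). */
    unsigned be = (1u << 12) | (2u << 8) | (3u << 4) | 4u;  /* 0x1234 */

    /* Little-endian storage of 0x4321 is bytes 0x21 0x43, i.e. "i8 33, i8 67";
       big-endian storage of 0x1234 is bytes 0x12 0x34, i.e. "i8 18, i8 52". */
    printf("le = 0x%04X, bytes 0x%02X 0x%02X\n", le, le & 0xFF, le >> 8);
    printf("be = 0x%04X, bytes 0x%02X 0x%02X\n", be, be >> 8, be & 0xFF);
    return 0;
}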

The generated code then loads the value as an i16 and expects the bit field
variables (final, hlen, x, y) to be stored at bits 0..3, 4..7, 8..11, 12..15
for both targets. gcc extracts the bit field values correctly for our big-endian
target too (as far as I understood): 4..7, 0..3,
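(Also not from the original report: a minimal sketch of the mismatch, assuming
a big-endian i16 load of the bytes 0x12 0x34 yields 0x1234. Extracting the
fields from the low bits up, as the generated access code apparently does,
reproduces the wrong "4, 3, 2, 1" output; extracting from the high bits down,
the allocation order gcc appears to use here, gives the expected "1, 2, 3, 4".)

#include <stdio.h>

int main(void)
{
    unsigned v = 0x1234;  /* big-endian i16 load of the bytes 0x12 0x34 */

    /* First declared field assumed to live in the LOW bits: */
    printf("low-bits-first : final=%u hlen=%u x=%u y=%u\n",
           v & 0xF, (v >> 4) & 0xF, (v >> 8) & 0xF, (v >> 12) & 0xF);
    /* First declared field assumed to live in the HIGH bits: */
    printf("high-bits-first: final=%u hlen=%u x=%u y=%u\n",
           (v >> 12) & 0xF, (v >> 8) & 0xF, (v >> 4) & 0xF, v & 0xF);
    return 0;
}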

LLVM 2.7 disassemblies are attached. The problem also reproduces with the
latest revision of the LLVM 2.8 branch.

Testing with another big endian target, PowerPC64:

clang -O0 -ccc-host-triple powerpc64-foo-bar bug.c -emit-llvm -c -o bug.bc 

This also produces the (seemingly) erroneous code; the result is identical
except for the datalayout string.
