[LLVMbugs] [Bug 8044] New: clang miscompiles inline asm (uses ecx for two different things)

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Tue Aug 31 17:17:37 PDT 2010


           Summary: clang miscompiles inline asm (uses ecx for two
                    different things)
           Product: clang
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++
        AssignedTo: unassignedclangbugs at nondot.org
        ReportedBy: rsbultje at gmail.com
                CC: llvmbugs at cs.uiuc.edu, dgregor at apple.com


In FFmpeg r24986, clang miscompiles decode_cabac_mb_mvd() in
libavcodec/h264_cabac.c. The function looks like this:

static int decode_cabac_mb_mvd( H264Context *h, int ctxbase, int amvd, int *mvda) {
    int mvd;

//    if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+(amvd>2)+(amvd>32)])){
        *mvda= 0;
        return 0;
[.. more, but cut ..]

get_cabac is either C or inline ASM. For clang, we're using inline ASM, where
the relevant part looks like this (full code available from libavcodec/cabac.h,
if you want to look at it). I left out BRANCHLESS_GET_CABAC_UPDATE for
readability, because it shouldn't be necessary:

[.. cut ..]
#define BRANCHLESS_GET_CABAC(ret, cabac, statep, low, lowword, range, tmp,
        "movzbl "statep"    , "ret"                                     \n\t"\
        "mov    "range"     , "tmp"                                     \n\t"\
        "and    $0xC0       , "range"                                   \n\t"\
        "movzbl "MANGLE(ff_h264_lps_range)"("ret", "range", 2), "range" \n\t"\
        "sub    "range"     , "tmp"                                     \n\t"\
        BRANCHLESS_GET_CABAC_UPDATE(ret, cabac, statep, low, lowword, range, tmp, tmpbyte)\
        "movzbl " MANGLE(ff_h264_norm_shift) "("range"), %%ecx          \n\t"\
        "shl    %%cl        , "range"                                   \n\t"\
        "movzbl "MANGLE(ff_h264_mlps_state)"+128("ret"), "tmp"          \n\t"\
        "mov    "tmpbyte"   , "statep"                                  \n\t"\
        "shl    %%cl        , "low"                                     \n\t"\
        "test   "lowword"   , "lowword"                                 \n\t"\
        " jnz   1f                                                      \n\t"\
        "mov "BYTE"("cabac"), %%"REG_c"                                 \n\t"\
        "movzwl (%%"REG_c")     , "tmp"                                 \n\t"\
        "bswap  "tmp"                                                   \n\t"\
        "shr    $15         , "tmp"                                     \n\t"\
        "sub    $0xFFFF     , "tmp"                                     \n\t"\
        "add    $2          , %%"REG_c"                                 \n\t"\
        "mov    %%"REG_c"   , "BYTE    "("cabac")                       \n\t"\
        "lea    -1("low")   , %%ecx                                     \n\t"\
        "xor    "low"       , %%ecx                                     \n\t"\
        "shr    $15         , %%ecx                                     \n\t"\
        "movzbl " MANGLE(ff_h264_norm_shift) "(%%ecx), %%ecx            \n\t"\
        "neg    %%ecx                                                   \n\t"\
        "add    $7          , %%ecx                                     \n\t"\
        "shl    %%cl        , "tmp"                                     \n\t"\
        "add    "tmp"       , "low"                                     \n\t"\
        "1:                                                             \n\t"

    __asm__ volatile(
        "movl "RANGE    "(%2), %%esi            \n\t"
        "movl "LOW      "(%2), %%ebx            \n\t"
        BRANCHLESS_GET_CABAC("%0", "%2", "(%1)", "%%ebx", "%%bx", "%%esi", "%%edx", "%%dl")
        "movl %%esi, "RANGE    "(%2)            \n\t"
        "movl %%ebx, "LOW      "(%2)            \n\t"

        :"r"(state), "r"(c)
        : "%"REG_c, "%ebx", "%edx", "%esi", "memory"
[.. cut ..]

REG_c (in the clobber list at the bottom) is defined as "ecx" in our header
files (or "rcx" on x86-64, but this is x86-32). RANGE is "4" and LOW is "0"
(these are offsets in struct c). Clang generates this confusing code for
decode_cabac_mb_mvd():

Dump of assembler code for function decode_cabac_mb_mvd:
0x0819bfb0 <decode_cabac_mb_mvd+0>:    push   %ebp
0x0819bfb1 <decode_cabac_mb_mvd+1>:    push   %ebx
0x0819bfb2 <decode_cabac_mb_mvd+2>:    push   %edi
0x0819bfb3 <decode_cabac_mb_mvd+3>:    push   %esi
0x0819bfb4 <decode_cabac_mb_mvd+4>:    sub    $0x1c,%esp
0x0819bfb7 <decode_cabac_mb_mvd+7>:    mov    %edx,0x10(%esp)
0x0819bfbb <decode_cabac_mb_mvd+11>:    mov    %ecx,0x18(%esp)
0x0819bfbf <decode_cabac_mb_mvd+15>:    mov    0x30(%esp),%eax
0x0819bfc3 <decode_cabac_mb_mvd+19>:    lea    -0x21(%eax),%esi
0x0819bfc6 <decode_cabac_mb_mvd+22>:    sar    $0x1f,%esi
0x0819bfc9 <decode_cabac_mb_mvd+25>:    add    %edx,%esi
0x0819bfcb <decode_cabac_mb_mvd+27>:    add    $0xfffffffd,%eax
0x0819bfce <decode_cabac_mb_mvd+30>:    sar    $0x1f,%eax
0x0819bfd1 <decode_cabac_mb_mvd+33>:    add    %esi,%eax
0x0819bfd3 <decode_cabac_mb_mvd+35>:    lea    0x23742(%ecx,%eax,1),%eax
0x0819bfda <decode_cabac_mb_mvd+42>:    mov    %eax,0x14(%esp)
0x0819bfde <decode_cabac_mb_mvd+46>:    lea    0x23710(%ecx),%ecx
0x0819bfe4 <decode_cabac_mb_mvd+52>:    mov    %ecx,0xc(%esp)
0x0819bfe8 <decode_cabac_mb_mvd+56>:    mov    %eax,%edi
0x0819bfea <decode_cabac_mb_mvd+58>:    mov    0x4(%ecx),%esi
0x0819bfed <decode_cabac_mb_mvd+61>:    mov    (%ecx),%ebx
0x0819bfef <decode_cabac_mb_mvd+63>:    movzbl (%edi),%eax
0x0819bff2 <decode_cabac_mb_mvd+66>:    mov    %esi,%edx
0x0819bff4 <decode_cabac_mb_mvd+68>:    and    $0xc0,%esi
0x0819bffa <decode_cabac_mb_mvd+74>:    movzbl 0x8b46190(%eax,%esi,2),%esi
0x0819c002 <decode_cabac_mb_mvd+82>:    sub    %esi,%edx
0x0819c004 <decode_cabac_mb_mvd+84>:    mov    %edx,%ecx
0x0819c006 <decode_cabac_mb_mvd+86>:    shl    $0x11,%edx
0x0819c009 <decode_cabac_mb_mvd+89>:    cmp    %ebx,%edx
0x0819c00b <decode_cabac_mb_mvd+91>:    cmova  %ecx,%esi
0x0819c00e <decode_cabac_mb_mvd+94>:    sbb    %ecx,%ecx
0x0819c010 <decode_cabac_mb_mvd+96>:    and    %ecx,%edx
0x0819c012 <decode_cabac_mb_mvd+98>:    sub    %edx,%ebx
0x0819c014 <decode_cabac_mb_mvd+100>:    xor    %ecx,%eax
0x0819c016 <decode_cabac_mb_mvd+102>:    movzbl 0x85ce0c0(%esi),%ecx
0x0819c01d <decode_cabac_mb_mvd+109>:    shl    %cl,%esi
0x0819c01f <decode_cabac_mb_mvd+111>:    movzbl 0x8b46010(%eax),%edx
0x0819c026 <decode_cabac_mb_mvd+118>:    mov    %dl,(%edi)
0x0819c028 <decode_cabac_mb_mvd+120>:    shl    %cl,%ebx
0x0819c02a <decode_cabac_mb_mvd+122>:    test   %bx,%bx
0x0819c02d <decode_cabac_mb_mvd+125>:    jne    0x819c05e
0x0819c02f <decode_cabac_mb_mvd+127>:    mov    0x10(%ecx),%ecx
0x0819c032 <decode_cabac_mb_mvd+130>:    movzwl (%ecx),%edx
0x0819c035 <decode_cabac_mb_mvd+133>:    bswap  %edx
0x0819c037 <decode_cabac_mb_mvd+135>:    shr    $0xf,%edx
0x0819c03a <decode_cabac_mb_mvd+138>:    sub    $0xffff,%edx
0x0819c040 <decode_cabac_mb_mvd+144>:    add    $0x2,%ecx
0x0819c043 <decode_cabac_mb_mvd+147>:    mov    %ecx,0x10(%ecx)
0x0819c046 <decode_cabac_mb_mvd+150>:    lea    -0x1(%ebx),%ecx
0x0819c049 <decode_cabac_mb_mvd+153>:    xor    %ebx,%ecx
0x0819c04b <decode_cabac_mb_mvd+155>:    shr    $0xf,%ecx
0x0819c04e <decode_cabac_mb_mvd+158>:    movzbl 0x85ce0c0(%ecx),%ecx
0x0819c055 <decode_cabac_mb_mvd+165>:    neg    %ecx
0x0819c057 <decode_cabac_mb_mvd+167>:    add    $0x7,%ecx
0x0819c05a <decode_cabac_mb_mvd+170>:    shl    %cl,%edx
0x0819c05c <decode_cabac_mb_mvd+172>:    add    %edx,%ebx
0x0819c05e <decode_cabac_mb_mvd+174>:    mov    %esi,0x4(%ecx)
0x0819c061 <decode_cabac_mb_mvd+177>:    mov    %ebx,(%ecx)
0x0819c063 <decode_cabac_mb_mvd+179>:    test   $0x1,%al
0x0819c065 <decode_cabac_mb_mvd+181>:    jne    0x819c07b
0x0819c067 <decode_cabac_mb_mvd+183>:    mov    0x34(%esp),%eax
0x0819c06b <decode_cabac_mb_mvd+187>:    movl   $0x0,(%eax)
0x0819c071 <decode_cabac_mb_mvd+193>:    xor    %eax,%eax
0x0819c073 <decode_cabac_mb_mvd+195>:    add    $0x1c,%esp
0x0819c076 <decode_cabac_mb_mvd+198>:    pop    %esi
0x0819c077 <decode_cabac_mb_mvd+199>:    pop    %edi
0x0819c078 <decode_cabac_mb_mvd+200>:    pop    %ebx
0x0819c079 <decode_cabac_mb_mvd+201>:    pop    %ebp
0x0819c07a <decode_cabac_mb_mvd+202>:    ret    
0x0819c07b <decode_cabac_mb_mvd+203>:    addl   $0x3,0x10(%esp)
0x0819c080 <decode_cabac_mb_mvd+208>:    movl   $0x1,0x14(%esp)
0x0819c088 <decode_cabac_mb_mvd+216>:    mov    0xc(%esp),%ebp
0x0819c08c <decode_cabac_mb_mvd+220>:    jmp    0x819c148
0x0819c091 <decode_cabac_mb_mvd+225>:    jmp    0x819c0a0
0x0819c093 <decode_cabac_mb_mvd+227>:    nop
0x0819c094 <decode_cabac_mb_mvd+228>:    nop
0x0819c095 <decode_cabac_mb_mvd+229>:    nop
[.. rest cut for readability ..]

The stores at +174/+177 are the writes to the c struct at offsets 4 and 0
(RANGE and LOW) that I pointed out above at the end of get_cabac, and the
instructions after that are the "*mvda = 0; return 0;" lines from
decode_cabac_mb_mvd() right after the inlined get_cabac call returns. However,
those stores at +174/+177 (and the earlier loads at +58/+61) address the "c"
struct through ecx, even though ecx is in the clobber list and is overwritten
for calculations in the middle of the inline asm (for example by the movzbl at
+102). Needless to say, the writes crash (the reads are likely invalid too, but
they do not crash yet):

#0  0x0819c05e in decode_cabac_mb_mvd (h=Unhandled dwarf expression opcode 0xfb
#1  0x08199887 in ff_h264_decode_mb_cabac (h=0xf7ddc020)
#2  0x08194148 in decode_slice (arg=0x0, avctx=<value optimized out>)
#3  0x0819405c in execute_decode_slices (h=0xf7ddc020, context_count=Unhandled
dwarf expression opcode 0xee
#4  0x0818ce9d in decode_nal_units (h=Asked for position 0 of stack, stack only
has 0 elements on it.
#5  0x08190c5f in decode_frame (avpkt=0x0, avctx=0x8b516a0, 
    data_size=0xffffc538, data=0xffffc468)
#6  0x082cc1ca in avcodec_decode_video2 (
    got_picture_ptr=<value optimized out>, avctx=<value optimized out>, 
    avpkt=<value optimized out>, picture=<value optimized out>, 
    avpkt=<value optimized out>, got_picture_ptr=<value optimized out>, 
    picture=<value optimized out>, avctx=<value optimized out>)
#7  0x080e2064 in av_find_stream_info (ic=0x8b50470)
#8  0x080537c4 in opt_input_file (
"/home/rbultje/fate/fate-suite/h264-conformance/Sharp_MP_PAFF_2.jvt") at
#9  0x08055ec7 in parse_options (
    parse_arg_function=0x804df50 <opt_output_file>, options=0x84cb510, 
    argv=0xffffdbd4, argc=6)
#10 0x0804ac5a in main (argv=0xffffdbd4, argc=6) at ffmpeg.c:4320

Looks like a compiler bug to me. Works fine with gcc on x86-32/x86-64, also
works fine with clang on x86-64.
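
For illustration only, here is a reduced sketch of the contract I expect the
compiler to honour. It is not taken from FFmpeg: struct cabac_sketch,
store_after_clobber() and demo() are hypothetical names, and the struct only
mirrors the two offsets mentioned above (LOW = 0, RANGE = 4). The asm trashes
ecx and then stores through an "r" input pointer, so the pointer must not be
allocated to ecx as long as ecx is listed as a clobber. I have not checked
whether this small case triggers the same bug.

```c
#include <stddef.h>

/* Hypothetical stand-in for the CABAC context: only the two fields the
 * report mentions, laid out so that LOW is at offset 0 and RANGE at 4. */
struct cabac_sketch {
    int low;
    int range;
};

/* Same idea as FFmpeg's REG_c: the full-width name of the C register. */
#if defined(__x86_64__)
#  define REG_c "rcx"
#else
#  define REG_c "ecx"
#endif

/* Overwrites ecx, then writes both fields through the pointer operand.
 * Because REG_c is in the clobber list, none of the "r" inputs may be
 * placed in it. */
static void store_after_clobber(struct cabac_sketch *c, int low, int range)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile(
        "movl $0, %%ecx      \n\t"   /* scratch use of ecx */
        "movl %2, 4(%0)      \n\t"   /* store RANGE via the pointer input */
        "movl %1, 0(%0)      \n\t"   /* store LOW via the pointer input */
        :
        : "r"(c), "r"(low), "r"(range)
        : REG_c, "memory"
    );
#else
    /* Plain C fallback so the sketch still builds on other targets. */
    c->low   = low;
    c->range = range;
#endif
}

/* Returns 1 if both stores landed in the struct, 0 otherwise. */
static int demo(void)
{
    struct cabac_sketch c = { -1, -1 };
    store_after_clobber(&c, 123, 456);
    return c.low == 123 && c.range == 456;
}
```

If the register allocator put the pointer operand in ecx here, the two stores
would go through a zeroed register and fault, which is the same kind of
failure as the crash at +174 above.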

Compile command for h264_cabac.c:
clang -I. -I"/home/rbultje/fate/src" -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112
-std=c99 -pthread -g -Wdeclaration-after-statement -Wall -Wno-parentheses
-Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls
-Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef
-Wmissing-prototypes -O3 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros
-Qunused-arguments          -MMD -c -o libavcodec/h264_cabac.o

clang -v:
clang version 2.8 (trunk 112587)
Target: i386-unknown-linux-gnu
Thread model: posix

uname -a:
Linux vpn 2.6.32-5-amd64 #1 SMP Thu Aug 12 15:04:38 UTC 2010 x86_64 GNU/Linux
(it also fails on FreeBSD; I haven't verified, but I assume it's the same bug)

Let me know if you need more information.
