[LLVMbugs] [Bug 21617] New: wrong optimization of C++11 code on ARM and other targets

Thu Nov 20 08:09:39 PST 2014

http://llvm.org/bugs/show_bug.cgi?id=21617

            Bug ID: 21617
           Summary: wrong optimization of C++11 code on ARM and other
                    targets
           Product: tools
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: llc
          Assignee: unassignedbugs at nondot.org
          Reporter: sohachak at mpi-sws.org
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 13379
  --> http://llvm.org/bugs/attachment.cgi?id=13379&action=edit
cpp, IR, s files

Hi,

The following C++11 source code is compiled by LLVM for the ARM architecture.

Source
----------
int read() {
  int r0, r1, r2;
  r0 = x.load(memory_order_relaxed);
  r1 = y.load(memory_order_acquire);
  r2 = x.load(memory_order_relaxed);
  return (r0 + r1 + r2);
}

Compilation command 
---------------------
clang++ -std=c++11 -emit-llvm -pthread <filename>.cpp -S;opt -O3 <filename>.ll
-S  > <filename>.opt.bc;llc -march=arm -O3 <filename>.opt.bc

Target
-------- 
@ BB#0:                                 @ %entry
        push    {r4, r5, r6, r11, lr}
        add     r11, sp, #12
        sub     sp, sp, #4
        ldr     r4, .LCPI0_0
        mov     r6, #0
        mov     r1, #0
        mov     r2, #0
        mov     r3, #0
        str     r6, [sp]
        mov     r0, r4
        bl      __sync_val_compare_and_swap_8
        mov     r5, r0
        ldr     r0, .LCPI0_1
        mov     r1, #0
        mov     r2, #0
        mov     r3, #0
        str     r6, [sp]
        bl      __sync_val_compare_and_swap_8
        add     r5, r0, r5
        mov     r0, r4
        mov     r1, #0
        mov     r2, #0
        mov     r3, #0
        str     r6, [sp]
        bl      __sync_val_compare_and_swap_8
        add     r0, r5, r0
        sub     sp, r11, #12
        pop     {r4, r5, r6, r11, lr}
        mov     pc, lr
        .align  2
@ BB#1:
.LCPI0_0:
        .long   x
.LCPI0_1:
        .long   y
.Ltmp0:
        .size   _Z4readv, .Ltmp0-_Z4readv
.Leh_func_end0:
        .fnend

The second load operation of x is removed during compilation.

This compilation is wrong when the following thread runs in parallel:

void write() {
  x.store(500, memory_order_relaxed);
  y.store(10, memory_order_release);
}

In the source program, under C++11 semantics, if r1 = 10 in read()
then r2 = 500, and thus read() should never return 10.
However, in the target program read() may return 10 when
r0 = 0 /\ r1 = 10 /\ r2 = 0. This is a new behavior
in the target program that never happens in the source program.
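
The argument above can be checked by enumerating the outcomes. The
sketch below is only an illustration: the report does not show the
declarations of x and y, so std::atomic<int> initialized to 0 is an
assumption, and the namespace litmus and the helpers allowed() and
reachable() are hypothetical names introduced here. It encodes the two
rules that constrain r2 (release/acquire synchronization and read-read
coherence on x) and asks which return values of read() remain possible.

```cpp
#include <atomic>

namespace litmus {

// Assumption: the report omits the declarations of x and y.
// std::atomic<int> with initial value 0 is used for illustration only.
std::atomic<int> x(0);
std::atomic<int> y(0);

// Writer thread body, as in the report.
void write() {
  x.store(500, std::memory_order_relaxed);
  y.store(10, std::memory_order_release);
}

// Reader thread body, as in the report.
int read() {
  int r0 = x.load(std::memory_order_relaxed);
  int r1 = y.load(std::memory_order_acquire);
  int r2 = x.load(std::memory_order_relaxed);
  return r0 + r1 + r2;
}

// Hypothetical helper: true if (r0, r1, r2) is an outcome that the
// C++11 memory model permits for read() running alongside write().
bool allowed(int r0, int r1, int r2) {
  // Release/acquire: observing y == 10 makes the store x = 500 visible.
  if (r1 == 10 && r2 != 500) return false;
  // Read-read coherence: r2 cannot see an older value of x than r0 did.
  if (r0 == 500 && r2 != 500) return false;
  return true;
}

// Hypothetical helper: true if some permitted outcome sums to `sum`.
bool reachable(int sum) {
  const int xs[] = {0, 500};  // values x can hold
  const int ys[] = {0, 10};   // values y can hold
  for (int r0 : xs)
    for (int r1 : ys)
      for (int r2 : xs)
        if (allowed(r0, r1, r2) && r0 + r1 + r2 == sum) return true;
  return false;
}

}  // namespace litmus
```

Under these assumptions litmus::reachable(10) is false, because the
only way to sum to 10 is r0 = 0, r1 = 10, r2 = 0, which the acquire
rule excludes. Reusing r0 for r2 amounts to dropping that rule.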

Note: 
1. Repeated relaxed load operations should not be removed if there is
an acquire operation between them.

2. The load removal is observed in llc IR dump after "Expand ISel
Pseudo-instructions".

3. This load removal is observed when generating code
for the following targets:
arm, armeb, mips, mips64, mips64el, msp430,
nvptx, nvptx64, ppc32, sparc, thumb, thumbeb, xcore.

Attached are the cpp, .s and LLVM IR files as a testcase.

Thanks & Regards,
soham
