[LLVMdev] LICM promoting memory to scalar

Tue Sep 2 16:23:49 PDT 2014

The problem here is that doing this can introduce a race which is undefined
at the IR level.
In the example you gave above I suspect that this is a bug in GCC. I may
have missed something in the assembly, but it appears that gcc loads a copy
of globalvar in w3, does stuff with w3 and then stores w3 in globalvar.
This is unsound in the case where the loop is run with n = 0.
For an example, if you have a thread run foo(0,0) in a loop (so not doing
anything) and another thread doing:
for (int i = 0 ; i <1000000 ; ++i) {
    globalvar = i;
    assert(globalvar == i);
}
the code should never assert. But if you host the load/store out of the
loop (as GCC appears to do), then occasionaly there will be something like
thread 2: globalvar = i (= 42)
thread 1: w3 = globalvar (= 42)
thread 2: ++i
thread 2: globalvar = i (= 43)
thread 1: globalvar =w3 (= 42)
thread 2: assert(globalvar == i)
And it will assert (42 == 43) and crash.

Shorter answer:
- speculatively executing stores is unsound because the value may have been
modified behind your back by another thread.
- speculatively executing loads might not be observable in some specific
case but may always introduce races, thus introducing undefined behaviour
and breaking assumptions that other passes may rely on.

Best regards,
Robin

On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <bmakam at codeaurora.org> wrote:

> All,
>
> If we can speculatively execute a load instruction, why isn’t it safe to
> hoist it out by promoting it to a scalar in LICM pass?
>
>
> There is a comment in LICM pass that if a load/store is conditional then
> it is not safe because it would break the LLVM concurrency model (See
> commit 73bfa4a).
> It has an IR test for checking this in
> test/Transforms/LICM/scalar-promote-memmodel.ll
>
> However, I have a sample code where GCC is able to promote the memory to
> scalar and hoist/sink load/store out of loop but LLVM cannot.
> Is GCC being aggressive here or LLVM missing out an opportunity?
>
> Here is my sample code:
>
> extern int globalvar;
> void foo(int n , int incr) {
>   unsigned int i;
>   for (i = 0 ; i < n; i += incr ) {
>     if (i < n/2)
>       globalvar += incr;
>   }
> return;
> }
>
> GCC output:
>
> $ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd
> test.cpp
>         .arch armv8-a+fp+simd
>         .file   "test.cpp"
>         .text
>         .align  2
>         .global _Z3fooii
>         .type   _Z3fooii, %function
> _Z3fooii:
> .LFB0:
>         .cfi_startproc
>         cbz     w0, .L1
>         adrp    x6, globalvar
>         add     w5, w0, w0, lsr 31
>         ldr     w3, [x6,#:lo12:globalvar]                        <== hoist
> load of globalvar
>         mov     w2, 0
>         asr     w5, w5, 1
> .L4:
>         cmp     w5, w2
>         add     w2, w2, w1
>         add     w4, w3, w1
>         csel    w3, w4, w3, hi
>         cmp     w2, w0
>         bcc     .L4
>         str     w3, [x6,#:lo12:globalvar]                       <== sink
> store of globalvar
> .L1:
>         ret
>         .cfi_endproc
> .LFE0:
>         .size   _Z3fooii, .-_Z3fooii
>         .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC
> 2013.11) 4.9.0"
>
> LLVM output:
>
> $ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
>         .text
>         .file   "test.cpp"
>         .globl  _Z3fooii
>         .align  2
>         .type   _Z3fooii, at function
> _Z3fooii:                               // @_Z3fooii
> // BB#0:                                // %entry
>         cbz     w0, .LBB0_5
> // BB#1:                                // %for.body.lr.ph
>         mov      w8, wzr
>         cmp      w0, #0                 // =0
>         cinc     w9, w0, lt
>         asr     w9, w9, #1
>         adrp    x10, globalvar
> .LBB0_2:                                // %for.body
>                                         // =>This Inner Loop Header:
> Depth=1
>         cmp      w8, w9
>         b.hs    .LBB0_4
> // BB#3:                                // %if.then
>                                         //   in Loop: Header=BB0_2 Depth=1
>         ldr     w11, [x10, :lo12:globalvar]                     <=====
> load inside loop
>         add      w11, w11, w1
>         str     w11, [x10, :lo12:globalvar]                      <====
> store inside loop
> .LBB0_4:                                // %for.inc
>                                         //   in Loop: Header=BB0_2 Depth=1
>         add      w8, w8, w1
>         cmp      w8, w0
>         b.lo    .LBB0_2
> .LBB0_5:                                // %for.end
>         ret
> .Ltmp1:
>         .size   _Z3fooii, .Ltmp1-_Z3fooii
>
>
>         .ident  "clang version 3.6.0 "
>
> Thanks,
> Balaram
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140902/fee14609/attachment.html>