[LLVMdev] LICM promoting memory to scalar

Sun Sep 28 01:42:35 PDT 2014

----- Original Message -----
> From: "Balaram Makam" <bmakam at codeaurora.org>
> To: "Robin Morisset" <morisset at google.com>, "Filip Pizlo" <fpizlo at apple.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, September 3, 2014 11:59:26 AM
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
> 
> Thanks for the background on the concurrent memory model.
> 
> So, is it sufficient that the loop entry is guarded by condition (cbz
> at top) for preventing the race?
> The loop entry will be guarded by condition if loop has been rotated
> by loop rotate pass.
> 
> Since LICM runs after loop rotate, we can use
> ScalarEvolution::isLoopEntryGuardedByCond to check if we can
> speculatively execute load without causing a race.
> Is it heavy-handed solution for this problem? I can create a bug if
> you would like to.

If you did not file a bug report, please do.

 -Hal

> 
> Thanks,
> Balaram
> 
> From: Robin Morisset [mailto:morisset at google.com]
> Sent: Tuesday, September 02, 2014 7:40 PM
> To: Filip Pizlo
> Cc: Balaram Makam; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
> 
> Ah right, I had missed the cbz, my bad. It is indeed sound under that
> condition.
> My point was that just hoisting/sinking unconditionally the memory
> accesses is unsound. It is good news that gcc learnt how to do it
> carefully :)
> 
> On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think gcc is right.
> 
> It inserted a branch for n == 0 (the cbz at the top), so that's not a
> problem.
> 
> In all other regards, this is safe: if you examine the sequence of
> loads and stores, it eliminated all but the first load and all but
> the last store. How's that unsafe?
> 
> If I had to guess, the bug here is that LLVM doesn't want to hoist
> the load over the condition (which it is right to want to avoid) but
> it fails to realize that the condition is true for the first
> iteration.
> 
> -Filip
> 
> On Sep 2, 2014, at 4:23 PM, Robin Morisset <morisset at google.com>
> wrote:
> The problem here is that doing this can introduce a race which is
> undefined at the IR level.
> In the example you gave above I suspect that this is a bug in GCC. I
> may have missed something in the assembly, but it appears that gcc
> loads a copy of globalvar in w3, does stuff with w3 and then stores
> w3 in globalvar. This is unsound in the case where the loop is run
> with n = 0.
> For an example, if you have a thread run foo(0,0) in a loop (so not
> doing anything) and another thread doing:
> for (int i = 0 ; i <1000000 ; ++i) {
>     globalvar = i;
>     assert(globalvar == i);
> }
> the code should never assert. But if you host the load/store out of
> the loop (as GCC appears to do), then occasionaly there will be
> something like
> thread 2: globalvar = i (= 42)
> thread 1: w3 = globalvar (= 42)
> thread 2: ++i
> thread 2: globalvar = i (= 43)
> thread 1: globalvar =w3 (= 42)
> thread 2: assert(globalvar == i)
> And it will assert (42 == 43) and crash.
> 
> Shorter answer:
> - speculatively executing stores is unsound because the value may
> have been modified behind your back by another thread.
> - speculatively executing loads might not be observable in some
> specific case but may always introduce races, thus introducing
> undefined behaviour and breaking assumptions that other passes may
> rely on.
> 
> Best regards,
> Robin
> 
> On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <bmakam at codeaurora.org>
> wrote:
> All,
> 
> If we can speculatively execute a load instruction, why isn’t it safe
> to hoist it out by promoting it to a scalar in LICM pass?
> 
> 
> There is a comment in LICM pass that if a load/store is conditional
> then it is not safe because it would break the LLVM concurrency
> model (See commit 73bfa4a).
> It has an IR test for checking this in
> test/Transforms/LICM/scalar-promote-memmodel.ll
> 
> However, I have a sample code where GCC is able to promote the memory
> to scalar and hoist/sink load/store out of loop but LLVM cannot.
> Is GCC being aggressive here or LLVM missing out an opportunity?
> 
> Here is my sample code:
> 
> extern int globalvar;
> void foo(int n , int incr) {
>   unsigned int i;
>   for (i = 0 ; i < n; i += incr ) {
>     if (i < n/2)
>       globalvar += incr;
>   }
> return;
> }
> 
> GCC output:
> 
> $ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd
> test.cpp
>         .arch armv8-a+fp+simd
>         .file   "test.cpp"
>         .text
>         .align  2
>         .global _Z3fooii
>         .type   _Z3fooii, %function
> _Z3fooii:
> .LFB0:
>         .cfi_startproc
>         cbz     w0, .L1
>         adrp    x6, globalvar
>         add     w5, w0, w0, lsr 31
>         ldr     w3, [x6,#:lo12:globalvar]                        <==
>         hoist load of globalvar
>         mov     w2, 0
>         asr     w5, w5, 1
> .L4:
>         cmp     w5, w2
>         add     w2, w2, w1
>         add     w4, w3, w1
>         csel    w3, w4, w3, hi
>         cmp     w2, w0
>         bcc     .L4
>         str     w3, [x6,#:lo12:globalvar]                       <==
>         sink store of globalvar
> .L1:
>         ret
>         .cfi_endproc
> .LFE0:
>         .size   _Z3fooii, .-_Z3fooii
>         .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 -
>         Linaro GCC 2013.11) 4.9.0"
> 
> LLVM output:
> 
> $ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
>         .text
>         .file   "test.cpp"
>         .globl  _Z3fooii
>         .align  2
>         .type   _Z3fooii, at function
> _Z3fooii:                               // @_Z3fooii
> // BB#0:                                // %entry
>         cbz     w0, .LBB0_5
> // BB#1:                                // %for.body.lr.ph
>         mov      w8, wzr
>         cmp      w0, #0                 // =0
>         cinc     w9, w0, lt
>         asr     w9, w9, #1
>         adrp    x10, globalvar
> .LBB0_2:                                // %for.body
>                                         // =>This Inner Loop Header:
>                                         Depth=1
>         cmp      w8, w9
>         b.hs    .LBB0_4
> // BB#3:                                // %if.then
>                                         //   in Loop: Header=BB0_2
>                                         Depth=1
>         ldr     w11, [x10, :lo12:globalvar]
>                             <===== load inside loop
>         add      w11, w11, w1
>         str     w11, [x10, :lo12:globalvar]
>                              <==== store inside loop
> .LBB0_4:                                // %for.inc
>                                         //   in Loop: Header=BB0_2
>                                         Depth=1
>         add      w8, w8, w1
>         cmp      w8, w0
>         b.lo    .LBB0_2
> .LBB0_5:                                // %for.end
>         ret
> .Ltmp1:
>         .size   _Z3fooii, .Ltmp1-_Z3fooii
> 
> 
>         .ident  "clang version 3.6.0 "
> 
> Thanks,
> Balaram
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory