<div dir="ltr">I do not know the loop infrastructure in LLVM well, so I cannot really give advice about how to proceed.<div>It would be fine in this case to do the optimization as long as the loop is entered, but the real condition is "would there be at least one load/store to this variable ?".</div>

<div>For example, if instead of "i < n/2" there was "i % 10 == 5" in your code, the optimization would have to be guarded by a check for n >= 6.</div><div><br></div><div>Robin</div></div><div class="gmail_extra">

<br><br><div class="gmail_quote">On Wed, Sep 3, 2014 at 9:59 AM, Balaram Makam <span dir="ltr"><<a href="mailto:bmakam@codeaurora.org" target="_blank">bmakam@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks for the background on the concurrent memory model.<br>

<br>

So, is it sufficient that the loop entry is guarded by condition (cbz at<br>

top) for preventing the race?<br>

The loop entry will be guarded by condition if loop has been rotated by loop<br>

rotate pass.<br>

<br>

Since LICM runs after loop rotate, we can use<br>

ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively<br>

execute load without causing a race.<br>

Is it heavy-handed solution for this problem? I can create a bug if you<br>

would like to.<br>

<br>

Thanks,<br>

Balaram<br>

<br>

From: Robin Morisset [mailto:<a href="mailto:morisset@google.com">morisset@google.com</a>]<br>

Sent: Tuesday, September 02, 2014 7:40 PM<br>

To: Filip Pizlo<br>

Cc: Balaram Makam; LLVM Developers Mailing List<br>

Subject: Re: [LLVMdev] LICM promoting memory to scalar<br>

<div class="HOEnZb"><div class="h5"><br>

Ah right, I had missed the cbz, my bad. It is indeed sound under that<br>

condition.<br>

My point was that just hoisting/sinking unconditionally the memory accesses<br>

is unsound. It is good news that gcc learnt how to do it carefully :)<br>

<br>

On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <<a href="mailto:fpizlo@apple.com">fpizlo@apple.com</a>> wrote:<br>

I think gcc is right. <br>

<br>

It inserted a branch for n == 0 (the cbz at the top), so that's not a<br>

problem. <br>

<br>

In all other regards, this is safe: if you examine the sequence of loads and<br>

stores, it eliminated all but the first load and all but the last store.<br>

How's that unsafe?<br>

<br>

If I had to guess, the bug here is that LLVM doesn't want to hoist the load<br>

over the condition (which it is right to want to avoid) but it fails to<br>

realize that the condition is true for the first iteration. <br>

<br>

-Filip<br>

<br>

On Sep 2, 2014, at 4:23 PM, Robin Morisset <<a href="mailto:morisset@google.com">morisset@google.com</a>> wrote:<br>

The problem here is that doing this can introduce a race which is undefined<br>

at the IR level.<br>

In the example you gave above I suspect that this is a bug in GCC. I may<br>

have missed something in the assembly, but it appears that gcc loads a copy<br>

of globalvar in w3, does stuff with w3 and then stores w3 in globalvar. This<br>

is unsound in the case where the loop is run with n = 0.<br>

For an example, if you have a thread run foo(0,0) in a loop (so not doing<br>

anything) and another thread doing:<br>

for (int i = 0 ; i <1000000 ; ++i) {<br>

    globalvar = i;<br>

    assert(globalvar == i);<br>

}<br>

the code should never assert. But if you host the load/store out of the loop<br>

(as GCC appears to do), then occasionaly there will be something like<br>

thread 2: globalvar = i (= 42)<br>

thread 1: w3 = globalvar (= 42)<br>

thread 2: ++i<br>

thread 2: globalvar = i (= 43)<br>

thread 1: globalvar =w3 (= 42)<br>

thread 2: assert(globalvar == i) <br>

And it will assert (42 == 43) and crash.<br>

<br>

Shorter answer:<br>

- speculatively executing stores is unsound because the value may have been<br>

modified behind your back by another thread.<br>

- speculatively executing loads might not be observable in some specific<br>

case but may always introduce races, thus introducing undefined behaviour<br>

and breaking assumptions that other passes may rely on.<br>

<br>

Best regards,<br>

Robin<br>

<br>

On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <<a href="mailto:bmakam@codeaurora.org">bmakam@codeaurora.org</a>> wrote:<br>

All,<br>

<br>

If we can speculatively execute a load instruction, why isn’t it safe to<br>

hoist it out by promoting it to a scalar in LICM pass?<br>

<br>

<br>

There is a comment in LICM pass that if a load/store is conditional then it<br>

is not safe because it would break the LLVM concurrency model (See commit<br>

73bfa4a).<br>

It has an IR test for checking this in<br>

test/Transforms/LICM/scalar-promote-memmodel.ll<br>

<br>

However, I have a sample code where GCC is able to promote the memory to<br>

scalar and hoist/sink load/store out of loop but LLVM cannot.<br>

Is GCC being aggressive here or LLVM missing out an opportunity?<br>

<br>

Here is my sample code:<br>

<br>

extern int globalvar;<br>

void foo(int n , int incr) {<br>

  unsigned int i;<br>

  for (i = 0 ; i < n; i += incr ) {<br>

    if (i < n/2)<br>

      globalvar += incr;<br>

  }<br>

return;<br>

}<br>

<br>

GCC output:<br>

<br>

$ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd<br>

test.cpp<br>

        .arch armv8-a+fp+simd<br>

        .file   "test.cpp"<br>

        .text<br>

        .align  2<br>

        .global _Z3fooii<br>

        .type   _Z3fooii, %function<br>

_Z3fooii:<br>

.LFB0:<br>

        .cfi_startproc<br>

        cbz     w0, .L1<br>

        adrp    x6, globalvar<br>

        add     w5, w0, w0, lsr 31<br>

        ldr     w3, [x6,#:lo12:globalvar]                        <== hoist<br>

load of globalvar<br>

        mov     w2, 0<br>

        asr     w5, w5, 1<br>

.L4:<br>

        cmp     w5, w2<br>

        add     w2, w2, w1<br>

        add     w4, w3, w1<br>

        csel    w3, w4, w3, hi<br>

        cmp     w2, w0<br>

        bcc     .L4<br>

        str     w3, [x6,#:lo12:globalvar]                       <== sink<br>

store of globalvar<br>

.L1:<br>

        ret<br>

        .cfi_endproc<br>

.LFE0:<br>

        .size   _Z3fooii, .-_Z3fooii<br>

        .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC<br>

2013.11) 4.9.0"<br>

<br>

LLVM output:<br>

<br>

$ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp<br>

        .text<br>

        .file   "test.cpp"<br>

        .globl  _Z3fooii<br>

        .align  2<br>

        .type   _Z3fooii,@function<br>

_Z3fooii:                               // @_Z3fooii<br>

// BB#0:                                // %entry<br>

        cbz     w0, .LBB0_5<br>

// BB#1:                                // %<a href="http://for.body.lr.ph" target="_blank">for.body.lr.ph</a><br>

        mov      w8, wzr<br>

        cmp      w0, #0                 // =0<br>

        cinc     w9, w0, lt<br>

        asr     w9, w9, #1<br>

        adrp    x10, globalvar<br>

.LBB0_2:                                // %for.body<br>

                                        // =>This Inner Loop Header: Depth=1<br>

        cmp      w8, w9<br>

        b.hs    .LBB0_4<br>

// BB#3:                                // %if.then<br>

                                        //   in Loop: Header=BB0_2 Depth=1<br>

        ldr     w11, [x10, :lo12:globalvar]                     <===== load<br>

inside loop<br>

        add      w11, w11, w1<br>

        str     w11, [x10, :lo12:globalvar]                      <==== store<br>

inside loop<br>

.LBB0_4:                                // %for.inc<br>

                                        //   in Loop: Header=BB0_2 Depth=1<br>

        add      w8, w8, w1<br>

        cmp      w8, w0<br>

        b.lo    .LBB0_2<br>

.LBB0_5:                                // %for.end<br>

        ret<br>

.Ltmp1:<br>

        .size   _Z3fooii, .Ltmp1-_Z3fooii<br>

<br>

<br>

        .ident  "clang version 3.6.0 "<br>

<br>

Thanks,<br>

Balaram<br>

<br>

<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

<br>

</div></div></blockquote></div><br></div>