<div dir="ltr">I do not know the loop infrastructure in LLVM well, so I cannot really give advice about how to proceed.<div>It would be fine in this case to do the optimization as long as the loop is entered, but the real condition is "would there be at least one load/store to this variable ?".</div>
<div>For example, if instead of "i < n/2" there was "i % 10 == 5" in your code, the optimization would have to be guarded by a check for n >= 6.</div><div><br></div><div>Robin</div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">On Wed, Sep 3, 2014 at 9:59 AM, Balaram Makam <span dir="ltr"><<a href="mailto:bmakam@codeaurora.org" target="_blank">bmakam@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks for the background on the concurrent memory model.<br>
<br>
So, is it sufficient that the loop entry is guarded by condition (cbz at<br>
top) for preventing the race?<br>
The loop entry will be guarded by condition if loop has been rotated by loop<br>
rotate pass.<br>
<br>
Since LICM runs after loop rotate, we can use<br>
ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively<br>
execute load without causing a race.<br>
Is it heavy-handed solution for this problem? I can create a bug if you<br>
would like to.<br>
<br>
Thanks,<br>
Balaram<br>
<br>
From: Robin Morisset [mailto:<a href="mailto:morisset@google.com">morisset@google.com</a>]<br>
Sent: Tuesday, September 02, 2014 7:40 PM<br>
To: Filip Pizlo<br>
Cc: Balaram Makam; LLVM Developers Mailing List<br>
Subject: Re: [LLVMdev] LICM promoting memory to scalar<br>
<div class="HOEnZb"><div class="h5"><br>
Ah right, I had missed the cbz, my bad. It is indeed sound under that<br>
condition.<br>
My point was that just hoisting/sinking unconditionally the memory accesses<br>
is unsound. It is good news that gcc learnt how to do it carefully :)<br>
<br>
On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <<a href="mailto:fpizlo@apple.com">fpizlo@apple.com</a>> wrote:<br>
I think gcc is right. <br>
<br>
It inserted a branch for n == 0 (the cbz at the top), so that's not a<br>
problem. <br>
<br>
In all other regards, this is safe: if you examine the sequence of loads and<br>
stores, it eliminated all but the first load and all but the last store.<br>
How's that unsafe?<br>
<br>
If I had to guess, the bug here is that LLVM doesn't want to hoist the load<br>
over the condition (which it is right to want to avoid) but it fails to<br>
realize that the condition is true for the first iteration. <br>
<br>
-Filip<br>
<br>
On Sep 2, 2014, at 4:23 PM, Robin Morisset <<a href="mailto:morisset@google.com">morisset@google.com</a>> wrote:<br>
The problem here is that doing this can introduce a race which is undefined<br>
at the IR level.<br>
In the example you gave above I suspect that this is a bug in GCC. I may<br>
have missed something in the assembly, but it appears that gcc loads a copy<br>
of globalvar in w3, does stuff with w3 and then stores w3 in globalvar. This<br>
is unsound in the case where the loop is run with n = 0.<br>
For an example, if you have a thread run foo(0,0) in a loop (so not doing<br>
anything) and another thread doing:<br>
for (int i = 0 ; i <1000000 ; ++i) {<br>
globalvar = i;<br>
assert(globalvar == i);<br>
}<br>
the code should never assert. But if you host the load/store out of the loop<br>
(as GCC appears to do), then occasionaly there will be something like<br>
thread 2: globalvar = i (= 42)<br>
thread 1: w3 = globalvar (= 42)<br>
thread 2: ++i<br>
thread 2: globalvar = i (= 43)<br>
thread 1: globalvar =w3 (= 42)<br>
thread 2: assert(globalvar == i) <br>
And it will assert (42 == 43) and crash.<br>
<br>
Shorter answer:<br>
- speculatively executing stores is unsound because the value may have been<br>
modified behind your back by another thread.<br>
- speculatively executing loads might not be observable in some specific<br>
case but may always introduce races, thus introducing undefined behaviour<br>
and breaking assumptions that other passes may rely on.<br>
<br>
Best regards,<br>
Robin<br>
<br>
On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <<a href="mailto:bmakam@codeaurora.org">bmakam@codeaurora.org</a>> wrote:<br>
All,<br>
<br>
If we can speculatively execute a load instruction, why isn’t it safe to<br>
hoist it out by promoting it to a scalar in LICM pass?<br>
<br>
<br>
There is a comment in LICM pass that if a load/store is conditional then it<br>
is not safe because it would break the LLVM concurrency model (See commit<br>
73bfa4a).<br>
It has an IR test for checking this in<br>
test/Transforms/LICM/scalar-promote-memmodel.ll<br>
<br>
However, I have a sample code where GCC is able to promote the memory to<br>
scalar and hoist/sink load/store out of loop but LLVM cannot.<br>
Is GCC being aggressive here or LLVM missing out an opportunity?<br>
<br>
Here is my sample code:<br>
<br>
extern int globalvar;<br>
void foo(int n , int incr) {<br>
unsigned int i;<br>
for (i = 0 ; i < n; i += incr ) {<br>
if (i < n/2)<br>
globalvar += incr;<br>
}<br>
return;<br>
}<br>
<br>
GCC output:<br>
<br>
$ aarch64-linux-gnu-g++ -S -o - -O3 -ffast-math -march=armv8-a+simd<br>
test.cpp<br>
.arch armv8-a+fp+simd<br>
.file "test.cpp"<br>
.text<br>
.align 2<br>
.global _Z3fooii<br>
.type _Z3fooii, %function<br>
_Z3fooii:<br>
.LFB0:<br>
.cfi_startproc<br>
cbz w0, .L1<br>
adrp x6, globalvar<br>
add w5, w0, w0, lsr 31<br>
ldr w3, [x6,#:lo12:globalvar] <== hoist<br>
load of globalvar<br>
mov w2, 0<br>
asr w5, w5, 1<br>
.L4:<br>
cmp w5, w2<br>
add w2, w2, w1<br>
add w4, w3, w1<br>
csel w3, w4, w3, hi<br>
cmp w2, w0<br>
bcc .L4<br>
str w3, [x6,#:lo12:globalvar] <== sink<br>
store of globalvar<br>
.L1:<br>
ret<br>
.cfi_endproc<br>
.LFE0:<br>
.size _Z3fooii, .-_Z3fooii<br>
.ident "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC<br>
2013.11) 4.9.0"<br>
<br>
LLVM output:<br>
<br>
$ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp<br>
.text<br>
.file "test.cpp"<br>
.globl _Z3fooii<br>
.align 2<br>
.type _Z3fooii,@function<br>
_Z3fooii: // @_Z3fooii<br>
// BB#0: // %entry<br>
cbz w0, .LBB0_5<br>
// BB#1: // %<a href="http://for.body.lr.ph" target="_blank">for.body.lr.ph</a><br>
mov w8, wzr<br>
cmp w0, #0 // =0<br>
cinc w9, w0, lt<br>
asr w9, w9, #1<br>
adrp x10, globalvar<br>
.LBB0_2: // %for.body<br>
// =>This Inner Loop Header: Depth=1<br>
cmp w8, w9<br>
b.hs .LBB0_4<br>
// BB#3: // %if.then<br>
// in Loop: Header=BB0_2 Depth=1<br>
ldr w11, [x10, :lo12:globalvar] <===== load<br>
inside loop<br>
add w11, w11, w1<br>
str w11, [x10, :lo12:globalvar] <==== store<br>
inside loop<br>
.LBB0_4: // %for.inc<br>
// in Loop: Header=BB0_2 Depth=1<br>
add w8, w8, w1<br>
cmp w8, w0<br>
b.lo .LBB0_2<br>
.LBB0_5: // %for.end<br>
ret<br>
.Ltmp1:<br>
.size _Z3fooii, .Ltmp1-_Z3fooii<br>
<br>
<br>
.ident "clang version 3.6.0 "<br>
<br>
Thanks,<br>
Balaram<br>
<br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br>
</div></div></blockquote></div><br></div>