<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon Smith<br>

<<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>> wrote:<br>

>> If we do not inline @foo(), and instead re-link the call site in @main<br>

>> to some non-optimized copy (or differently optimized copy) of @foo,<br>

>> then it is possible for the program to have the behavior {print("Y");<br>

>> print ("X")}, which was disallowed in the earlier program.<br>

>><br>

>> In other words, opt refined the semantics of @foo() (i.e. reduced the<br>

>> set of behaviors it may have) in ways that would make later<br>

>> optimizations invalid if we de-refine the implementation of @foo().<br>

><br>

> I'm probably missing something obvious here.  How could the result of<br>

> `%t0 != %t1` be different at optimization time in one file than from<br>

> runtime in the "real" implementation?  Doesn't this make the CSE<br>

> invalid?<br>

<br>

`%t0` and `%t1` are "allowed" to "always be the same", i.e. an<br>

implementation of @foo that always feeds in the same<br>

value for `%t0` and `%t1` is a valid implementation (which is why the<br>

CSE was valid); but it is not the *only* valid implementation.  If I<br>

don't CSE the two load instructions (also a valid thing to do), and<br>

there is a second thread writing to `%par`, then the two values loaded<br>

can be different, and you could end up printing `"X"` in `@foo`.<br>

<br>

Did that make sense?<br>

<br>

> Does linkonce_odr linkage have the same problem?<br>

> - If so, do you want to change it too?<br>

> - Else, why not?<br>

<br>

Going by the specification in the LangRef, I'd say it depends on how<br>

you define "definitive".  If you're allowed to replace the body of a<br>

function with a differently optimized body, then the above problem<br>

exists.<br></blockquote><div><br></div><div>I believe that is the case, and I strongly believe the problem you outline exists for linkonce_odr exactly as it does for available_externally.</div><div><br></div><div>Which is what makes this scary: every C++ inline function today can trigger this.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

>> The above example is clearly fabricated, but such cases can come up<br>

>> even if everything is optimized to the same level.  E.g. one of the<br>

>> atomic loads in the unrefined implementation of @foo() could have been<br>

>> hidden behind a function call, whose body existed in only one module.<br>

>> That module would then be able to refine @foo() to `ret void` but<br>

>> other modules won't.<br>

>><br>

>> The only solution I can think of is to redefine available_externally<br>

>> to mean "the only kind of IPO/IPA you can do over a call to this<br>

>> function is to inline it".  Redefining available_externally this way<br>

>> will also let us soundly use it to represent calls to functions that<br>

>> have guard intrinsics, since a failed guard intrinsic basically<br>

>> replaces the function with a "very de-refined" implementation (the<br>

>> interpreter).<br>

>><br>

>> What do you think?  I don't think implementing the above will be<br>

>> very difficult, but needless to say, it will still be a fairly<br>

>> non-trivial semantic change (hence I'm not directly jumping to<br>

>> implementation).<br>

><br>

> This linkage is used in three places (that I know of) by clang:<br>

><br>

>   1. C-style `inline` functions.<br>

>   2. Functions defined in C++ template classes with external explicit<br>

>      instantiations, e.g. S::foo() in:<br>

><br>

>          template <class T> struct S { void foo() {} };<br>

>          void bar() { S<int>().foo(); }<br>

>          extern template struct S<int>;<br>

><br>

>   3. -flto=thin cross-module function importing.<br>

><br>

> (No comment on (1); its exact semantics are a little fuzzy to me.)<br>

> For (2) and (3), the current behaviour seems correct, and I'd be<br>

> hesitant to lose optimizing power.  (2) is under the "ODR" rule, and<br>

> I think we've been applying the same logic to (3).  Unless, are you<br>

> saying ODR isn't enough?<br>

<br>

By ODR, do you mean you only have one definition of the function in<br>

the whole link (i.e. across all modules you'll link together)?<br>

Then yes, ODR should be enough to avoid this.  But in any place where<br>

the linker sees two differently optimized definitions for a function<br>

and picks one as the definitive version all non-inlined calls link to,<br>

we have this problem.<br></blockquote><div><br></div><div>No, different levels of optimization must be allowed within ODR. So this is a problem within an ODR context.</div><div><br></div><div>(The term ODR applies to one *source* definition, not one optimized definition.)</div></div></div>