[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Wed Feb 24 22:28:24 PST 2016

> On Feb 24, 2016, at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message ----- 
> 
>> From: "Mehdi Amini" <mehdi.amini at apple.com>
>> To: "Chandler Carruth" <chandlerc at google.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
>> <llvm-dev at lists.llvm.org>
>> Sent: Thursday, February 25, 2016 12:02:16 AM
>> Subject: Re: [llvm-dev] Possible soundness issue with
>> available_externally (split from "RFC: Add guard intrinsics")
> 
>>> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <
>>> llvm-dev at lists.llvm.org > wrote:
>> 
> 
>>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov >
>>> wrote:
>> 
> 
>>>> ----- Original Message -----
>>> 
>> 
> 
>>>>> From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org
>>>>>> 
>>> 
>> 
>>>>> To: "Philip Reames" < listmail at philipreames.com >, "Duncan P.
>>>>> N.
>>>>> Exon
>>> 
>> 
>>>>> Smith" < dexonsmith at apple.com >, "Sanjoy Das"
>>> 
>> 
>>>>> < sanjoy at playingwithpointers.com >
>>> 
>> 
>>>>> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
>>> 
>> 
>>>>> Sent: Wednesday, February 24, 2016 10:29:23 PM
>>> 
>> 
>>>>> Subject: Re: [llvm-dev] Possible soundness issue with
>>> 
>> 
>>>>> available_externally (split from "RFC: Add guard intrinsics")
>>> 
>> 
> 
>>>>> Yea, I'm pretty sad about all of this. I'm also not seeing a
>>>>> lot
>>>>> of
>>> 
>> 
>>>>> awesome paths forward.
>>> 
>> 
> 
>>>>> Here is the least bad strategy I can come up with. Curious if
>>>>> folks
>>> 
>> 
>>>>> think this is sufficient:
>>> 
>> 
> 
>>>> I may not completely understand the problem, but this seems like
>>>> overkill. The underlying restriction is that, if the compiler
>>>> makes
>>>> a non-determinism-collapsing choice when optimizing a function,
>>>> it
>>>> must make the same choice for all definitions of that function
>>>> (undefined behavior excluded).
>>> 
>> 
>>> This isn't enough, because some definition in some other module may
>>> *not be optimized at all*, and yet may get selected at link time.
>> 
> 
>>> Put another way, it must *prove* that the same choice will *always*
>>> be made for all definitions. This is akin to proving that the
>>> optimizer is run over all translation units for C++ linkonce_odr
>>> functions, which you can't do.
>> 
>> Even if the optimizer is ran, it could take different decision
>> because the context would be different:
> 
>> linkonce_odr foo() {
>> bar();
>> }
> 
>> If bar() is present in the TU it can gets inlined into foo(). So the
>> optimizer would optimize differently foo().
> 
>>> The result would be failing to optimize the bodies of linkonce_odr
>>> functions in any way which was externally detectable such as this.
>>> I
>>> think that would be *much* worse than losing the ability to do
>>> function attribute deduction for such functions?
>> 
> 
>>>> Thus, with an externally_available function, the CSE in Sanjoy's
>>>> original example should be forbidden. Richard's example again
>>>> demonstrates this principle, although in this case the
>>>> non-determinism is in the choice of a globally-visible
>>>> implementation technique rather than non-determinism from
>>>> memory-subsystem reordering.
>>> 
>> 
> 
>>>> There is a complication, which you imply in your proposal, that
>>>> such
>>>> optimizations need to be forbidden not just in the
>>>> externally_available functions themselves, but in any local
>>>> function
>>>> transitively called by one. This, however, we can take care of
>>>> with
>>>> an (easily-deduced) attribute.
>>> 
>> 
> 
>> I'm not sure why " such optimizations need to be forbidden [...] in
>> any local function transitively called by one", can you illustrate
>> with an example?
> 
> Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply.

Do you mean like:

static void bar() {
 %t0 = load atomic %ptr
 %t1 = load atomic %ptr
 if (%t0 != %t1) print("X");
}
void foo() available_externally {
 bar();
}
void main() {
 foo();
 print("Y");
}
```

Sorry, I still fail to understand  why you can't optimize bar(), including deducing attributes on it. Any reference to bar() is "final": bar() can't be interposed by the linker. 
I think you should be able to infer attributes (and do IPA in general on any function that matches GlobalValue::isStrongDefinitionForLinker()).

-- 
Mehdi

> 
> -Hal
> 
>> --
>> Mehdi
> 
>>>> In short, it is not clear to me that the number of problematic
>>>> optimizations is large (seems likely restricted to things
>>>> involving
>>>> atomics in practice),
>>> 
>> 
>>>> and while I understand the auditing difficulties here, we should
>>>> just
>>>> restrict these in appropriate contexts instead of trying to
>>>> restrict
>>>> all information flow into or out of comdats.
>>> 
>> 
> 
>>>> -Hal
>>> 
>> 
> 
>>>>> 1) Stop deducing function attributes within comdats by
>>>>> examining
>>>>> the
>>> 
>> 
>>>>> bodies of the functions (so that we remain free to transform
>>>>> the
>>> 
>> 
>>>>> bodies of functions).
>>> 
>> 
>>>>> 2) Teach frontends to emit (even at O0!!!) trivially deduced
>>>>> function
>>> 
>> 
>>>>> attributes for comdats so that we continue to catch easy cases.
>>> 
>> 
>>>>> 3) Ensure and specify that we never hoist code *into* a comdat
>>>>> group
>>> 
>> 
>>>>> in which it would not have been executed previously. I don't
>>>>> know
>>>>> of
>>> 
>> 
>>>>> anything in LLVM that does this today, but it would become an
>>> 
>> 
>>>>> important invariant.
>>> 
>> 
>>>>> 4) Work a lot harder to do internalizing and removing of this
>>> 
>> 
>>>>> restriction.
>>> 
>> 
> 
>>>>> Pretty horrible. But I think it is correct.
>>> 
>> 
> 
>>>>> As a slight modification to #1 and #2, we could have a very
>>>>> carefully
>>> 
>> 
>>>>> crafted deduction rule where we only deduce function attributes
>>>>> for
>>> 
>> 
>>>>> functions prior to any modification of their function bodies.
>>>>> Such
>>> 
>> 
>>>>> attributes should be conservatively correct because we would
>>>>> never
>>> 
>> 
>>>>> lift new code into the function bodies. This would at least
>>>>> allow
>>>>> us
>>> 
>> 
>>>>> to do bottom-up deduction to catch interprocedural cases. But
>>>>> it
>>> 
>> 
>>>>> would become incredibly subtle that this is only valid prior to
>>> 
>> 
>>>>> *any* transformations of the comdat-containing functions.
>>> 
>> 
> 
>>>>> I'm starting to think this subtle rule might be worth it. But
>>>>> I'm
>>> 
>> 
>>>>> frankly terrified by the implications.
>>> 
>> 
> 
>>>>> On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
>>> 
>> 
>>>>> llvm-dev at lists.llvm.org > wrote:
>>> 
>> 
> 
>>>>>> On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
>>>>>> wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> On 2016-Feb-24, at 19:46, Sanjoy Das <
>>> 
>> 
>>>>>>>> sanjoy at playingwithpointers.com > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
>>> 
>> 
>>>>>>>> chandlerc at google.com > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> < dexonsmith at apple.com > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> On 2016-Feb-24, at 19:17, Chandler Carruth <
>>> 
>> 
>>>>>>>>>>> chandlerc at google.com > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> < llvm-dev at lists.llvm.org > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon
>>>>>>>>>>> Smith
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> < dexonsmith at apple.com > wrote:
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> If we do not inline @foo(), and instead re-link the
>>>>>>>>>>>>> call
>>> 
>> 
>>>>>>>>>>>>> site
>>> 
>> 
>>>>>>>>>>>>> in
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> @main
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> to some non-optimized copy (or differently optimized
>>>>>>>>>>>>> copy)
>>> 
>> 
>>>>>>>>>>>>> of
>>> 
>> 
>>>>>>>>>>>>> @foo,
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> then it is possible for the program to have the
>>>>>>>>>>>>> behavior
>>> 
>> 
>>>>>>>>>>>>> {print("Y");
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> print ("X")}, which was disallowed in the earlier
>>>>>>>>>>>>> program.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> In other words, opt refined the semantics of @foo()
>>>>>>>>>>>>> (i.e.
>>> 
>> 
>>>>>>>>>>>>> reduced the
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> set of behaviors it may have) in ways that would make
>>>>>>>>>>>>> later
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>>> optimizations invalid if we de-refine the
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>> of
>>> 
>> 
>>>>>>>>>>>>> @foo().
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> I'm probably missing something obvious here. How could
>>>>>>>>>>>> the
>>> 
>> 
>>>>>>>>>>>> result of
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> `%t0 != %t1` be different at optimization time in one
>>>>>>>>>>>> file
>>> 
>> 
>>>>>>>>>>>> than from
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> runtime in the "real" implementation? Doesn't this
>>>>>>>>>>>> make
>>>>>>>>>>>> the
>>> 
>> 
>>>>>>>>>>>> CSE
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> invalid?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> `%t0` and `%t1` are "allowed" to "always be the same",
>>>>>>>>>>> i.e.
>>> 
>> 
>>>>>>>>>>> an
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> implementation of @foo that always feeds in the same
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> value for `%t0` and `%t1` is a valid implementation
>>>>>>>>>>> (which
>>>>>>>>>>> is
>>> 
>> 
>>>>>>>>>>> why the
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> CSE was valid); but it is not the *only* valid
>>> 
>> 
>>>>>>>>>>> implementation.
>>> 
>> 
>>>>>>>>>>> If I
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> don't CSE the two load instructions (also a valid thing
>>>>>>>>>>> to
>>> 
>> 
>>>>>>>>>>> do),
>>> 
>> 
>>>>>>>>>>> and
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> this is a second thread writing to `%par`, then the two
>>> 
>> 
>>>>>>>>>>> values
>>> 
>> 
>>>>>>>>>>> loaded
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> can be different, and you could end up printing `"X"`
>>>>>>>>>>> in
>>> 
>> 
>>>>>>>>>>> `@foo`.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> Did that make sense?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> Yes. To be sure I understand the scope: this is only a
>>>>>>>>>> problem
>>> 
>> 
>>>>>>>>>> for
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> atomics, correct? (Because multi-threaded behaviour with
>>>>>>>>>> other
>>> 
>> 
>>>>>>>>>> globals
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> is UB?)
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> Does linkonce_odr linkage have the same problem?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> - If so, do you want to change it too?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>>> - Else, why not?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> Going by the specification in the LangRef, I'd say it
>>>>>>>>>>> depends
>>> 
>> 
>>>>>>>>>>> on how
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> you define "definitive". If you're allowed to replace
>>>>>>>>>>> the
>>> 
>> 
>>>>>>>>>>> body
>>> 
>> 
>>>>>>>>>>> of a
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> function with a differently optimized body, then the
>>>>>>>>>>> above
>>> 
>> 
>>>>>>>>>>> problem
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> exists.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> I believe that is the case, and I strongly believe the
>>> 
>> 
>>>>>>>>>>> problem
>>> 
>> 
>>>>>>>>>>> you
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> outline exists for linkonce_odr exactly as it does for
>>> 
>> 
>>>>>>>>>>> available_externally.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> Which is what makes this scary: every C++ inline
>>>>>>>>>>> function
>>> 
>> 
>>>>>>>>>>> today
>>> 
>> 
>>>>>>>>>>> can
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>>> trigger this.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> Every C/C++ inline or template function. But only the
>>>>>>>>>> ones
>>> 
>> 
>>>>>>>>>> that
>>> 
>> 
>>>>>>>>>> use
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> atomics, right?
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> Well, with *this* example...
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> Atomic are one source of non-determinism that compilers
>>>>>>>> can
>>> 
>> 
>>>>>>>> reason
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> about. I don't know if the following snippet is well
>>>>>>>> defined
>>>>>>>> or
>>> 
>> 
>>>>>>>> not,
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> but you could have similar issues with
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> void foo() {
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> int *p = malloc(sizeof(int));
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> if (*p < 10) print("X");
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> }
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> or (again, I don't know if this is actually well defined)
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> void foo() {
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> int t; // it is probably reasonable to fold compares with
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> ptrtoint(alloca) to undef
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> if ((intptr_t)(&t) < 10) print("X");
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> }
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> The first one at least is UB, but as Richard pointed out
>>>>>>> the
>>> 
>> 
>>>>>>> scope
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> is certainly broader than atomics (it's not even just
>>> 
>> 
>>>>>>> well-defined
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> non-deterministism).
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> I'm kind of terrified by the implications.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>> Me too. :(
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> -- Sanjoy
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>>> Not that I'm sure that will end up being a helpful
>>> 
>> 
>>>>>>>>>> distinction.
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> 
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>>>> Right. See Richard's comment. I think that sums up the
>>>>>>>>> real
>>> 
>> 
>>>>>>>>> issue
>>> 
>> 
>>>>>>>>> here. =/
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> _______________________________________________
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> LLVM Developers mailing list
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> llvm-dev at lists.llvm.org
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>> 
>>>>> 
>>> 
>> 
> 
>>>>>> _______________________________________________
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>> LLVM Developers mailing list
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>> llvm-dev at lists.llvm.org
>>> 
>> 
>>>>> 
>>> 
>> 
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>> 
>>>>> 
>>> 
>> 
> 
>>>>> _______________________________________________
>>> 
>> 
>>>>> LLVM Developers mailing list
>>> 
>> 
>>>>> llvm-dev at lists.llvm.org
>>> 
>> 
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>> 
> 
>>>> --
>>> 
>> 
> 
>>>> --
>>> 
>> 
>>>> Hal Finkel
>>> 
>> 
>>>> Assistant Computational Scientist
>>> 
>> 
>>>> Leadership Computing Facility
>>> 
>> 
>>>> Argonne National Laboratory
>>> 
>> 
> 
>>> _______________________________________________
>> 
>>> LLVM Developers mailing list
>> 
>>> llvm-dev at lists.llvm.org
>> 
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
> -- 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory