[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 24 22:28:24 PST 2016
> On Feb 24, 2016, at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ----- Original Message -----
>
>> From: "Mehdi Amini" <mehdi.amini at apple.com>
>> To: "Chandler Carruth" <chandlerc at google.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
>> <llvm-dev at lists.llvm.org>
>> Sent: Thursday, February 25, 2016 12:02:16 AM
>> Subject: Re: [llvm-dev] Possible soundness issue with
>> available_externally (split from "RFC: Add guard intrinsics")
>
>>> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <
>>> llvm-dev at lists.llvm.org > wrote:
>>
>
>>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov >
>>> wrote:
>>
>
>>>> ----- Original Message -----
>>>
>>
>
>>>>> From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org
>>>>>>
>>>
>>
>>>>> To: "Philip Reames" < listmail at philipreames.com >, "Duncan P.
>>>>> N.
>>>>> Exon
>>>
>>
>>>>> Smith" < dexonsmith at apple.com >, "Sanjoy Das"
>>>
>>
>>>>> < sanjoy at playingwithpointers.com >
>>>
>>
>>>>> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
>>>
>>
>>>>> Sent: Wednesday, February 24, 2016 10:29:23 PM
>>>
>>
>>>>> Subject: Re: [llvm-dev] Possible soundness issue with
>>>
>>
>>>>> available_externally (split from "RFC: Add guard intrinsics")
>>>
>>
>
>>>>> Yea, I'm pretty sad about all of this. I'm also not seeing a
>>>>> lot
>>>>> of
>>>
>>
>>>>> awesome paths forward.
>>>
>>
>
>>>>> Here is the least bad strategy I can come up with. Curious if
>>>>> folks
>>>
>>
>>>>> think this is sufficient:
>>>
>>
>
>>>> I may not completely understand the problem, but this seems like
>>>> overkill. The underlying restriction is that, if the compiler
>>>> makes
>>>> a non-determinism-collapsing choice when optimizing a function,
>>>> it
>>>> must make the same choice for all definitions of that function
>>>> (undefined behavior excluded).
>>>
>>
>>> This isn't enough, because some definition in some other module may
>>> *not be optimized at all*, and yet may get selected at link time.
>>
>
>>> Put another way, it must *prove* that the same choice will *always*
>>> be made for all definitions. This is akin to proving that the
>>> optimizer is run over all translation units for C++ linkonce_odr
>>> functions, which you can't do.
>>
>> Even if the optimizer is ran, it could take different decision
>> because the context would be different:
>
>> linkonce_odr foo() {
>> bar();
>> }
>
>> If bar() is present in the TU it can gets inlined into foo(). So the
>> optimizer would optimize differently foo().
>
>>> The result would be failing to optimize the bodies of linkonce_odr
>>> functions in any way which was externally detectable such as this.
>>> I
>>> think that would be *much* worse than losing the ability to do
>>> function attribute deduction for such functions?
>>
>
>>>> Thus, with an externally_available function, the CSE in Sanjoy's
>>>> original example should be forbidden. Richard's example again
>>>> demonstrates this principle, although in this case the
>>>> non-determinism is in the choice of a globally-visible
>>>> implementation technique rather than non-determinism from
>>>> memory-subsystem reordering.
>>>
>>
>
>>>> There is a complication, which you imply in your proposal, that
>>>> such
>>>> optimizations need to be forbidden not just in the
>>>> externally_available functions themselves, but in any local
>>>> function
>>>> transitively called by one. This, however, we can take care of
>>>> with
>>>> an (easily-deduced) attribute.
>>>
>>
>
>> I'm not sure why " such optimizations need to be forbidden [...] in
>> any local function transitively called by one", can you illustrate
>> with an example?
>
> Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply.
Do you mean like:
static void bar() {
%t0 = load atomic %ptr
%t1 = load atomic %ptr
if (%t0 != %t1) print("X");
}
void foo() available_externally {
bar();
}
void main() {
foo();
print("Y");
}
```
Sorry, I still fail to understand why you can't optimize bar(), including deducing attributes on it. Any reference to bar() is "final": bar() can't be interposed by the linker.
I think you should be able to infer attributes (and do IPA in general on any function that matches GlobalValue::isStrongDefinitionForLinker()).
--
Mehdi
>
> -Hal
>
>> --
>> Mehdi
>
>>>> In short, it is not clear to me that the number of problematic
>>>> optimizations is large (seems likely restricted to things
>>>> involving
>>>> atomics in practice),
>>>
>>
>>>> and while I understand the auditing difficulties here, we should
>>>> just
>>>> restrict these in appropriate contexts instead of trying to
>>>> restrict
>>>> all information flow into or out of comdats.
>>>
>>
>
>>>> -Hal
>>>
>>
>
>>>>> 1) Stop deducing function attributes within comdats by
>>>>> examining
>>>>> the
>>>
>>
>>>>> bodies of the functions (so that we remain free to transform
>>>>> the
>>>
>>
>>>>> bodies of functions).
>>>
>>
>>>>> 2) Teach frontends to emit (even at O0!!!) trivially deduced
>>>>> function
>>>
>>
>>>>> attributes for comdats so that we continue to catch easy cases.
>>>
>>
>>>>> 3) Ensure and specify that we never hoist code *into* a comdat
>>>>> group
>>>
>>
>>>>> in which it would not have been executed previously. I don't
>>>>> know
>>>>> of
>>>
>>
>>>>> anything in LLVM that does this today, but it would become an
>>>
>>
>>>>> important invariant.
>>>
>>
>>>>> 4) Work a lot harder to do internalizing and removing of this
>>>
>>
>>>>> restriction.
>>>
>>
>
>>>>> Pretty horrible. But I think it is correct.
>>>
>>
>
>>>>> As a slight modification to #1 and #2, we could have a very
>>>>> carefully
>>>
>>
>>>>> crafted deduction rule where we only deduce function attributes
>>>>> for
>>>
>>
>>>>> functions prior to any modification of their function bodies.
>>>>> Such
>>>
>>
>>>>> attributes should be conservatively correct because we would
>>>>> never
>>>
>>
>>>>> lift new code into the function bodies. This would at least
>>>>> allow
>>>>> us
>>>
>>
>>>>> to do bottom-up deduction to catch interprocedural cases. But
>>>>> it
>>>
>>
>>>>> would become incredibly subtle that this is only valid prior to
>>>
>>
>>>>> *any* transformations of the comdat-containing functions.
>>>
>>
>
>>>>> I'm starting to think this subtle rule might be worth it. But
>>>>> I'm
>>>
>>
>>>>> frankly terrified by the implications.
>>>
>>
>
>>>>> On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
>>>
>>
>>>>> llvm-dev at lists.llvm.org > wrote:
>>>
>>
>
>>>>>> On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
>>>>>> wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>> On 2016-Feb-24, at 19:46, Sanjoy Das <
>>>
>>
>>>>>>>> sanjoy at playingwithpointers.com > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
>>>
>>
>>>>>>>> chandlerc at google.com > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith
>>>
>>
>>>>>
>>>
>>
>>>>>>>>> < dexonsmith at apple.com > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> On 2016-Feb-24, at 19:17, Chandler Carruth <
>>>
>>
>>>>>>>>>>> chandlerc at google.com > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> < llvm-dev at lists.llvm.org > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon
>>>>>>>>>>> Smith
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> < dexonsmith at apple.com > wrote:
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> If we do not inline @foo(), and instead re-link the
>>>>>>>>>>>>> call
>>>
>>
>>>>>>>>>>>>> site
>>>
>>
>>>>>>>>>>>>> in
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> @main
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> to some non-optimized copy (or differently optimized
>>>>>>>>>>>>> copy)
>>>
>>
>>>>>>>>>>>>> of
>>>
>>
>>>>>>>>>>>>> @foo,
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> then it is possible for the program to have the
>>>>>>>>>>>>> behavior
>>>
>>
>>>>>>>>>>>>> {print("Y");
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> print ("X")}, which was disallowed in the earlier
>>>>>>>>>>>>> program.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> In other words, opt refined the semantics of @foo()
>>>>>>>>>>>>> (i.e.
>>>
>>
>>>>>>>>>>>>> reduced the
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> set of behaviors it may have) in ways that would make
>>>>>>>>>>>>> later
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>>> optimizations invalid if we de-refine the
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>> of
>>>
>>
>>>>>>>>>>>>> @foo().
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> I'm probably missing something obvious here. How could
>>>>>>>>>>>> the
>>>
>>
>>>>>>>>>>>> result of
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> `%t0 != %t1` be different at optimization time in one
>>>>>>>>>>>> file
>>>
>>
>>>>>>>>>>>> than from
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> runtime in the "real" implementation? Doesn't this
>>>>>>>>>>>> make
>>>>>>>>>>>> the
>>>
>>
>>>>>>>>>>>> CSE
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> invalid?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> `%t0` and `%t1` are "allowed" to "always be the same",
>>>>>>>>>>> i.e.
>>>
>>
>>>>>>>>>>> an
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> implementation of @foo that always feeds in the same
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> value for `%t0` and `%t1` is a valid implementation
>>>>>>>>>>> (which
>>>>>>>>>>> is
>>>
>>
>>>>>>>>>>> why the
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> CSE was valid); but it is not the *only* valid
>>>
>>
>>>>>>>>>>> implementation.
>>>
>>
>>>>>>>>>>> If I
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> don't CSE the two load instructions (also a valid thing
>>>>>>>>>>> to
>>>
>>
>>>>>>>>>>> do),
>>>
>>
>>>>>>>>>>> and
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> this is a second thread writing to `%par`, then the two
>>>
>>
>>>>>>>>>>> values
>>>
>>
>>>>>>>>>>> loaded
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> can be different, and you could end up printing `"X"`
>>>>>>>>>>> in
>>>
>>
>>>>>>>>>>> `@foo`.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> Did that make sense?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> Yes. To be sure I understand the scope: this is only a
>>>>>>>>>> problem
>>>
>>
>>>>>>>>>> for
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> atomics, correct? (Because multi-threaded behaviour with
>>>>>>>>>> other
>>>
>>
>>>>>>>>>> globals
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> is UB?)
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> Does linkonce_odr linkage have the same problem?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> - If so, do you want to change it too?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>> - Else, why not?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> Going by the specification in the LangRef, I'd say it
>>>>>>>>>>> depends
>>>
>>
>>>>>>>>>>> on how
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> you define "definitive". If you're allowed to replace
>>>>>>>>>>> the
>>>
>>
>>>>>>>>>>> body
>>>
>>
>>>>>>>>>>> of a
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> function with a differently optimized body, then the
>>>>>>>>>>> above
>>>
>>
>>>>>>>>>>> problem
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> exists.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> I believe that is the case, and I strongly believe the
>>>
>>
>>>>>>>>>>> problem
>>>
>>
>>>>>>>>>>> you
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> outline exists for linkonce_odr exactly as it does for
>>>
>>
>>>>>>>>>>> available_externally.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> Which is what makes this scary: every C++ inline
>>>>>>>>>>> function
>>>
>>
>>>>>>>>>>> today
>>>
>>
>>>>>>>>>>> can
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>> trigger this.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> Every C/C++ inline or template function. But only the
>>>>>>>>>> ones
>>>
>>
>>>>>>>>>> that
>>>
>>
>>>>>>>>>> use
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> atomics, right?
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>> Well, with *this* example...
>>>
>>
>>>>>
>>>
>>
>>>>>>>> Atomic are one source of non-determinism that compilers
>>>>>>>> can
>>>
>>
>>>>>>>> reason
>>>
>>
>>>>>
>>>
>>
>>>>>>>> about. I don't know if the following snippet is well
>>>>>>>> defined
>>>>>>>> or
>>>
>>
>>>>>>>> not,
>>>
>>
>>>>>
>>>
>>
>>>>>>>> but you could have similar issues with
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>> void foo() {
>>>
>>
>>>>>
>>>
>>
>>>>>>>> int *p = malloc(sizeof(int));
>>>
>>
>>>>>
>>>
>>
>>>>>>>> if (*p < 10) print("X");
>>>
>>
>>>>>
>>>
>>
>>>>>>>> }
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>> or (again, I don't know if this is actually well defined)
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>> void foo() {
>>>
>>
>>>>>
>>>
>>
>>>>>>>> int t; // it is probably reasonable to fold compares with
>>>
>>
>>>>>
>>>
>>
>>>>>>>> ptrtoint(alloca) to undef
>>>
>>
>>>>>
>>>
>>
>>>>>>>> if ((intptr_t)(&t) < 10) print("X");
>>>
>>
>>>>>
>>>
>>
>>>>>>>> }
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>> The first one at least is UB, but as Richard pointed out
>>>>>>> the
>>>
>>
>>>>>>> scope
>>>
>>
>>>>>
>>>
>>
>>>>>>> is certainly broader than atomics (it's not even just
>>>
>>
>>>>>>> well-defined
>>>
>>
>>>>>
>>>
>>
>>>>>>> non-deterministism).
>>>
>>
>>>>>
>>>
>>
>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>> I'm kind of terrified by the implications.
>>>
>>
>>>>>
>>>
>>
>>>>>> Me too. :(
>>>
>>
>>>>>
>>>
>>
>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>> -- Sanjoy
>>>
>>
>>>>>
>>>
>>
>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>> Not that I'm sure that will end up being a helpful
>>>
>>
>>>>>>>>>> distinction.
>>>
>>
>>>>>
>>>
>>
>>>>>>>>>
>>>
>>
>>>>>
>>>
>>
>>>>>>>>> Right. See Richard's comment. I think that sums up the
>>>>>>>>> real
>>>
>>
>>>>>>>>> issue
>>>
>>
>>>>>>>>> here. =/
>>>
>>
>>>>>
>>>
>>
>>>>>>> _______________________________________________
>>>
>>
>>>>>
>>>
>>
>>>>>>> LLVM Developers mailing list
>>>
>>
>>>>>
>>>
>>
>>>>>>> llvm-dev at lists.llvm.org
>>>
>>
>>>>>
>>>
>>
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>>>>
>>>
>>
>
>>>>>> _______________________________________________
>>>
>>
>>>>>
>>>
>>
>>>>>> LLVM Developers mailing list
>>>
>>
>>>>>
>>>
>>
>>>>>> llvm-dev at lists.llvm.org
>>>
>>
>>>>>
>>>
>>
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>>>>
>>>
>>
>
>>>>> _______________________________________________
>>>
>>
>>>>> LLVM Developers mailing list
>>>
>>
>>>>> llvm-dev at lists.llvm.org
>>>
>>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>
>>>> --
>>>
>>
>
>>>> --
>>>
>>
>>>> Hal Finkel
>>>
>>
>>>> Assistant Computational Scientist
>>>
>>
>>>> Leadership Computing Facility
>>>
>>
>>>> Argonne National Laboratory
>>>
>>
>
>>> _______________________________________________
>>
>>> LLVM Developers mailing list
>>
>>> llvm-dev at lists.llvm.org
>>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> --
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
More information about the llvm-dev
mailing list