[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Wed Feb 24 22:20:54 PST 2016

----- Original Message ----- 

> From: "Mehdi Amini" <mehdi.amini at apple.com>
> To: "Chandler Carruth" <chandlerc at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Thursday, February 25, 2016 12:02:16 AM
> Subject: Re: [llvm-dev] Possible soundness issue with
> available_externally (split from "RFC: Add guard intrinsics")

> > On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> 

> > On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov >
> > wrote:
> 

> > > ----- Original Message -----
> > 
> 

> > > > From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org
> > > > >
> > 
> 
> > > > To: "Philip Reames" < listmail at philipreames.com >, "Duncan P.
> > > > N.
> > > > Exon
> > 
> 
> > > > Smith" < dexonsmith at apple.com >, "Sanjoy Das"
> > 
> 
> > > > < sanjoy at playingwithpointers.com >
> > 
> 
> > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> > 
> 
> > > > Sent: Wednesday, February 24, 2016 10:29:23 PM
> > 
> 
> > > > Subject: Re: [llvm-dev] Possible soundness issue with
> > 
> 
> > > > available_externally (split from "RFC: Add guard intrinsics")
> > 
> 

> > > > Yea, I'm pretty sad about all of this. I'm also not seeing a
> > > > lot
> > > > of
> > 
> 
> > > > awesome paths forward.
> > 
> 

> > > > Here is the least bad strategy I can come up with. Curious if
> > > > folks
> > 
> 
> > > > think this is sufficient:
> > 
> 

> > > I may not completely understand the problem, but this seems like
> > > overkill. The underlying restriction is that, if the compiler
> > > makes
> > > a non-determinism-collapsing choice when optimizing a function,
> > > it
> > > must make the same choice for all definitions of that function
> > > (undefined behavior excluded).
> > 
> 
> > This isn't enough, because some definition in some other module may
> > *not be optimized at all*, and yet may get selected at link time.
> 

> > Put another way, it must *prove* that the same choice will *always*
> > be made for all definitions. This is akin to proving that the
> > optimizer is run over all translation units for C++ linkonce_odr
> > functions, which you can't do.
> 
> Even if the optimizer is ran, it could take different decision
> because the context would be different:

> linkonce_odr foo() {
> bar();
> }

> If bar() is present in the TU it can gets inlined into foo(). So the
> optimizer would optimize differently foo().

> > The result would be failing to optimize the bodies of linkonce_odr
> > functions in any way which was externally detectable such as this.
> > I
> > think that would be *much* worse than losing the ability to do
> > function attribute deduction for such functions?
> 

> > > Thus, with an externally_available function, the CSE in Sanjoy's
> > > original example should be forbidden. Richard's example again
> > > demonstrates this principle, although in this case the
> > > non-determinism is in the choice of a globally-visible
> > > implementation technique rather than non-determinism from
> > > memory-subsystem reordering.
> > 
> 

> > > There is a complication, which you imply in your proposal, that
> > > such
> > > optimizations need to be forbidden not just in the
> > > externally_available functions themselves, but in any local
> > > function
> > > transitively called by one. This, however, we can take care of
> > > with
> > > an (easily-deduced) attribute.
> > 
> 

> I'm not sure why " such optimizations need to be forbidden [...] in
> any local function transitively called by one", can you illustrate
> with an example?

Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply.

 -Hal

> --
> Mehdi

> > > In short, it is not clear to me that the number of problematic
> > > optimizations is large (seems likely restricted to things
> > > involving
> > > atomics in practice),
> > 
> 
> > > and while I understand the auditing difficulties here, we should
> > > just
> > > restrict these in appropriate contexts instead of trying to
> > > restrict
> > > all information flow into or out of comdats.
> > 
> 

> > > -Hal
> > 
> 

> > > > 1) Stop deducing function attributes within comdats by
> > > > examining
> > > > the
> > 
> 
> > > > bodies of the functions (so that we remain free to transform
> > > > the
> > 
> 
> > > > bodies of functions).
> > 
> 
> > > > 2) Teach frontends to emit (even at O0!!!) trivially deduced
> > > > function
> > 
> 
> > > > attributes for comdats so that we continue to catch easy cases.
> > 
> 
> > > > 3) Ensure and specify that we never hoist code *into* a comdat
> > > > group
> > 
> 
> > > > in which it would not have been executed previously. I don't
> > > > know
> > > > of
> > 
> 
> > > > anything in LLVM that does this today, but it would become an
> > 
> 
> > > > important invariant.
> > 
> 
> > > > 4) Work a lot harder to do internalizing and removing of this
> > 
> 
> > > > restriction.
> > 
> 

> > > > Pretty horrible. But I think it is correct.
> > 
> 

> > > > As a slight modification to #1 and #2, we could have a very
> > > > carefully
> > 
> 
> > > > crafted deduction rule where we only deduce function attributes
> > > > for
> > 
> 
> > > > functions prior to any modification of their function bodies.
> > > > Such
> > 
> 
> > > > attributes should be conservatively correct because we would
> > > > never
> > 
> 
> > > > lift new code into the function bodies. This would at least
> > > > allow
> > > > us
> > 
> 
> > > > to do bottom-up deduction to catch interprocedural cases. But
> > > > it
> > 
> 
> > > > would become incredibly subtle that this is only valid prior to
> > 
> 
> > > > *any* transformations of the comdat-containing functions.
> > 
> 

> > > > I'm starting to think this subtle rule might be worth it. But
> > > > I'm
> > 
> 
> > > > frankly terrified by the implications.
> > 
> 

> > > > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
> > 
> 
> > > > llvm-dev at lists.llvm.org > wrote:
> > 
> 

> > > > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
> > > > > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
> > 
> 
> > > > > >> sanjoy at playingwithpointers.com > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
> > 
> 
> > > > > >> chandlerc at google.com > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith
> > 
> 
> > > >
> > 
> 
> > > > > >>> < dexonsmith at apple.com > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth <
> > 
> 
> > > > > >>>>> chandlerc at google.com > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> < llvm-dev at lists.llvm.org > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon
> > > > > >>>>> Smith
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> < dexonsmith at apple.com > wrote:
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> If we do not inline @foo(), and instead re-link the
> > > > > >>>>>>> call
> > 
> 
> > > > > >>>>>>> site
> > 
> 
> > > > > >>>>>>> in
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> @main
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> to some non-optimized copy (or differently optimized
> > > > > >>>>>>> copy)
> > 
> 
> > > > > >>>>>>> of
> > 
> 
> > > > > >>>>>>> @foo,
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> then it is possible for the program to have the
> > > > > >>>>>>> behavior
> > 
> 
> > > > > >>>>>>> {print("Y");
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> print ("X")}, which was disallowed in the earlier
> > > > > >>>>>>> program.
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> In other words, opt refined the semantics of @foo()
> > > > > >>>>>>> (i.e.
> > 
> 
> > > > > >>>>>>> reduced the
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> set of behaviors it may have) in ways that would make
> > > > > >>>>>>> later
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>>> optimizations invalid if we de-refine the
> > > > > >>>>>>> implementation
> > > > > >>>>>>> of
> > 
> 
> > > > > >>>>>>> @foo().
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> I'm probably missing something obvious here. How could
> > > > > >>>>>> the
> > 
> 
> > > > > >>>>>> result of
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> `%t0 != %t1` be different at optimization time in one
> > > > > >>>>>> file
> > 
> 
> > > > > >>>>>> than from
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> runtime in the "real" implementation? Doesn't this
> > > > > >>>>>> make
> > > > > >>>>>> the
> > 
> 
> > > > > >>>>>> CSE
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> invalid?
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> `%t0` and `%t1` are "allowed" to "always be the same",
> > > > > >>>>> i.e.
> > 
> 
> > > > > >>>>> an
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> implementation of @foo that always feeds in the same
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> value for `%t0` and `%t1` is a valid implementation
> > > > > >>>>> (which
> > > > > >>>>> is
> > 
> 
> > > > > >>>>> why the
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> CSE was valid); but it is not the *only* valid
> > 
> 
> > > > > >>>>> implementation.
> > 
> 
> > > > > >>>>> If I
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> don't CSE the two load instructions (also a valid thing
> > > > > >>>>> to
> > 
> 
> > > > > >>>>> do),
> > 
> 
> > > > > >>>>> and
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> this is a second thread writing to `%par`, then the two
> > 
> 
> > > > > >>>>> values
> > 
> 
> > > > > >>>>> loaded
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> can be different, and you could end up printing `"X"`
> > > > > >>>>> in
> > 
> 
> > > > > >>>>> `@foo`.
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> Did that make sense?
> > 
> 
> > > >
> > 
> 
> > > > > >>>> Yes. To be sure I understand the scope: this is only a
> > > > > >>>> problem
> > 
> 
> > > > > >>>> for
> > 
> 
> > > >
> > 
> 
> > > > > >>>> atomics, correct? (Because multi-threaded behaviour with
> > > > > >>>> other
> > 
> 
> > > > > >>>> globals
> > 
> 
> > > >
> > 
> 
> > > > > >>>> is UB?)
> > 
> 
> > > >
> > 
> 
> > > > > >>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> Does linkonce_odr linkage have the same problem?
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> - If so, do you want to change it too?
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>> - Else, why not?
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> Going by the specification in the LangRef, I'd say it
> > > > > >>>>> depends
> > 
> 
> > > > > >>>>> on how
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> you define "definitive". If you're allowed to replace
> > > > > >>>>> the
> > 
> 
> > > > > >>>>> body
> > 
> 
> > > > > >>>>> of a
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> function with a differently optimized body, then the
> > > > > >>>>> above
> > 
> 
> > > > > >>>>> problem
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> exists.
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> I believe that is the case, and I strongly believe the
> > 
> 
> > > > > >>>>> problem
> > 
> 
> > > > > >>>>> you
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> outline exists for linkonce_odr exactly as it does for
> > 
> 
> > > > > >>>>> available_externally.
> > 
> 
> > > >
> > 
> 
> > > > > >>>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> Which is what makes this scary: every C++ inline
> > > > > >>>>> function
> > 
> 
> > > > > >>>>> today
> > 
> 
> > > > > >>>>> can
> > 
> 
> > > >
> > 
> 
> > > > > >>>>> trigger this.
> > 
> 
> > > >
> > 
> 
> > > > > >>>> Every C/C++ inline or template function. But only the
> > > > > >>>> ones
> > 
> 
> > > > > >>>> that
> > 
> 
> > > > > >>>> use
> > 
> 
> > > >
> > 
> 
> > > > > >>>> atomics, right?
> > 
> 
> > > >
> > 
> 
> > > > > >>>
> > 
> 
> > > >
> > 
> 
> > > > > >>> Well, with *this* example...
> > 
> 
> > > >
> > 
> 
> > > > > >> Atomic are one source of non-determinism that compilers
> > > > > >> can
> > 
> 
> > > > > >> reason
> > 
> 
> > > >
> > 
> 
> > > > > >> about. I don't know if the following snippet is well
> > > > > >> defined
> > > > > >> or
> > 
> 
> > > > > >> not,
> > 
> 
> > > >
> > 
> 
> > > > > >> but you could have similar issues with
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >> void foo() {
> > 
> 
> > > >
> > 
> 
> > > > > >> int *p = malloc(sizeof(int));
> > 
> 
> > > >
> > 
> 
> > > > > >> if (*p < 10) print("X");
> > 
> 
> > > >
> > 
> 
> > > > > >> }
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >> or (again, I don't know if this is actually well defined)
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >> void foo() {
> > 
> 
> > > >
> > 
> 
> > > > > >> int t; // it is probably reasonable to fold compares with
> > 
> 
> > > >
> > 
> 
> > > > > >> ptrtoint(alloca) to undef
> > 
> 
> > > >
> > 
> 
> > > > > >> if ((intptr_t)(&t) < 10) print("X");
> > 
> 
> > > >
> > 
> 
> > > > > >> }
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > > The first one at least is UB, but as Richard pointed out
> > > > > > the
> > 
> 
> > > > > > scope
> > 
> 
> > > >
> > 
> 
> > > > > > is certainly broader than atomics (it's not even just
> > 
> 
> > > > > > well-defined
> > 
> 
> > > >
> > 
> 
> > > > > > non-deterministism).
> > 
> 
> > > >
> > 
> 
> > > > > >
> > 
> 
> > > >
> > 
> 
> > > > > > I'm kind of terrified by the implications.
> > 
> 
> > > >
> > 
> 
> > > > > Me too. :(
> > 
> 
> > > >
> > 
> 
> > > > > >
> > 
> 
> > > >
> > 
> 
> > > > > >> -- Sanjoy
> > 
> 
> > > >
> > 
> 
> > > > > >>
> > 
> 
> > > >
> > 
> 
> > > > > >>>>
> > 
> 
> > > >
> > 
> 
> > > > > >>>> Not that I'm sure that will end up being a helpful
> > 
> 
> > > > > >>>> distinction.
> > 
> 
> > > >
> > 
> 
> > > > > >>>
> > 
> 
> > > >
> > 
> 
> > > > > >>> Right. See Richard's comment. I think that sums up the
> > > > > >>> real
> > 
> 
> > > > > >>> issue
> > 
> 
> > > > > >>> here. =/
> > 
> 
> > > >
> > 
> 
> > > > > > _______________________________________________
> > 
> 
> > > >
> > 
> 
> > > > > > LLVM Developers mailing list
> > 
> 
> > > >
> > 
> 
> > > > > > llvm-dev at lists.llvm.org
> > 
> 
> > > >
> > 
> 
> > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> > > >
> > 
> 

> > > > > _______________________________________________
> > 
> 
> > > >
> > 
> 
> > > > > LLVM Developers mailing list
> > 
> 
> > > >
> > 
> 
> > > > > llvm-dev at lists.llvm.org
> > 
> 
> > > >
> > 
> 
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> > > >
> > 
> 

> > > > _______________________________________________
> > 
> 
> > > > LLVM Developers mailing list
> > 
> 
> > > > llvm-dev at lists.llvm.org
> > 
> 
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 

> > > --
> > 
> 

> > > --
> > 
> 
> > > Hal Finkel
> > 
> 
> > > Assistant Computational Scientist
> > 
> 
> > > Leadership Computing Facility
> > 
> 
> > > Argonne National Laboratory
> > 
> 

> > _______________________________________________
> 
> > LLVM Developers mailing list
> 
> > llvm-dev at lists.llvm.org
> 
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
-- 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory