[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Wed Feb 24 22:07:11 PST 2016

Sent from my iPhone

> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel <hfinkel at anl.gov> wrote:
>> ----- Original Message -----
>> 
>> > From: "Chandler Carruth via llvm-dev" <llvm-dev at lists.llvm.org>
>> > To: "Philip Reames" <listmail at philipreames.com>, "Duncan P. N. Exon
>> > Smith" <dexonsmith at apple.com>, "Sanjoy Das"
>> > <sanjoy at playingwithpointers.com>
>> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>> > Sent: Wednesday, February 24, 2016 10:29:23 PM
>> > Subject: Re: [llvm-dev] Possible soundness issue with
>> > available_externally (split from "RFC: Add guard intrinsics")
>> 
>> > Yea, I'm pretty sad about all of this. I'm also not seeing a lot of
>> > awesome paths forward.
>> 
>> > Here is the least bad strategy I can come up with. Curious if folks
>> > think this is sufficient:
>> 
>> I may not completely understand the problem, but this seems like overkill. The underlying restriction is that, if the compiler makes a non-determinism-collapsing choice when optimizing a function, it must make the same choice for all definitions of that function (undefined behavior excluded).
> 
> This isn't enough, because some definition in some other module may *not be optimized at all*, and yet may get selected at link time.
> 
> Put another way, it must *prove* that the same choice will *always* be made for all definitions. This is akin to proving that the optimizer is run over all translation units for C++ linkonce_odr functions, which you can't do.
> 
> The result would be failing to optimize the bodies of linkonce_odr functions in any way which was externally detectable such as this. I think that would be *much* worse than losing the ability to do function attribute deduction for such functions?
> 
If you are prior to optimising the function in any way (which I think is a reasonable thing to require as you said before) then any attributes added there will at least be conservatively correct. That will still allow a bunch of optimisation, I hope!

So for the following function:

int foo(int *i) {
  if (*i == 1)
    return 0;
  return 1;
}

You can legitimately add readonly and propagate to callers that the returned value has known zero bits > 1.

And for this similar function:

int foo(int *i, bool b) {
  if (b) {
    *i = 1;
    return 0;
  }
  return 1;
}

You could legitimately optimise a call 'foo(p, false)' so that the call site propagates in the constants and knows that this call returns 1 for sure. 

But could you constant propagate (without inlining) in to the call and add readonly to the call instruction here? I think you can because you know that passing false for b means that the function will not take the path that loads memory.

Pete
>> Thus, with an externally_available function, the CSE in Sanjoy's original example should be forbidden. Richard's example again demonstrates this principle, although in this case the non-determinism is in the choice of a globally-visible implementation technique rather than non-determinism from memory-subsystem reordering.
>> 
>> There is a complication, which you imply in your proposal, that such optimizations need to be forbidden not just in the externally_available functions themselves, but in any local function transitively called by one. This, however, we can take care of with an (easily-deduced) attribute.
>> 
>> In short, it is not clear to me that the number of problematic optimizations is large (seems likely restricted to things involving atomics in practice), and while I understand the auditing difficulties here, we should just restrict these in appropriate contexts instead of trying to restrict all information flow into or out of comdats.
>> 
>>  -Hal
>> 
>> > 1) Stop deducing function attributes within comdats by examining the
>> > bodies of the functions (so that we remain free to transform the
>> > bodies of functions).
>> > 2) Teach frontends to emit (even at O0!!!) trivially deduced function
>> > attributes for comdats so that we continue to catch easy cases.
>> > 3) Ensure and specify that we never hoist code *into* a comdat group
>> > in which it would not have been executed previously. I don't know of
>> > anything in LLVM that does this today, but it would become an
>> > important invariant.
>> > 4) Work a lot harder to do internalizing and removing of this
>> > restriction.
>> 
>> > Pretty horrible. But I think it is correct.
>> 
>> > As a slight modification to #1 and #2, we could have a very carefully
>> > crafted deduction rule where we only deduce function attributes for
>> > functions prior to any modification of their function bodies. Such
>> > attributes should be conservatively correct because we would never
>> > lift new code into the function bodies. This would at least allow us
>> > to do bottom-up deduction to catch interprocedural cases. But it
>> > would become incredibly subtle that this is only valid prior to
>> > *any* transformations of the comdat-containing functions.
>> 
>> > I'm starting to think this subtle rule might be worth it. But I'm
>> > frankly terrified by the implications.
>> 
>> > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
>> > llvm-dev at lists.llvm.org > wrote:
>> 
>> > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev wrote:
>> >
>> > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
>> > > >> sanjoy at playingwithpointers.com > wrote:
>> >
>> > > >>
>> >
>> > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
>> > > >> chandlerc at google.com > wrote:
>> >
>> > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith
>> >
>> > > >>> < dexonsmith at apple.com > wrote:
>> >
>> > > >>>>
>> >
>> > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth <
>> > > >>>>> chandlerc at google.com > wrote:
>> >
>> > > >>>>>
>> >
>> > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev
>> >
>> > > >>>>> < llvm-dev at lists.llvm.org > wrote:
>> >
>> > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon Smith
>> >
>> > > >>>>> < dexonsmith at apple.com > wrote:
>> >
>> > > >>>>>>> If we do not inline @foo(), and instead re-link the call
>> > > >>>>>>> site
>> > > >>>>>>> in
>> >
>> > > >>>>>>> @main
>> >
>> > > >>>>>>> to some non-optimized copy (or differently optimized copy)
>> > > >>>>>>> of
>> > > >>>>>>> @foo,
>> >
>> > > >>>>>>> then it is possible for the program to have the behavior
>> > > >>>>>>> {print("Y");
>> >
>> > > >>>>>>> print ("X")}, which was disallowed in the earlier program.
>> >
>> > > >>>>>>>
>> >
>> > > >>>>>>> In other words, opt refined the semantics of @foo() (i.e.
>> > > >>>>>>> reduced the
>> >
>> > > >>>>>>> set of behaviors it may have) in ways that would make later
>> >
>> > > >>>>>>> optimizations invalid if we de-refine the implementation of
>> > > >>>>>>> @foo().
>> >
>> > > >>>>>> I'm probably missing something obvious here. How could the
>> > > >>>>>> result of
>> >
>> > > >>>>>> `%t0 != %t1` be different at optimization time in one file
>> > > >>>>>> than from
>> >
>> > > >>>>>> runtime in the "real" implementation? Doesn't this make the
>> > > >>>>>> CSE
>> >
>> > > >>>>>> invalid?
>> >
>> > > >>>>> `%t0` and `%t1` are "allowed" to "always be the same", i.e.
>> > > >>>>> an
>> >
>> > > >>>>> implementation of @foo that always feeds in the same
>> >
>> > > >>>>> value for `%t0` and `%t1` is a valid implementation (which is
>> > > >>>>> why the
>> >
>> > > >>>>> CSE was valid); but it is not the *only* valid
>> > > >>>>> implementation.
>> > > >>>>> If I
>> >
>> > > >>>>> don't CSE the two load instructions (also a valid thing to
>> > > >>>>> do),
>> > > >>>>> and
>> >
>> > > >>>>> this is a second thread writing to `%par`, then the two
>> > > >>>>> values
>> > > >>>>> loaded
>> >
>> > > >>>>> can be different, and you could end up printing `"X"` in
>> > > >>>>> `@foo`.
>> >
>> > > >>>>>
>> >
>> > > >>>>> Did that make sense?
>> >
>> > > >>>> Yes. To be sure I understand the scope: this is only a problem
>> > > >>>> for
>> >
>> > > >>>> atomics, correct? (Because multi-threaded behaviour with other
>> > > >>>> globals
>> >
>> > > >>>> is UB?)
>> >
>> > > >>>>
>> >
>> > > >>>>>> Does linkonce_odr linkage have the same problem?
>> >
>> > > >>>>>> - If so, do you want to change it too?
>> >
>> > > >>>>>> - Else, why not?
>> >
>> > > >>>>> Going by the specification in the LangRef, I'd say it depends
>> > > >>>>> on how
>> >
>> > > >>>>> you define "definitive". If you're allowed to replace the
>> > > >>>>> body
>> > > >>>>> of a
>> >
>> > > >>>>> function with a differently optimized body, then the above
>> > > >>>>> problem
>> >
>> > > >>>>> exists.
>> >
>> > > >>>>>
>> >
>> > > >>>>> I believe that is the case, and I strongly believe the
>> > > >>>>> problem
>> > > >>>>> you
>> >
>> > > >>>>> outline exists for linkonce_odr exactly as it does for
>> > > >>>>> available_externally.
>> >
>> > > >>>>>
>> >
>> > > >>>>> Which is what makes this scary: every C++ inline function
>> > > >>>>> today
>> > > >>>>> can
>> >
>> > > >>>>> trigger this.
>> >
>> > > >>>> Every C/C++ inline or template function. But only the ones
>> > > >>>> that
>> > > >>>> use
>> >
>> > > >>>> atomics, right?
>> >
>> > > >>>
>> >
>> > > >>> Well, with *this* example...
>> >
>> > > >> Atomic are one source of non-determinism that compilers can
>> > > >> reason
>> >
>> > > >> about. I don't know if the following snippet is well defined or
>> > > >> not,
>> >
>> > > >> but you could have similar issues with
>> >
>> > > >>
>> >
>> > > >>
>> >
>> > > >> void foo() {
>> >
>> > > >> int *p = malloc(sizeof(int));
>> >
>> > > >> if (*p < 10) print("X");
>> >
>> > > >> }
>> >
>> > > >>
>> >
>> > > >> or (again, I don't know if this is actually well defined)
>> >
>> > > >>
>> >
>> > > >> void foo() {
>> >
>> > > >> int t; // it is probably reasonable to fold compares with
>> >
>> > > >> ptrtoint(alloca) to undef
>> >
>> > > >> if ((intptr_t)(&t) < 10) print("X");
>> >
>> > > >> }
>> >
>> > > >>
>> >
>> > > > The first one at least is UB, but as Richard pointed out the
>> > > > scope
>> >
>> > > > is certainly broader than atomics (it's not even just
>> > > > well-defined
>> >
>> > > > non-deterministism).
>> >
>> > > >
>> >
>> > > > I'm kind of terrified by the implications.
>> >
>> > > Me too. :(
>> >
>> > > >
>> >
>> > > >> -- Sanjoy
>> >
>> > > >>
>> >
>> > > >>>>
>> >
>> > > >>>> Not that I'm sure that will end up being a helpful
>> > > >>>> distinction.
>> >
>> > > >>>
>> >
>> > > >>> Right. See Richard's comment. I think that sums up the real
>> > > >>> issue
>> > > >>> here. =/
>> >
>> > > > _______________________________________________
>> >
>> > > > LLVM Developers mailing list
>> >
>> > > > llvm-dev at lists.llvm.org
>> >
>> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> 
>> > > _______________________________________________
>> >
>> > > LLVM Developers mailing list
>> >
>> > > llvm-dev at lists.llvm.org
>> >
>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> 
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> --
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160224/d3b5f398/attachment.html>