[llvm-dev] The semantics of nonnull attribute
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Mon Feb 17 18:35:02 PST 2020
Hi Juneyoung,
thank you for starting this discussion again, the last one [0] did not
really end in a solution but the issue remains and will only become
worse, e.g., enable the Attributor and you might see all of the problems
below exposed in a single pass [1-4].
As I argued in [0], UB is problematic and I am strongly in favor of
poison. In some sense I see UB as a very "strong" guarantee and poison
as a fairly "week" one. We will have a hard time deriving the strong one
and it might cause us to break things along the way, I mean your
examples show that we already miscompile codes.
I agree that poison is on its own not "strong enough" for certain
optimization, as you pointed out below, but we could reasonably augment
it with another attribute, let's call it `used`, that basically bridges
the gap:
void @foo(nonnull);
void @bar(nonnull used);
foo(NULL) // <- argument is poison inside of @foo
bar(NULL) // <- UB at the call site of @bar
A value is `used` if it would inevitably cause UB if it was poison at
this program point. I hope this makes some sense if not I can explain in
more detail.
Btw. I think this also integrates well with other attribute problems we
had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in this
example were "not used" (in some naive implementation of memcpy at
least).
Maybe I have overlooked a problem with this solution so let me know what
you think.
Cheers,
Johannes
[0] https://reviews.llvm.org/D70749
[1] 1. below: https://godbolt.org/z/J4wFVw
[2] 2. below: https://godbolt.org/z/TAkSC6
[3] 3. below: https://godbolt.org/z/UqYi6S (doesn't work yet, will though)
[4] 4. below: https://godbolt.org/z/Qkkc_E
https://godbolt.org/z/uXAfp_
...
On 02/18, Juneyoung Lee via llvm-dev wrote:
> Hello all,
>
> LangRef says it is undefined behavior to pass null to a nonnull argument
> (`call f(nonnull null);`), but the semantics is too strong for a few
> existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is
> equivalent to `f(poison)`, but (A) it again blocks another set of
> optimizations, and (B) this makes the semantics of nonnull deviate from
> other attributes like dereferenceable.
> I found that there was a similar discussion about this issue in the past as
> well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
>
>
> 1. Propagating nonnull attribute to callee's arg (
> https://godbolt.org/z/-cVsVP )
>
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
>
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null)
> should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is equivalent
> to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless of
> ptr. So we can't guarantee ptr != null at other uses of ptr.
For the given code snipped it is not even invalid to propagate nonnull
in the poison case but your point is not wrong, if we have f(poison) we
cannot derive anything from it if the argument might be unused.
> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
>
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
>
> This optimization is incorrect if `f(nonnull null)` is UB. The reason is as
> follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the
> transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull
> poison)` are equivalent to `f(poison)`.
>
>
> 3. https://reviews.llvm.org/D69477
>
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
>
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the
> added llvm.assume(ptr != null) raises UB whereas the source may not raise
> UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
>
>
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
>
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
>
> This is incorrect if f(nonnull null) is UB. To make this correct, nonnull
> should be dropped. This becomes harder to fix if nonnull was attached at
> the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
>
> Actually the D70749's thread had an end-to-end miscompilation example due
> to the interaction between this DAE and the optimization 3 (insertion of
> llvm.assume).
>
> Thanks,
> Juneyoung Lee
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
--
Johannes Doerfert
Researcher
Argonne National Laboratory
Lemont, IL 60439, USA
jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200217/e4bba1d6/attachment.sig>
More information about the llvm-dev
mailing list