[PATCH] D70749: [InstCombine] do not insert nonnull assumption for undef

Thu Nov 28 11:59:21 PST 2019

jdoerfert added a comment.

In D70749#1762393 <https://reviews.llvm.org/D70749#1762393>, @rnk wrote:

> In D70749#1761120 <https://reviews.llvm.org/D70749#1761120>, @jdoerfert wrote:
>
> > The only alternative I see is to make passes that replace dead values aware of attributes, e.g., let them clear them out. Though, I fear that also hides the problems only a few indirections deeper.
>
>
> I guess this, then, would be the "simple" fix that will allow us to move forward without making LangRef changes. DAE is already pretty attribute aware. I think it only surveys functions with local linkage, so stripping nonnull from known-dead arguments would be pretty safe.

I'm not certain but I couldn't come up with a reason why not. Though we will need to modify more than DAE.

> Another fix would be to treat arguments marked nonnull as live, but that would effectively make all C++ `this` pointers immune to DAE, since they are all nonnull.

This might work, but it's also not only nonnull, I guess most argument attributes are affected.

In D70749#1762404 <https://reviews.llvm.org/D70749#1762404>, @efriedma wrote:

> In D70749#1762388 <https://reviews.llvm.org/D70749#1762388>, @jdoerfert wrote:
>
> > In D70749#1762374 <https://reviews.llvm.org/D70749#1762374>, @efriedma wrote:
> >
> > > In D70749#1762291 <https://reviews.llvm.org/D70749#1762291>, @jdoerfert wrote:
> > >
> > > > >> The only alternative I see is to make passes that replace dead values aware of attributes, e.g., let them clear them out. Though, I fear that also hides the problems only a few indirections deeper.
> > > > > 
> > > > > I don't think there's some deep class of problems hidden here beyond DeadArgElim itself.  The current rule is clear.  I would rather not change LangRef unless there's some interesting optimization that's blocked by the current semantics.
> > > >
> > > > Every time we mix unused values with annotated information/attributes it is problematic. Take a function that returns a value but we know the return value is never inspected at any call site. We replace it with undef regardless of what the function and call site return arguments are.
> > > >  Now we are looking for a mechanism to annotate information/attributes in the IR even if they are not arguments (see https://reviews.llvm.org/D37579#1761151) which will expose this mismatch not only for arguments and return values.
> > >
> > >
> > > For most optimizations, a use in a "call" or "ret" instruction is a real use, enough to force optimizations to assume that other code depends on the value.  There's only a narrow class of IPO optimizations (I think just DeadArgElim and IPSCCP) which prove an argument or return value is "dead" despite having a use in a call or ret instruction.
> > >
> > > There's a corresponding rule for "!nonnull", in LangRef: "If the value is null at runtime, the behavior is undefined."  Code that hoists load instructions will erase the metadata.  This works correctly, as far as I know.
> >
> >
> > We are consistent by calling everything quickly UB, does not mean that this is the right choice. I would expect a transformation in the code base that does not erase the metadata or a bug report that exposed such behavior (because I would not have known to do it right now). This is again one of those situations where we force people to be always aware of implicit interactions in order to produce correct transformations. Having it result in poison would help us here and in the situations I mentioned in my last comment. Could you summarize the drawbacks? (I guess we should move this to the list.)
>
>
>
>
> 1. Source languages generally don't have any notion of poison. If we relax a marking from the source code to mean poison, we're throwing away information.

I assume you mean by source language markings `attribute((nonnull))`, correct? Arguably, people struggle here with the source language semantic and assume poison semantic often enough that we actually have problems sometimes.
Take the `memcpy`/`memset` example again. There were `nonnull` markings we actually had to ignore during propagation (and I think later strip) because people passed `null` and assumed something like poison (=no access no harm).

The questions seems to be:
 Do we think we get enough optimizations from UB based on annotations of dead values worth having the discussion with users about these cases.

I am actively working on improvements to exploit UB (with the Attributor) but I fear once we replace the UB call in

  static void foo(int &) {}
  void bar() { foo(*nullptr); }

with an unreachable, we will get some angry emails.

> 2. For a long time, we didn't clearly specify the behavior at all in a lot of cases, which was a complete mess.  In the process of clarifying LangRef, I chose instant UB for everything, and nobody presented an argument against that at the time. See D49041 <https://reviews.llvm.org/D49041> and D47854 <https://reviews.llvm.org/D47854>.
> 
>   If there are compelling reasons to produce poison for some specific attributes/metadata, or add equivalent attributes/metadata that produce poison instead of instant UB, we can consider that. sure.

I try to give you reasons here, right? I argued the problem is information written down on things we later declare dead. As I said before, this already affects arguments and return values (both at the call site and function site). If we go ahead and write down more information (=attributes) for floating values, the problem will appear there as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70749/new/

https://reviews.llvm.org/D70749