[PATCH] D70749: [InstCombine] do not insert nonnull assumption for undef

Wed Nov 27 13:33:11 PST 2019

jdoerfert added a comment.

In D70749#1761788 <https://reviews.llvm.org/D70749#1761788>, @inglorion wrote:

> > I doubt this is a "fix" but simply hides the problem one level deeper. Think of a non-undef value for which this happens but after one iteration of inlining, constant propagation, etc. it is replaced by undef, thus in the assumption.
>
> My take on this is that the transformation that generates a non-null assumption for a non-undef value is correct: if the value is live, it must be nonnull. As for later passes replacing the value with undef, my reading of the langref is that this would be incorrect. Specifically,
>  https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic states "Note that the optimizer might limit the transformations performed on values used by the llvm.assume intrinsic in order to preserve the instructions only used to form the intrinsic’s input argument." which suggests to me the intent that values passed by llvm.assume be preserved. In other words, you can't just replace those by undef. So I don't think generating llvm.assumes for non-undef values creates a problem; simply not generating assumes for undefs should be enough.

I disagree with your interpretation of the lang ref here. Also, assuming it was correct, I'd argue that we are not, and we cannot, guarantee this anyway. There are way to many places where we perform folding of sorts that could move an existing undef to a new position.
My reading is: If you have an llvm.assume(%x), we might not perform a transformation involving %x, e.g. replacing %x, because of the assume. There is no guarantee whatsoever.

In D70749#1762138 <https://reviews.llvm.org/D70749#1762138>, @efriedma wrote:

> In D70749#1761120 <https://reviews.llvm.org/D70749#1761120>, @jdoerfert wrote:
>
> > I think the "solution" here is to weaken the guarantees of `nonnull`. That is, if the value is dead, it might be null after all.
>
>
> LLVM IR has no notion of "dead".  We could say that if the argument would be null, the behavior is defined, but the value is poison in the callee.  (Or something like that; not sure that phrasing properly captures the semantics.)

On first thought that sounds reasonable to me.

>> The only alternative I see is to make passes that replace dead values aware of attributes, e.g., let them clear them out. Though, I fear that also hides the problems only a few indirections deeper.
> 
> I don't think there's some deep class of problems hidden here beyond DeadArgElim itself.  The current rule is clear.  I would rather not change LangRef unless there's some interesting optimization that's blocked by the current semantics.

Every time we mix unused values with annotated information/attributes it is problematic. Take a function that returns a value but we know the return value is never inspected at any call site. We replace it with undef regardless of what the function and call site return arguments are.
Now we are looking for a mechanism to annotate information/attributes in the IR even if they are not arguments (see https://reviews.llvm.org/D37579#1761151) which will expose this mismatch not only for arguments and return values.

================
Comment at: llvm/test/Transforms/InstCombine/assume-replacing-call.ll:199
+entry:
+  %foo = call i64 @func_test_nonnull(i8* undef) #2
+  br i1 %c, label %ctrue, label %cend
----------------
rnk wrote:
> inglorion wrote:
> > efriedma wrote:
> > > Isn't this already UB?  From LangRef: "if the parameter [...] is null, the behavior is undefined".
> > That's basically how the optimization works. Behavior is undefined if the parameter is null, therefore we can assume it is not null, which unlock optimizations.
> > 
> > The problem is that we would also do this if the parameter is undef. It may not have been undef in the original program. In the example I was looking at, deadargelim replaced an unused (but declared non-null) parameter with undef, instcombine sank the call and generated a non-null assumption for the parameter (which became call void @llvm.assume(i1 undef)), and simplifycfg then eliminated the entire branch that code was on, leading to different behavior from the original program.
> > 
> > We could presumably fix this in other ways, but my thinking is that not generating assumptions for undef values is a sensible thing to do.
> How was DAE able to replace an argument with undef which was fed to strlen? Is that the real bug?
> 
> I agree with Eli, though, with current IR langref rules, it's valid to replace this call with `unreachable` as is. Undef can be anything, we can choose it to be zero, and that would become unreachable. If we actually materialized a zero along this codepath, it would crash strlen if it were executed in the original program.
One can build an example where there is an unused argument passed as reference for example (https://godbolt.org/z/DPjxuF).
And it is also not only nonnull (as I mentioned before).

I still believe we have to have "liveness" backed into attribute semantics or we will play whack-a-mole with optimizations that fold an new or existing undef into a position "it shouldn't be".

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70749/new/

https://reviews.llvm.org/D70749