[PATCH] D130224: [Clang][Attribute] Introduce maybe_undef attribute for function arguments which accepts undef values

Mon Jul 25 12:52:50 PDT 2022

aaron.ballman added a comment.

In D130224#3677144 <https://reviews.llvm.org/D130224#3677144>, @arsenm wrote:

> In D130224#3668243 <https://reviews.llvm.org/D130224#3668243>, @aaron.ballman wrote:
>
>> I'm still not seeing the issue fully. My understanding of the situation (which may be wrong) is that Clang lowers to LLVM IR and adds `noundef` markings at call sites that this patch attempts to let the user then undo. However, why can Clang's CodeGen not notice the special builtin and walk up the call chain to mark all the wrapper functions to ensure everything is marked appropriately? There might be a small perf loss (if there are wrappers around wrappers around wrappers kind of situation), but this means we don't have to expose this problem to users.
>
> This requires maintaining logic in clang to specially recognize these N wrapper functions, and there would be no indication in the source that this magic is occurring. It's less maintainable because it requires tight, magic coupling between these functions in the source and special handling in clang to be maintained if anything were to change

I was thinking of something more general than that where Clang only has to specially recognize the builtins and which arguments to the builtin should be tracked to make maybeundef when walking up the call chain. However, you're not wrong about this being a bit magical in terms of behavior. I don't know that we have anything else which behaves this way.

In D130224#3677167 <https://reviews.llvm.org/D130224#3677167>, @jdoerfert wrote:

>> I still think that a solution which doesn't require the user to make such dangerous decisions on their own is a better design if we can accomplish it.
>
> This is a valid opinion but I don't see how this matches what we do (or has been done) in the past. I mean, there are plenty of command line options and attributes that do exactly this, let the user "define", or better "refine", semantics.
> The entire "initialize with 0" stuff comes to mind, making new things UB with -ffast-math and friends, overloadable, etc.

Thanks, this is a good question to be asking!

We are adding attributes in a way which doesn't scale well; many attributes interact with one another and we have basically no test coverage for those situations and very little thought put into diagnostic behavior. Now that we've got over 400 attributes in the tree, I'm realizing I probably should have been saying "no" more to attributes that don't carry enough weight to be worth the maintenance burden, but oh well, I can't unring that bell. I have *no issue* with attributes that are going to be things we expect users to use themselves. Stuff like "initialize with 0", overloadable, etc are all things that enable users to do something they couldn't previously do. However, in recent history, we've been adding more attributes that are for compiler glue code rather than users. On the one hand, attributes are a great use for such a thing. On the other hand, there's no way to control who uses the attribute (unless we put in special logic to only allow it to be written in a system header, or something like that). So glue code attributes are a different "thing" to me than user-facing attributes, and I'm worried at how many we're adding without any real design thought put behind them other than "we need it to solve this one case". If we could come up with some tablegen marking to say "this attribute is for glue code use only" that we could use to generate diagnostics on "misuse", generate documentation differently from, etc, that would go a long ways towards alleviating the concerns about user confusion with glue code attributes and "real" attributes (for lack of a better term).

Of course, these are high-level worries and not necessarily specific to this review or anything I expect someone here to take action on. Take this as background information rather than a code owner mandate or anything like that. Specific to this review, my concern is that this is an attribute that controls undefined behavior in LLVM rather directly (as opposed to things like "initialize to 0" which impact UB, but aren't as tightly tied to the LLVM backend architecture). That's a worry for two reasons: the tight coupling to LLVM IR attribute semantics (basically, this means LLVM's "freeze" instruction can't change meaning  because it will potentially break user code) and misuse due to misunderstanding the purpose of the attribute ("this makes my code more secure!"). Special compiler magic alleviates both of those concerns because it means we aren't exposing the tight coupling and users aren't given a chance to misuse the attribute.

However, what I think I'm hearing from this thread is that there are alternative approaches that have been thought about but not tried, we're not certain how feasible those approaches are in practice, but we expect them to be materially worse than what's proposed here. So it's not "this was the path of least resistance", but "this is the correct design." Do others agree with that assessment?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130224/new/

https://reviews.llvm.org/D130224