[PATCH] D130224: [Clang][Attribute] Introduce maybe_undef attribute for function arguments which accepts undef values

Mon Jul 25 11:42:17 PDT 2022

aaron.ballman added a comment.

In D130224#3668267 <https://reviews.llvm.org/D130224#3668267>, @arsenm wrote:

> In D130224#3668243 <https://reviews.llvm.org/D130224#3668243>, @aaron.ballman wrote:
>
>> In D130224#3668240 <https://reviews.llvm.org/D130224#3668240>, @arsenm wrote:
>>
>>> In D130224#3668225 <https://reviews.llvm.org/D130224#3668225>, @aaron.ballman wrote:
>>>
>>>> It sounds like there's an extremely specific use case in mind for this attribute; is there a reason why this needs to be the user's problem instead of something the compiler can infer or handle on the backend?
>>>
>>> The intended user is a handful of header wrapper functions around intrinsics, not general users
>>
>> Great, then it sounds like we don't need to expose this as an attribute because we know which intrinsics need special behavior, so we can handle them specially at codegen time or within the backend?
>
> The problem is the wrappers. We originally talked about specifically handling the builtins, but that falls apart given the real users come from wrapper functions which look like ordinary code

I'm still not seeing the issue fully. My understanding of the situation (which may be wrong) is that Clang lowers to LLVM IR and adds `noundef` markings at call sites that this patch attempts to let the user then undo. However, why can Clang's CodeGen not notice the special builtin and walk up the call chain to mark all the wrapper functions to ensure everything is marked appropriately? There might be a small perf loss (if there are wrappers around wrappers around wrappers kind of situation), but this means we don't have to expose this problem to users.

In D130224#3672338 <https://reviews.llvm.org/D130224#3672338>, @nhaehnle wrote:

> In D130224#3668225 <https://reviews.llvm.org/D130224#3668225>, @aaron.ballman wrote:
>
>> I'm in C standards meetings this week and don't have a lot of ability to thoroughly review this yet, but the most immediate concern which springs to mind for me is that this is exposing LLVM implementation details to users. Users should not have to think about things in terms of LLVM IR markings like poison and undef, and I worry that this is an expert-only feature that will be easy to misuse and introduce security issues.
>
> Here's how I would tentatively describe the attribute in terms that mesh better with how I understand C and C++:
>
>> As an exception to the rule that loading from an unitialized variable is undefined behavior, if the loaded value is used immediately as an `__attribute__((maybe_undef))` argument in a function call, the loaded value is implementation-defined. It may vary between multiple runs of the program, and it may vary between multiple uses of the uninitialized variable.
>
> This requires no thinking about LLVM IR and undef/poison.

Thanks, this helps somewhat (though I would argue that the loaded value is *not* implementation-defined but is instead unspecified, because that matches the semantics you mention about not needing to pick the same loaded value every time). However, this design is basically asking the user to decide when something should or should not be undefined behavior, which means this attribute will almost certainly be abused under the guise of "this makes my program safer because it removes UB". I still think that a solution which doesn't require the user to make such dangerous decisions on their own is a better design if we can accomplish it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130224/new/

https://reviews.llvm.org/D130224