[PATCH] D130055: Clang extensions yolo, woot & kaboom

Thu Jul 28 04:29:10 PDT 2022

aaron.ballman added subscribers: aaronpuchert, delesley.
aaron.ballman added a comment.

In D130055#3683279 <https://reviews.llvm.org/D130055#3683279>, @beanz wrote:

> In D130055#3683173 <https://reviews.llvm.org/D130055#3683173>, @aaron.ballman wrote:
>
>> Are there circumstances where we cannot "simply" infer this from the constructor itself?
>
> I think this gets tricky in cases where objects may have valid "uninitialized" state. Like a tagged union that hasn't been set might set its tag value to "uninitialized" and zero its storage, which would make it look like all the members are initialized to the compiler, they just happen to be initialized to known sentinel values.

Ah, sentinel values in general all have that same general property, good point!

>> (After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves. It seems like inferring this would reduce false positives and false negatives (in the case the user marks the constructor incorrectly or the code gets out of sync with the marking), and we might only need this attribute in very rare circumstances (or not at all, as a user-facing attribute). Is that an approach you're considering?
>
> I hadn't thought at all about automated propagation.

My experiences with thread safety analysis annotations was that inference (or even a tool to automatically slap the right annotations explicitly on to declarations) was a huge usability win for users. This particular kind of diagnostic tends to be somewhat "viral" in that starting to use the annotations only helps if everything in some subset of your project is annotated. You can start from the leaf nodes in your types to get some benefit, but the most benefit tends to come from composition of these leaf types into more complex types. Reducing the number of annotations the user has to write makes this transition significantly easier if possible to accomplish with good fidelity.

> One of the other use cases I was thinking about was `std::optional`, where you could specify that initialization to `std::nullopt` is `yolo`, and then `get` becomes `kaboom`.
>
> With these annotations we can conservatively diagnose cases where an optional's value is accessed when uninitialized.
>
>> Fully? Partially? I'm trying to decide whether this is only useful for functions named `init()` and `clear()` and the like, or whether this is useful for functions like `setWhatever()` where you can have a partially constructed object that is not fully initialized by any one member function.
>
> I had also thought about this too. As implemented in this patch `woot` implies fully initialized, but there could be some interesting complex initializations. I could imagine builder patterns where given code like `Builder::Create().X().Y().Z()`, you might want `Y` to only be callable after `X` and `Z` might be the required final initializer.

I was thinking about our own AST nodes where, for example, you deserialize an AST by creating an empty node and then filling out its structure piecemeal. I think partial construction is a pretty important concept to consider early in the design given its prevalence as a reasonable implementation strategy.

>> Same question here about fully vs partially initialized. e.g., where you have a partially constructed object and you can only call `getWhatever()` after `setWhatever()` has been called.
>
> I'm wondering if there could be a generalization of the attribute like:
>
>   class TaggedValue {
>     enum Kind {
>       Uninitialized = 0,
>       Integer,
>       Float
>     };
>     Kind VK = Uninitialized;
>   
>     union {
>       int I;
>       float F;
>     };
>   public:
>     
>     [[clang::yolo]]
>     TaggedValue() = default;
>     
>     TaggedValue(TaggedValue&) = default;
>   
>     void hasValue() { return VK == Uninitialized; } // always safe
>   
>     [[clang::woot("FloatSet"]] // Marks as safe for functions with matching kaboom arguments
>     void set(float V) {
>       VK= Float;
>       F = V;
>     }
>   
>     [[clang::woot("IntSet")]] // Marks as safe for functions with matching kaboom arguments
>     void set(int V) {
>       VK= Integer;
>       I = V;
>     }
>   
>       [[clang::woot]] // Marks as safe for all kaboom functions (because I'm sad)
>      void zero() {
>        VK= Integer;
>        I = 0;
>      }
>   
>     [[clang::kaboom("FloatSet"]]
>     operator float() {
>       return F;
>     }
>   
>     [[clang::kaboom("IntSet")]]
>     operator int() {
>       return I;
>     }
>   };
>
> This does get into some more complex questions of whether `woot` would change or append status, and I can see arguments for both so we might need `appending_woot` too...

The more I look at this, the more closely it seems to resemble attributes and an analysis pass we already have: capability attributes: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#basic-concepts-capabilities We implemented those for thread safety analysis initially, but realized there's a more general principle hiding under there in terms of resources and capabilities for using those resources. `yolo` is acquiring an "uninitialized" capability, `woot` is releasing the "uninitialized" capability, and `kaboom` is requiring there not be an "uninitialized" capability. I'm CC'ing in @aaronpuchert and @delesley to see if they also agree with this assessment (both Aaron and DeLesley have been instrumental in this work), but I'm curious if @beanz had seen these and considered them or not.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130055/new/

https://reviews.llvm.org/D130055