[PATCH] D140696: [clang][dataflow] Treat unions as structs.

Yitzhak Mandelbaum via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Dec 29 12:13:12 PST 2022


ymandel added a comment.

In D140696#4019589 <https://reviews.llvm.org/D140696#4019589>, @merrymeerkat wrote:

>> another thought: please verify that `createStorageLoction` and `createValue` can correctly handle union types. Otherwise, you'll likely end up with other (surprising) crashes in the system. E.g. 
>> https://github.com/llvm/llvm-project/blob/781eabeb40b8e47e3a46b0b927784e63f0aad9ab/clang/lib/Analysis/FlowSensitive/Transfer.cpp#L457
>> assumes that the base of a member expression is an `AggregateStorageLocation`, which means that in the case that `this` is such a base and it's a union type, you need to be sure that `createStorageLocation` has allocated an `AggregateStorageLocation`.
>>
>> Unfortunately, we don't have written down anywhere (aside from the code itself) the invariants we require.
>
> Good point!
>
> Based on the assertions in TransferTest.UnionThisMember, we are creating storage locations for unions. So, the code you linked to should not crash (at least on unions as simple as that in the test).
>
> We weren't creating values yet though. Turns out that was because `createValue` calls `createValueUnlessSelfReferential`, which creates values only for certain types, returning a nullptr otherwise: https://github.com/llvm/llvm-project/blob/8eb3698b940c4064b772f3ff5848d45f28523753/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L618-L699 . I've just changed this function's if-statement on structs and classes to accept union types too, and now we are creating values for unions. So, values should hopefully not be a problem either.
>
> There was already another unit test relating to unions (TransferTest.AssignToUnionMember), but it was only partly implemented. Now that we initialize locations and values for union types, I implemented the rest of the test. (thank you @gribozavr2 for the help debugging!)

Sorry to go back-and-forth here, but I'm afraid this might be overly ambitious. With this change, unions are modeled exactly as structs (as far as I can tell), which is unsound.  I think we need to think carefully before doing this (it might be fine, but it's a non trivial step) and we'd need to test thoroughly.  Moreover, this goes well beyond simply fixing the bug you set out to fix.

I realize that I'm the one who raised the concern with these functions, so I should explain: I had in mind just that you verify they work correctly with the fix (while retaining existing behavior of not modeling unions), rather than go the extra step of allowing unions altogether. I think that it would be enough to modify `createStorageLocation` as you have (so that unions are guaranteed to map from an `AggregateStorageLocation`), but then leave `createValue` as is. The result will be that all union-typed expressions will map to `AggregateStorageLocation`, so member expressions will work correctly, but the fields will always map to nullptr, so the modeling will be sound.

However, this all just begs the question of why go down this path to begin with. If we can't properly model union fields, does it make sense to model a union-typed `this` pointer?  Going back to your original argument, I think I missed a crucial point (and apologies that I didn't catch this sooner!):

> I thought more about this and I think I'd actually prefer to add support for unions and skip the null check. The support could be useful for users. Now, as for why I think we should skip the null check: in l. 226 of Analysis/FlowSensitive/DataflowEnvironment.cpp, we check for methods that are not static. These are the prerequisites for something having a "this" pointer*. So, since we initialise the ThisPointeeLoc there, any intializer we're processing in TypeErasedDataflowAnalysis::builtinTransferInitializer should already have a not-null "this" pointer, making the check unnecessary. What do you think?

You're making the case here for why we should avoid the null check, given that we support unions.  And you're logic is sound! But, it jumps over the hard question: when `this` is a union, do we want to initialise it all? Or, like other expressions we can't model, should we leave this `null` and then update the downstream code to avoid assuming that that `this` is non-null? (Or, a twist on that: should we avoid analysing the method at all when the object type is a union, since we're severely limited in what we can do. That would give us both benefits: we can still assume that the `ThisPointeeLoc` is never null, and avoid the bug.)  I think we need to dig into this question first, and then the answer about the fix follows. I would assert that to fix the bug you're encountering, you're fastest solution is to add the null check. Then, if you want to come back later and extend the framework to support unions, we can do that (and correspondingly remove the null check), but its a longer and more difficult process.

Again, sorry for the confusion...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140696/new/

https://reviews.llvm.org/D140696



More information about the cfe-commits mailing list