[cfe-dev] Unions in the Clang Static Analyzer

Aaron Ballman via cfe-dev cfe-dev at lists.llvm.org
Thu Mar 2 22:47:47 PST 2017


On Thu, Mar 2, 2017 at 8:16 PM, Alexey Sidorin <alexey.v.sidorin at ya.ru> wrote:
> Hi Aaron!
>
> Thank you for response. I'm curious if such cases like described:
>
>   union U { intptr_t x; void *p; }
>
>   void foo(union U u) {
>     intptr_t X = u.x; // reg_$0<u.x>

This could be undefined behavior if, prior to calling foo(), u.x was
not the active union member or if there was no active union member.

>     void *P = u.p; // &SymRegion{reg_$1<u.p>}

Assuming there was not already UB, this has undefined behavior because
the lifetime of u.p has not begun (despite some compilers supporting,
and even documenting, this undefined behavior by allowing the
expression to act as a duck-type cast, which is a perfectly acceptable
form of undefined behavior).

>     if (X == P) { ... }
>   }
>
> have defined behavior or not. I have seen many discussions on this but I
> haven't seen any strict rule on this.

The strict rules on it are in [class.union] and [basic.life]p1 (to get started).

~Aaron

>
>
> 03.03.2017 08:49, Aaron Ballman пишет:
>
>> On Thu, Mar 2, 2017 at 1:11 PM, Alexey Sidorin <alexey.v.sidorin at ya.ru>
>> wrote:
>>>
>>> Hi all,
>>>
>>> There is one thing that can make the implementation a bit easier :)
>>> The reading of a union field that was not stored last may cause undefined
>>> behavior. A discussion on this issue can be found on
>>>
>>> https://www.securecoding.cert.org/confluence/display/c/EXP39-C.+Do+not+access+a+variable+through+a+pointer+of+an+incompatible+type
>>>
>>> I guess we need some language lawyers here. Aaron, can I disturb you?
>>
>> Certainly -- is there advice in particular you're looking for? I'm not
>> overly familiar with the static analyzer design, but I'm to help as I
>> can.
>>
>> A note below...
>>
>>>
>>> 02.03.2017 20:12, Artem Dergachev via cfe-dev пишет:
>>>
>>> Hello,
>>>
>>> I'm not aware of any unfinished work on union support. I'd gladly give a
>>> few
>>> hints.
>>>
>>> RegionStore itself already operates in a union-friendly manner: even
>>> though
>>> it presents itself as a MemRegion->SVal map, internally it only stores
>>> base
>>> regions and byte offsets as keys, regardless of how the bytes stored are
>>> interpreted.
>>>
>>> However, i suspect bigger problems with actually representing symbolic
>>> values for fields that are interpreted in multiple ways during the
>>> analysis.
>>> First of all, reinterpret-casts of basic types are in general modeled
>>> poorly. For example, cast from pointer to integer is represented as a
>>> special LocAsInteger SVal sub-class, support for which in SValBuilder is
>>> worse than for symbols; at the same time cast from integer to pointer is
>>> not
>>> represented at all, as i mentioned recently in
>>> http://lists.llvm.org/pipermail/cfe-dev/2017-February/052769.html .
>>>
>>> Then, unions are even more fun because they challenge the concept of
>>> SymbolRegionValue, which is a very basic thing. Consider an example:
>>>
>>>    union U { intptr_t x; void *p; }
>>>
>>>    void foo(union U u) {
>>>      intptr_t X = u.x; // reg_$0<u.x>
>>>      void *P = u.p; // &SymRegion{reg_$1<u.p>}
>>>      if (X == P) { ... }
>>>    }
>>>
>>> The symbolic values for X and P look completely different, the worst
>>> thing
>>> here probably being that X is NonLoc while P is Loc. Yet, they represent
>>> the
>>> same thing.
>>
>> They do not represent the same thing -- they designate the same region
>> of storage, but the underlying objects are not "the same" because they
>> have independent lifetimes. A union member within its lifetime is
>> "active", and the static analyzer would need to track which member of
>> the union is active to be able to reason about the program
>> appropriately.
>>
>> ~Aaron
>>
>>> How would we unify these values? If you come up with a single
>>> symbol to represent them, what would be the type of that symbol? Or would
>>> you prefer to constrain these symbols to be equal, and rely on the
>>> constraint solver to figure things out?
>>>
>>> So making a proper union support is going to be a fun research project.
>>> Smaller incremental improvements that fix particular cases you encounter
>>> are
>>> also very much welcome. There's not necessarily much thought behind the
>>> safety-first bail-outs, so if you are seeing significant improvements by
>>> removing them, we could consider lifting them; it may also turn out that
>>> they're necessary, and an experienced eye may sometimes understand why,
>>> but
>>> as long as there is no comprehensive explanation in comments and no
>>> testcase
>>> coverage, i'd rather encourage you to remove them and see what happens;
>>> especially considering that you've got access to a good codebase to test
>>> these changes upon.
>>>
>>>
>>> 02/03/2017 1:53 AM, Keno Fischer via cfe-dev wrote:
>>>
>>> I was having some trouble with using the static analyzer on code that
>>> involves unions. Looking through the code, it seems like this is a known
>>> problem and unions aren't particularly well supported. Personally, I was
>>> able to make some progress by just ripping out all the checks for union
>>> types, particularly in RegionStore. However, that's obviously a hacky
>>> solution, so to ensure that my check will keep working, I'd like to
>>> improve
>>> upstream support for unions if possible. Has anybody thought about this
>>> problem before/is there already a design for this? Alternatively, have
>>> people collected test cases that would benefit from improving union
>>> support?
>>> I'm sure I'm not the first person to hit this issue, so I'd appreciate
>>> any
>>> pointers to prior work.
>>>
>>>
>>> _______________________________________________ cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>>
>



More information about the cfe-dev mailing list