[cfe-dev] Unions in the Clang Static Analyzer

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 20 15:08:07 PDT 2018


Ideally we should implement things according to the standard, producing 
an UndefinedVal() every time the value is undefined. This will also 
force us to teach our uninitialized value checkers to understand why the 
value is uninitialized.

However this approach would prevent us from making union-specific 
undefined values an opt-in thing. And i believe that we might eventually 
want that because not all developers are worried about undefined 
behavior; there might be codebases in which it is normal to believe that 
on a specific platform type-punning is always going to happen, even in C++.

So i guess we'll have to stick to believing in type punning as much as 
possible and never produce UndefinedVal()s in ExprEngine but let 
union-specific checkers decide if we want to emit warnings on any of those.

---

When it comes to representing symbolic values loaded from union fields 
(the problem explained as 
http://lists.llvm.org/pipermail/cfe-dev/2017-March/052881.html) i don't 
know how exactly do i want to represent them, but i would imagine 
producing a new sort of symbol (let's call it "SymbolUnionValue") that 
has the following as its identity:

   - MemRegion of the union.
   - Unsigned integer type of the same size as the read.

Then the read can be represented as the result of a cast from the value 
to the actual read type. This way we'd discriminate between reads of 
different size but we'll know that reads of the same size would produce 
the same value up to casts. This reduces the problem to the problem of 
representing casts (which will likely be your second problem). Derived 
symbols would need to be handled similarly (dunno how to call the 
corresponding derived union symbols). It is safe to assume that every 
read can be represented as an integer read of a certain type because 
otherwise we can always return a LazyCompoundValue that doesn't have any 
identity problems whatsoever (because it doesn't promise anything) or an 
UnknownVal for types that aren't supported anyway (like floats). If you 
enjoy the idea of using our experimental Z3 refutation, it shouldn't be 
hard to teach a clever solver how exactly reads of different sizes are 
related to each other, but our usual RangeConstraintManager wouldn't be 
able to reason about this stuff anyway, you shouldn't bother.

Supporting loads/stores of concrete values should be straightforward but 
may still need some work.

On 8/19/18 5:52 PM, Keno Fischer wrote:
> Sorry for bumping an old thread, but I ended up trying to pick up the 
> project that brought this up again after
> giving up last year because of the analyzer's lacking union support. 
> Naturally I ran into the same problems again :/.
> It seems like the first step here is to decide what model of unions 
> the analyzer intends to support. I'd be
> happy to try fix some things here, but of course those decisions need 
> to be made first.
>
> On Fri, Mar 3, 2017 at 2:12 AM, Aaron Ballman <aaron at aaronballman.com 
> <mailto:aaron at aaronballman.com>> wrote:
>
>     On Thu, Mar 2, 2017 at 8:54 PM, Keno Fischer
>     <keno at juliacomputing.com <mailto:keno at juliacomputing.com>> wrote:
>     > This behavior also differs between C and C++. For example, C has
>     explicitly:
>     >
>     > [N1570, Footnote 95]
>     > If the member used to read the contents of a union object is not the
>     > same as the member last used to store a value in the object, the
>     > appropriate part of the object representation of the value is
>     > reinterpreted as an object representation in the new type as
>     described
>     > in 6.2.6 (a process sometimes called ‘‘type punning’’). This
>     might be
>     > a trap representation.
>     >
>     > where object representation is essentially what you'd get if you
>     > memcpyed the value.
>
>     Yes, if everything lines up just perfectly, then in C this may be
>     okay. I say may because it can still be a trap representation, it
>     could be a misaligned access, there could be unspecified values within
>     the union member, etc (so it depends on the types and the specific
>     values), so it is *not* strictly a safe operation to do in C either.
>
>     ~Aaron
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180820/c361246c/attachment.html>


More information about the cfe-dev mailing list