[cfe-dev] Unions in the Clang Static Analyzer

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 20 15:18:16 PDT 2018


Sorry for the slightly off topic, but I it always bothered me that we use
UndefinedVal to represent uninitialized values (and any kind of undefined
value regardless of the reason it is being undefined).
If somebody volunteered to clean this up, we could have the union-specific
undefined values as an opt-in thing.
While this would probably be a lot of work, it could be a 0th step towards
proper union support (though arguably this might not be the best return on
invest).

On Mon, 20 Aug 2018 at 15:08, Artem Dergachev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Ideally we should implement things according to the standard, producing an
> UndefinedVal() every time the value is undefined. This will also force us
> to teach our uninitialized value checkers to understand why the value is
> uninitialized.
>
> However this approach would prevent us from making union-specific
> undefined values an opt-in thing. And i believe that we might eventually
> want that because not all developers are worried about undefined behavior;
> there might be codebases in which it is normal to believe that on a
> specific platform type-punning is always going to happen, even in C++.
>
> So i guess we'll have to stick to believing in type punning as much as
> possible and never produce UndefinedVal()s in ExprEngine but let
> union-specific checkers decide if we want to emit warnings on any of those.
>
> ---
>
> When it comes to representing symbolic values loaded from union fields
> (the problem explained as
> http://lists.llvm.org/pipermail/cfe-dev/2017-March/052881.html) i don't
> know how exactly do i want to represent them, but i would imagine producing
> a new sort of symbol (let's call it "SymbolUnionValue") that has the
> following as its identity:
>
>   - MemRegion of the union.
>   - Unsigned integer type of the same size as the read.
>
> Then the read can be represented as the result of a cast from the value to
> the actual read type. This way we'd discriminate between reads of different
> size but we'll know that reads of the same size would produce the same
> value up to casts. This reduces the problem to the problem of representing
> casts (which will likely be your second problem). Derived symbols would
> need to be handled similarly (dunno how to call the corresponding derived
> union symbols). It is safe to assume that every read can be represented as
> an integer read of a certain type because otherwise we can always return a
> LazyCompoundValue that doesn't have any identity problems whatsoever
> (because it doesn't promise anything) or an UnknownVal for types that
> aren't supported anyway (like floats). If you enjoy the idea of using our
> experimental Z3 refutation, it shouldn't be hard to teach a clever solver
> how exactly reads of different sizes are related to each other, but our
> usual RangeConstraintManager wouldn't be able to reason about this stuff
> anyway, you shouldn't bother.
>
> Supporting loads/stores of concrete values should be straightforward but
> may still need some work.
>
> On 8/19/18 5:52 PM, Keno Fischer wrote:
>
> Sorry for bumping an old thread, but I ended up trying to pick up the
> project that brought this up again after
> giving up last year because of the analyzer's lacking union support.
> Naturally I ran into the same problems again :/.
> It seems like the first step here is to decide what model of unions the
> analyzer intends to support. I'd be
> happy to try fix some things here, but of course those decisions need to
> be made first.
>
> On Fri, Mar 3, 2017 at 2:12 AM, Aaron Ballman <aaron at aaronballman.com>
> wrote:
>
>> On Thu, Mar 2, 2017 at 8:54 PM, Keno Fischer <keno at juliacomputing.com>
>> wrote:
>> > This behavior also differs between C and C++. For example, C has
>> explicitly:
>> >
>> > [N1570, Footnote 95]
>> > If the member used to read the contents of a union object is not the
>> > same as the member last used to store a value in the object, the
>> > appropriate part of the object representation of the value is
>> > reinterpreted as an object representation in the new type as described
>> > in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be
>> > a trap representation.
>> >
>> > where object representation is essentially what you'd get if you
>> > memcpyed the value.
>>
>> Yes, if everything lines up just perfectly, then in C this may be
>> okay. I say may because it can still be a trap representation, it
>> could be a misaligned access, there could be unspecified values within
>> the union member, etc (so it depends on the types and the specific
>> values), so it is *not* strictly a safe operation to do in C either.
>>
>> ~Aaron
>>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180820/7c0db36f/attachment.html>


More information about the cfe-dev mailing list