[cfe-dev] Unions in the Clang Static Analyzer
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Aug 20 16:37:36 PDT 2018
Yup, we could use tracking a-la taint analysis in checkers for union
based undefined values, or in all checkers for undefined values.
I don't think this is blocking union support, even in the form of
technical debt. We may have union support without emitting
union-specific warnings at all. Modeling defined-but-weird values
consistently is the difficult part here.
On 8/20/18 3:18 PM, Gábor Horváth wrote:
> Sorry for the slightly off topic, but I it always bothered me that we
> use UndefinedVal to represent uninitialized values (and any kind of
> undefined value regardless of the reason it is being undefined).
> If somebody volunteered to clean this up, we could have the
> union-specific undefined values as an opt-in thing.
> While this would probably be a lot of work, it could be a 0th step
> towards proper union support (though arguably this might not be the
> best return on invest).
>
> On Mon, 20 Aug 2018 at 15:08, Artem Dergachev via cfe-dev
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>
> Ideally we should implement things according to the standard,
> producing an UndefinedVal() every time the value is undefined.
> This will also force us to teach our uninitialized value checkers
> to understand why the value is uninitialized.
>
> However this approach would prevent us from making union-specific
> undefined values an opt-in thing. And i believe that we might
> eventually want that because not all developers are worried about
> undefined behavior; there might be codebases in which it is normal
> to believe that on a specific platform type-punning is always
> going to happen, even in C++.
>
> So i guess we'll have to stick to believing in type punning as
> much as possible and never produce UndefinedVal()s in ExprEngine
> but let union-specific checkers decide if we want to emit warnings
> on any of those.
>
> ---
>
> When it comes to representing symbolic values loaded from union
> fields (the problem explained as
> http://lists.llvm.org/pipermail/cfe-dev/2017-March/052881.html) i
> don't know how exactly do i want to represent them, but i would
> imagine producing a new sort of symbol (let's call it
> "SymbolUnionValue") that has the following as its identity:
>
> - MemRegion of the union.
> - Unsigned integer type of the same size as the read.
>
> Then the read can be represented as the result of a cast from the
> value to the actual read type. This way we'd discriminate between
> reads of different size but we'll know that reads of the same size
> would produce the same value up to casts. This reduces the problem
> to the problem of representing casts (which will likely be your
> second problem). Derived symbols would need to be handled
> similarly (dunno how to call the corresponding derived union
> symbols). It is safe to assume that every read can be represented
> as an integer read of a certain type because otherwise we can
> always return a LazyCompoundValue that doesn't have any identity
> problems whatsoever (because it doesn't promise anything) or an
> UnknownVal for types that aren't supported anyway (like floats).
> If you enjoy the idea of using our experimental Z3 refutation, it
> shouldn't be hard to teach a clever solver how exactly reads of
> different sizes are related to each other, but our usual
> RangeConstraintManager wouldn't be able to reason about this stuff
> anyway, you shouldn't bother.
>
> Supporting loads/stores of concrete values should be
> straightforward but may still need some work.
>
> On 8/19/18 5:52 PM, Keno Fischer wrote:
>> Sorry for bumping an old thread, but I ended up trying to pick up
>> the project that brought this up again after
>> giving up last year because of the analyzer's lacking union
>> support. Naturally I ran into the same problems again :/.
>> It seems like the first step here is to decide what model of
>> unions the analyzer intends to support. I'd be
>> happy to try fix some things here, but of course those decisions
>> need to be made first.
>>
>> On Fri, Mar 3, 2017 at 2:12 AM, Aaron Ballman
>> <aaron at aaronballman.com <mailto:aaron at aaronballman.com>> wrote:
>>
>> On Thu, Mar 2, 2017 at 8:54 PM, Keno Fischer
>> <keno at juliacomputing.com <mailto:keno at juliacomputing.com>> wrote:
>> > This behavior also differs between C and C++. For example,
>> C has explicitly:
>> >
>> > [N1570, Footnote 95]
>> > If the member used to read the contents of a union object
>> is not the
>> > same as the member last used to store a value in the
>> object, the
>> > appropriate part of the object representation of the value is
>> > reinterpreted as an object representation in the new type
>> as described
>> > in 6.2.6 (a process sometimes called ‘‘type punning’’).
>> This might be
>> > a trap representation.
>> >
>> > where object representation is essentially what you'd get
>> if you
>> > memcpyed the value.
>>
>> Yes, if everything lines up just perfectly, then in C this may be
>> okay. I say may because it can still be a trap representation, it
>> could be a misaligned access, there could be unspecified
>> values within
>> the union member, etc (so it depends on the types and the
>> specific
>> values), so it is *not* strictly a safe operation to do in C
>> either.
>>
>> ~Aaron
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180820/596e63f5/attachment.html>
More information about the cfe-dev
mailing list