[cfe-dev] [analyzer] Calling SymbolReaper::markInUse on BinarySymExprs are ignored

Balázs Benics via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 6 13:38:58 PDT 2020


Gábor
> How much work would it be to prototype keeping these expressions alive
and measuring performance and memory implications?
I'm not sure, since I'm not really experienced in this liveness stuff. I
will try it.

Artem
> It sounds to me as if putting metadata symbols into the live set directly
would have worked just fine.
Ehm, I' not sure about this.
If you look my example closely, you can note that the CStringChecker maps
directly the SymRegion{reg_$0<char * src>} to the meta_$2{SymRegion{reg_$0<char
* src>},unsigned long} symbol.
So the SymbolReaper::markInUse will in fact place that meta_$2 symbol in
the Live set.
However later, when you query the SymbolReaper if the mentioned meta_$2
symbol is dead or not, it will answer you that it is *dead*.

I'm quoting the related trace log:

> [...]
> *CStringChecker::**checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
> * src>},unsigned long}' in use*
>
> # Even we marked the given symbols in use, we still removes them for some
> reason...
> CStringChecker::checkDeadSymbols finds the '(meta_$2{SymRegion{reg_$0<char
> * src>},unsigned long}) + 4U' as dead; so removes the mapping from 'dst'
> *CStringChecker::**checkDeadSymbols finds the
> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as dead; so removes
> the mapping from 'SymRegion{reg_$0<char * src>}'*
>

Why does the ExprEngine conjure a return symbol, if an evalCall already
evaluated the call?

Artem
> See how RegionStore does that within `ScanReachableSymbols::scan(const
SymExpr *)`
I managed to implement a metadata collector visitor using the new
SymExprVisitor infrastructure, in just a couple of lines of code. I was
amazed by that :)

Artem Dergachev <noqnoqneo at gmail.com> ezt írta (időpont: 2020. aug. 6., Cs,
19:04):

> I- i- i was about to reply to that!
>
> I don't know why metadata-in-use is a thing at all. It sounds to me as if
> putting metadata symbols into the live set directly would have worked just
> fine. If you find any interesting counterexamples please let me know.
>
> Apart from that, indeed, the correct way to implement checkLiveSymbols
> when you're tracking arbitrary symbols is to iterate over these arbitrary
> symbols and mark all sub-symbols as live. See how RegionStore does that
> within `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the following
> example works correctly and i expect CStringChecker to work similarly:
>
> ```
> int conjure();
>
> int foo() {
>   int x = conjure();
>   clang_analyzer_warnOnDeadSymbol(x);
>   return x + 1;
> }
>
> void bar() {
>   int y = foo(); // At this point `conj_$2` is no longer directly present
> in the state; only `conj_$2 + 1` is.
>   (void)y;
> } // But despite that, `conj_$2` only dies here.
> ```
>
>
> On 8/6/20 11:35 AM, Gábor Horváth via cfe-dev wrote:
>
> +Artem
>
> It would be great if the analyzer could reason about code like that. I
> think Artem is the most competent in these liveness related problems.
> Aside from performance, I do not see any downside for keeping the whole
> symbolic expression alive after markInUse was called on it (hopefully Artem
> corrects me if I'm wrong).
> But mainly due to constraint solver limitations it might not make sense to
> keep arbitrarily complex expressions alive.
>
> @Balázs Benics <benicsbalazs at gmail.com>
> How much work would it be to prototype keeping these expressions alive and
> measuring performance and memory implications?
>
>
> On Wed, 5 Aug 2020 at 16:17, Balázs Benics via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> If you are really curious, here is some context.
>>
>> Imagine the following code (test.c):
>> ```
>> typedef typeof(sizeof(int)) size_t;
>>
>> void clang_analyzer_eval(int);
>> char  *strcat(char *restrict s1, const char *restrict s2);
>> size_t strlen(const char *s);
>>
>> void strcat_symbolic_src_length(char *src) {
>>   char dst[8] = "1234";
>>   strcat(dst, src);
>>   clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
>>   (void)*dst;
>> }
>> ```
>>
>> One would expect that the 'strlen(dst) >= 4' is TRUE, but it returns
>> UNKOWN.
>>
>> After doing a little bit of investigation - and debug prints - I came up
>> with the following trace.
>>
>> ---
>>
>> # In the CStringChecker::evalStrcat creates a metadata symbol
>> representing the cstring length of the region pointed by 'src'.
>> created metadata symbol 'meta_$2{SymRegion{reg_$0<char * src>},unsigned
>> long}'
>>
>> # After the evalStrcat evaluated the call, the state contains the
>> necessary mapping that the 'dst' points to a cstring which is 4 + meta$4
>> long.
>> # Note that meta$4 represents the cstring length of the region pointed by
>> 'src'.
>> # So far so good. We know that resulting cstring length precisely.
>> strcpy common END State.dump: "program_state": {
>>   [...]
>>   "checker_messages": [
>>     { "checker": "unix.cstring.CStringModeling", "messages": [
>>       "CString lengths:",
>>       "dst: (meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U",
>>       "SymRegion{reg_$0<char * src>}: meta_$2{SymRegion{reg_$0<char *
>> src>},unsigned long}",
>>       ""
>>     ]}
>>   ]
>> }
>>
>> # We mark all symbols as live in the cstring length map.
>> # At least we think so...
>> CStringChecker::checkLiveSymbols marks '(meta_$2{SymRegion{reg_$0<char *
>> src>},unsigned long}) + 4U' in use
>> CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char *
>> src>},unsigned long}' in use
>> CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char *
>> src>},unsigned long}' in use
>>
>> # Even we marked the given symbols in use, we still removes them for some
>> reason...
>> CStringChecker::checkDeadSymbols finds the
>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as dead; so
>> removes the mapping from 'dst'
>> CStringChecker::checkDeadSymbols finds the 'meta_$2{SymRegion{reg_$0<char
>> * src>},unsigned long}' as dead; so removes the mapping from
>> 'SymRegion{reg_$0<char * src>}'
>>
>> # Now that state does not contain the cstring length of the region
>> pointed by 'dst'.
>>
>> ---
>>
>> Further investigation showed that even if I would visit all the sub
>> SymExprs looking for SymbolMetadata would not help.
>> SymbolReaper::isDead would still show that the
>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' SymExpr is
>> dead.
>>
>> ---
>>
>> How can we preserve such metadata information?
>>
>> ---
>>
>> You can also reproduce this following these steps:
>>
>> Apply the add-debug-prints.patch on top of 7f1556f292ccfd80c4ffa986d5b849f915e5cd82
>> "Fix typo: s/epomymous/eponymous/ NFC".
>> Analyze the 'test.c' file using this command:
>> ./bin/clang -cc1 -internal-isystem lib/clang/12.0.0/include
>> -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer
>> -analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
>> -analyzer-config eagerly-assume=false test.c
>>
>> Balázs Benics <benicsbalazs at gmail.com> ezt írta (időpont: 2020. aug. 5.,
>> Sze, 10:29):
>>
>>> Why does the SymbolReaper::markInUse only work on SymbolMetadata symbols?
>>>
>>> void SymbolReaper::markInUse(SymbolRef sym) {
>>>   if (isa<SymbolMetadata>(sym))
>>>     MetadataInUse.insert(sym);}
>>>
>>> I think it is flawed if the Symbol is a SymIntExpr holding an expression
>>> tree referring to SymbolMetadata symbols. In such case, those symbols would
>>> not be kept alive - causing some confusion on the checker developers' side
>>> and potentially losing some information about the analysis.
>>>
>>> Should we walk the expression tree instead of the current implementation?
>>> What performance impact should we expect by doing so?
>>>
>>> Any ideas?
>>>
>>> Balazs.
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
> _______________________________________________
> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200806/6a29fa60/attachment.html>


More information about the cfe-dev mailing list