[cfe-dev] [analyzer] Calling SymbolReaper::markInUse on BinarySymExprs are ignored
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Thu Aug 6 17:05:15 PDT 2020
On 8/6/20 1:38 PM, Balázs Benics wrote:
> Gábor
> > How much work would it be to prototype keeping these expressions
> alive and measuring performance and memory implications?
> I'm not sure, since I'm not really experienced in this liveness stuff.
> I will try it.
>
> Artem
> > It sounds to me as if putting metadata symbols into the live set
> directly would have worked just fine.
> Ehm, I' not sure about this.
> If you look my example closely, you can note that the CStringChecker
> maps directly the SymRegion{reg_$0<char * src>} to the
> meta_$2{SymRegion{reg_$0<char * src>},unsigned long} symbol.
> So the SymbolReaper::markInUse will in fact place that meta_$2 symbol
> in the Live set.
> However later, when you query the SymbolReaper if the mentioned
> meta_$2 symbol is dead or not, it will answer you that it is *dead*.
That's not how that works.
markInUse() doesn't put anything into the live set, it puts things into
an auxiliary "metadata-in-use" set:
https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396
On the other hand, isDead()/isLive() has to return true if the symbol is
present in the live set. In fact, that's the first thing it checks for:
https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437
>
> I'm quoting the related trace log:
>
> [...]
> *CStringChecker::**checkLiveSymbols marks
> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use*
>
> # Even we marked the given symbols in use, we still removes them
> for some reason...
> CStringChecker::checkDeadSymbols finds the
> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as
> dead; so removes the mapping from 'dst'
> *CStringChecker::**checkDeadSymbols finds the
> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as dead; so
> removes the mapping from 'SymRegion{reg_$0<char * src>}'*
>
>
> Why does the ExprEngine conjure a return symbol, if an evalCall
> already evaluated the call?
>
> Artem
> > See how RegionStore does that within
> `ScanReachableSymbols::scan(const SymExpr *)`
> I managed to implement a metadata collector visitor using the new
> SymExprVisitor infrastructure, in just a couple of lines of code. I
> was amazed by that :)
>
> Artem Dergachev <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> ezt
> írta (időpont: 2020. aug. 6., Cs, 19:04):
>
> I- i- i was about to reply to that!
>
> I don't know why metadata-in-use is a thing at all. It sounds to
> me as if putting metadata symbols into the live set directly would
> have worked just fine. If you find any interesting counterexamples
> please let me know.
>
> Apart from that, indeed, the correct way to implement
> checkLiveSymbols when you're tracking arbitrary symbols is to
> iterate over these arbitrary symbols and mark all sub-symbols as
> live. See how RegionStore does that within
> `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the following
> example works correctly and i expect CStringChecker to work similarly:
>
> ```
> int conjure();
>
> int foo() {
> int x = conjure();
> clang_analyzer_warnOnDeadSymbol(x);
> return x + 1;
> }
>
> void bar() {
> int y = foo(); // At this point `conj_$2` is no longer directly
> present in the state; only `conj_$2 + 1` is.
> (void)y;
> } // But despite that, `conj_$2` only dies here.
> ```
>
>
> On 8/6/20 11:35 AM, Gábor Horváth via cfe-dev wrote:
>> +Artem
>>
>> It would be great if the analyzer could reason about code like
>> that. I think Artem is the most competent in these liveness
>> related problems.
>> Aside from performance, I do not see any downside for keeping the
>> whole symbolic expression alive after markInUse was called on it
>> (hopefully Artem corrects me if I'm wrong).
>> But mainly due to constraint solver limitations it might not make
>> sense to keep arbitrarily complex expressions alive.
>>
>> @Balázs Benics <mailto:benicsbalazs at gmail.com>
>> How much work would it be to prototype keeping these expressions
>> alive and measuring performance and memory implications?
>>
>>
>> On Wed, 5 Aug 2020 at 16:17, Balázs Benics via cfe-dev
>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>> If you are really curious, here is some context.
>>
>> Imagine the following code (test.c):
>> ```
>> typedef typeof(sizeof(int)) size_t;
>>
>> void clang_analyzer_eval(int);
>> char *strcat(char *restrict s1, const char *restrict s2);
>> size_t strlen(const char *s);
>>
>> void strcat_symbolic_src_length(char *src) {
>> char dst[8] = "1234";
>> strcat(dst, src);
>> clang_analyzer_eval(strlen(dst) >= 4); //
>> expected-warning{{TRUE}}
>> (void)*dst;
>> }
>> ```
>>
>> One would expect that the 'strlen(dst) >= 4' is TRUE, but it
>> returns UNKOWN.
>>
>> After doing a little bit of investigation - and debug prints
>> - I came up with the following trace.
>>
>> ---
>>
>> # In the CStringChecker::evalStrcat creates a metadata symbol
>> representing the cstring length of the region pointed by 'src'.
>> created metadata symbol 'meta_$2{SymRegion{reg_$0<char *
>> src>},unsigned long}'
>>
>> # After the evalStrcat evaluated the call, the state contains
>> the necessary mapping that the 'dst' points to a cstring
>> which is 4 + meta$4 long.
>> # Note that meta$4 represents the cstring length of the
>> region pointed by 'src'.
>> # So far so good. We know that resulting cstring length
>> precisely.
>> strcpy common END State.dump: "program_state": {
>> [...]
>> "checker_messages": [
>> { "checker": "unix.cstring.CStringModeling", "messages": [
>> "CString lengths:",
>> "dst: (meta_$2{SymRegion{reg_$0<char * src>},unsigned
>> long}) + 4U",
>> "SymRegion{reg_$0<char * src>}:
>> meta_$2{SymRegion{reg_$0<char * src>},unsigned long}",
>> ""
>> ]}
>> ]
>> }
>>
>> # We mark all symbols as live in the cstring length map.
>> # At least we think so...
>> CStringChecker::checkLiveSymbols marks
>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>> in use
>> CStringChecker::checkLiveSymbols marks
>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use
>> CStringChecker::checkLiveSymbols marks
>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use
>>
>> # Even we marked the given symbols in use, we still removes
>> them for some reason...
>> CStringChecker::checkDeadSymbols finds the
>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>> as dead; so removes the mapping from 'dst'
>> CStringChecker::checkDeadSymbols finds the
>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as
>> dead; so removes the mapping from 'SymRegion{reg_$0<char * src>}'
>>
>> # Now that state does not contain the cstring length of the
>> region pointed by 'dst'.
>>
>> ---
>>
>> Further investigation showed that even if I would visit all
>> the sub SymExprs looking for SymbolMetadata would not help.
>> SymbolReaper::isDead would still show that the
>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>> SymExpr is dead.
>>
>> ---
>>
>> How can we preserve such metadata information?
>>
>> ---
>>
>> You can also reproduce this following these steps:
>>
>> Apply the add-debug-prints.patch on top of
>> 7f1556f292ccfd80c4ffa986d5b849f915e5cd82 "Fix typo:
>> s/epomymous/eponymous/ NFC".
>> Analyze the 'test.c' file using this command:
>> ./bin/clang -cc1 -internal-isystem lib/clang/12.0.0/include
>> -nostdsysteminc -analyze -analyzer-constraints=range
>> -setup-static-analyzer
>> -analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
>> -analyzer-config eagerly-assume=false test.c
>>
>> Balázs Benics <benicsbalazs at gmail.com
>> <mailto:benicsbalazs at gmail.com>> ezt írta (időpont: 2020.
>> aug. 5., Sze, 10:29):
>>
>> Why does the SymbolReaper::markInUse only work on
>> SymbolMetadata symbols?
>>
>> void SymbolReaper::markInUse(SymbolRef sym) {
>> if (isa<SymbolMetadata>(sym))
>> MetadataInUse.insert(sym); }
>>
>> I think it is flawed if the Symbol is a SymIntExpr
>> holding an expression tree referring to SymbolMetadata
>> symbols. In such case, those symbols would not be kept
>> alive - causing some confusion on the checker developers'
>> side and potentially losing some information about the
>> analysis.
>>
>> Should we walk the expression tree instead of the current
>> implementation?
>> What performance impact should we expect by doing so?
>>
>> Any ideas?
>>
>> Balazs.
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200806/cf7a9382/attachment-0001.html>
More information about the cfe-dev
mailing list