[cfe-dev] [analyzer] Calling SymbolReaper::markInUse on BinarySymExprs are ignored

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 6 17:05:15 PDT 2020



On 8/6/20 1:38 PM, Balázs Benics wrote:
> Gábor
> > How much work would it be to prototype keeping these expressions 
> alive and measuring performance and memory implications?
> I'm not sure, since I'm not really experienced in this liveness stuff. 
> I will try it.
>
> Artem
> > It sounds to me as if putting metadata symbols into the live set 
> directly would have worked just fine.
> Ehm, I' not sure about this.
> If you look my example closely, you can note that the CStringChecker 
> maps directly the SymRegion{reg_$0<char * src>} to the 
> meta_$2{SymRegion{reg_$0<char * src>},unsigned long} symbol.
> So the SymbolReaper::markInUse will in fact place that meta_$2 symbol 
> in the Live set.
> However later, when you query the SymbolReaper if the mentioned 
> meta_$2 symbol is dead or not, it will answer you that it is *dead*.

That's not how that works.

markInUse() doesn't put anything into the live set, it puts things into 
an auxiliary "metadata-in-use" set: 
https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396

On the other hand, isDead()/isLive() has to return true if the symbol is 
present in the live set. In fact, that's the first thing it checks for: 
https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437

>
> I'm quoting the related trace log:
>
>     [...]
>     *CStringChecker::**checkLiveSymbols marks
>     'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use*
>
>     # Even we marked the given symbols in use, we still removes them
>     for some reason...
>     CStringChecker::checkDeadSymbols finds the
>     '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as
>     dead; so removes the mapping from 'dst'
>     *CStringChecker::**checkDeadSymbols finds the
>     'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as dead; so
>     removes the mapping from 'SymRegion{reg_$0<char * src>}'*
>
>
> Why does the ExprEngine conjure a return symbol, if an evalCall 
> already evaluated the call?
>
> Artem
> > See how RegionStore does that within 
> `ScanReachableSymbols::scan(const SymExpr *)`
> I managed to implement a metadata collector visitor using the new 
> SymExprVisitor infrastructure, in just a couple of lines of code. I 
> was amazed by that :)
>
> Artem Dergachev <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> ezt 
> írta (időpont: 2020. aug. 6., Cs, 19:04):
>
>     I- i- i was about to reply to that!
>
>     I don't know why metadata-in-use is a thing at all. It sounds to
>     me as if putting metadata symbols into the live set directly would
>     have worked just fine. If you find any interesting counterexamples
>     please let me know.
>
>     Apart from that, indeed, the correct way to implement
>     checkLiveSymbols when you're tracking arbitrary symbols is to
>     iterate over these arbitrary symbols and mark all sub-symbols as
>     live. See how RegionStore does that within
>     `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the following
>     example works correctly and i expect CStringChecker to work similarly:
>
>     ```
>     int conjure();
>
>     int foo() {
>       int x = conjure();
>       clang_analyzer_warnOnDeadSymbol(x);
>       return x + 1;
>     }
>
>     void bar() {
>       int y = foo(); // At this point `conj_$2` is no longer directly
>     present in the state; only `conj_$2 + 1` is.
>       (void)y;
>     } // But despite that, `conj_$2` only dies here.
>     ```
>
>
>     On 8/6/20 11:35 AM, Gábor Horváth via cfe-dev wrote:
>>     +Artem
>>
>>     It would be great if the analyzer could reason about code like
>>     that. I think Artem is the most competent in these liveness
>>     related problems.
>>     Aside from performance, I do not see any downside for keeping the
>>     whole symbolic expression alive after markInUse was called on it
>>     (hopefully Artem corrects me if I'm wrong).
>>     But mainly due to constraint solver limitations it might not make
>>     sense to keep arbitrarily complex expressions alive.
>>
>>     @Balázs Benics <mailto:benicsbalazs at gmail.com>
>>     How much work would it be to prototype keeping these expressions
>>     alive and measuring performance and memory implications?
>>
>>
>>     On Wed, 5 Aug 2020 at 16:17, Balázs Benics via cfe-dev
>>     <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>>         If you are really curious, here is some context.
>>
>>         Imagine the following code (test.c):
>>         ```
>>         typedef typeof(sizeof(int)) size_t;
>>
>>         void clang_analyzer_eval(int);
>>         char  *strcat(char *restrict s1, const char *restrict s2);
>>         size_t strlen(const char *s);
>>
>>         void strcat_symbolic_src_length(char *src) {
>>           char dst[8] = "1234";
>>           strcat(dst, src);
>>           clang_analyzer_eval(strlen(dst) >= 4); //
>>         expected-warning{{TRUE}}
>>           (void)*dst;
>>         }
>>         ```
>>
>>         One would expect that the 'strlen(dst) >= 4' is TRUE, but it
>>         returns UNKOWN.
>>
>>         After doing a little bit of investigation - and debug prints
>>         - I came up with the following trace.
>>
>>         ---
>>
>>         # In the CStringChecker::evalStrcat creates a metadata symbol
>>         representing the cstring length of the region pointed by 'src'.
>>         created metadata symbol 'meta_$2{SymRegion{reg_$0<char *
>>         src>},unsigned long}'
>>
>>         # After the evalStrcat evaluated the call, the state contains
>>         the necessary mapping that the 'dst' points to a cstring
>>         which is 4 + meta$4 long.
>>         # Note that meta$4 represents the cstring length of the
>>         region pointed by 'src'.
>>         # So far so good. We know that resulting cstring length
>>         precisely.
>>         strcpy common END State.dump: "program_state": {
>>           [...]
>>           "checker_messages": [
>>             { "checker": "unix.cstring.CStringModeling", "messages": [
>>               "CString lengths:",
>>               "dst: (meta_$2{SymRegion{reg_$0<char * src>},unsigned
>>         long}) + 4U",
>>               "SymRegion{reg_$0<char * src>}:
>>         meta_$2{SymRegion{reg_$0<char * src>},unsigned long}",
>>               ""
>>             ]}
>>           ]
>>         }
>>
>>         # We mark all symbols as live in the cstring length map.
>>         # At least we think so...
>>         CStringChecker::checkLiveSymbols marks
>>         '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>>         in use
>>         CStringChecker::checkLiveSymbols marks
>>         'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use
>>         CStringChecker::checkLiveSymbols marks
>>         'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use
>>
>>         # Even we marked the given symbols in use, we still removes
>>         them for some reason...
>>         CStringChecker::checkDeadSymbols finds the
>>         '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>>         as dead; so removes the mapping from 'dst'
>>         CStringChecker::checkDeadSymbols finds the
>>         'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as
>>         dead; so removes the mapping from 'SymRegion{reg_$0<char * src>}'
>>
>>         # Now that state does not contain the cstring length of the
>>         region pointed by 'dst'.
>>
>>         ---
>>
>>         Further investigation showed that even if I would visit all
>>         the sub SymExprs looking for SymbolMetadata would not help.
>>         SymbolReaper::isDead would still show that the
>>         '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U'
>>         SymExpr is dead.
>>
>>         ---
>>
>>         How can we preserve such metadata information?
>>
>>         ---
>>
>>         You can also reproduce this following these steps:
>>
>>         Apply the add-debug-prints.patch on top of
>>         7f1556f292ccfd80c4ffa986d5b849f915e5cd82 "Fix typo:
>>         s/epomymous/eponymous/ NFC".
>>         Analyze the 'test.c' file using this command:
>>         ./bin/clang -cc1 -internal-isystem lib/clang/12.0.0/include
>>         -nostdsysteminc -analyze -analyzer-constraints=range
>>         -setup-static-analyzer
>>         -analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
>>         -analyzer-config eagerly-assume=false test.c
>>
>>         Balázs Benics <benicsbalazs at gmail.com
>>         <mailto:benicsbalazs at gmail.com>> ezt írta (időpont: 2020.
>>         aug. 5., Sze, 10:29):
>>
>>             Why does the SymbolReaper::markInUse only work on
>>             SymbolMetadata symbols?
>>
>>             void SymbolReaper::markInUse(SymbolRef sym) {
>>                if (isa<SymbolMetadata>(sym))
>>                  MetadataInUse.insert(sym); }
>>
>>             I think it is flawed if the Symbol is a SymIntExpr
>>             holding an expression tree referring to SymbolMetadata
>>             symbols. In such case, those symbols would not be kept
>>             alive - causing some confusion on the checker developers'
>>             side and potentially losing some information about the
>>             analysis.
>>
>>             Should we walk the expression tree instead of the current
>>             implementation?
>>             What performance impact should we expect by doing so?
>>
>>             Any ideas?
>>
>>             Balazs.
>>
>>         _______________________________________________
>>         cfe-dev mailing list
>>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>         <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>>     _______________________________________________
>>     cfe-dev mailing list
>>     cfe-dev at lists.llvm.org  <mailto:cfe-dev at lists.llvm.org>
>>     https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev  <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200806/cf7a9382/attachment-0001.html>


More information about the cfe-dev mailing list