<div dir="ltr"><div>Gábor<br></div><div>> How much work would it be to prototype keeping these expressions alive and measuring performance and memory implications?</div><div>I'm not sure, since I'm not really experienced in this liveness stuff. I will try it.</div><div><br></div><div>Artem<br></div><div>> It sounds to me
    as if putting metadata symbols into the live set directly would have
    worked just fine.</div><div>Ehm, I' not sure about this.</div><div>If you look my example closely, you can note that the <span style="font-family:monospace">CStringChecker</span> maps directly the <span style="font-family:monospace">SymRegion{reg_$0<char * src>}</span> to the <span style="font-family:monospace">meta_$2{SymRegion{reg_$0<char * src>},unsigned long}</span> symbol.<br></div><div>So the <span style="font-family:monospace">SymbolReaper::markInUse</span> will in fact place that <span style="font-family:monospace">meta_$2</span> symbol in the Live set.</div><div>However later, when you query the <span style="font-family:monospace">SymbolReaper</span> if the mentioned <span style="font-family:monospace">meta_$2</span> symbol is dead or not, it will answer you that it is <b>dead</b>.</div><div><br></div><div>I'm quoting the related trace log: <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">[...]<br><b>CStringChecker::</b><b>checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use</b><br><br># Even we marked the given symbols in use, we still removes them for some reason...<br>CStringChecker::checkDeadSymbols finds the '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as dead; so removes the mapping from 'dst'<br><b>CStringChecker::</b><b>checkDeadSymbols
 finds the 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' 
as dead; so removes the mapping from 'SymRegion{reg_$0<char * 
src>}'</b></span></div></blockquote><div><br></div><div>Why does the <span style="font-family:monospace">ExprEngine</span> conjure a return symbol, if an <span style="font-family:monospace">evalCall</span> already evaluated the call?</div><div><br></div><div>Artem</div><div>> See how RegionStore does that within
    `ScanReachableSymbols::scan(const SymExpr *)`</div><div>I managed to implement a metadata collector visitor using the new SymExprVisitor infrastructure, in just a couple of lines of code. I was amazed by that :)<br></div><div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> ezt írta (időpont: 2020. aug. 6., Cs, 19:04):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    I- i- i was about to reply to that!<br>
    <br>
    I don't know why metadata-in-use is a thing at all. It sounds to me
    as if putting metadata symbols into the live set directly would have
    worked just fine. If you find any interesting counterexamples please
    let me know.<br>
    <br>
    Apart from that, indeed, the correct way to implement
    checkLiveSymbols when you're tracking arbitrary symbols is to
    iterate over these arbitrary symbols and mark all sub-symbols as
    live. See how RegionStore does that within
    `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the following
    example works correctly and i expect CStringChecker to work
    similarly:<br>
    <br>
    ```<br>
    int conjure();<br>
    <br>
    int foo() {<br>
      int x = conjure();<br>
      clang_analyzer_warnOnDeadSymbol(x);<br>
      return x + 1;<br>
    }<br>
    <br>
    void bar() {<br>
      int y = foo(); // At this point `conj_$2` is no longer directly
    present in the state; only `conj_$2 + 1` is.<br>
      (void)y;<br>
    } // But despite that, `conj_$2` only dies here.<br>
    ```<br>
    <br>
    <br>
    <div>On 8/6/20 11:35 AM, Gábor Horváth via
      cfe-dev wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>+Artem</div>
        <div><br>
        </div>
        <div>It would be great if the analyzer could reason about code
          like that. I think Artem is the most competent in these
          liveness related problems.</div>
        <div>Aside from performance, I do not see any downside for
          keeping the whole symbolic expression alive after markInUse
          was called on it (hopefully Artem corrects me if I'm wrong).</div>
        <div>But mainly due to constraint solver limitations it might
          not make sense to keep arbitrarily complex expressions alive.</div>
        <div><br>
          <a class="gmail_plusreply" id="gmail-m_-2871203687350369529plusReplyChip-1" href="mailto:benicsbalazs@gmail.com" target="_blank">@Balázs Benics</a></div>
        <div>How much work would it be to prototype keeping these
          expressions alive and measuring performance and memory
          implications?</div>
        <div><br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, 5 Aug 2020 at 16:17,
          Balázs Benics via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">If you are really curious, here is some
            context.<br>
            <div><br>
              Imagine the following code (<span style="font-family:monospace">test.c</span>):<br>
              ```<br>
              <span style="font-family:monospace">typedef
                typeof(sizeof(int)) size_t;<br>
                <br>
                void clang_analyzer_eval(int);<br>
                char  *strcat(char *restrict s1, const char *restrict
                s2);<br>
                size_t strlen(const char *s);<br>
                <br>
                void strcat_symbolic_src_length(char *src) {<br>
                  char dst[8] = "1234";<br>
                  strcat(dst, src);<br>
                  clang_analyzer_eval(strlen(dst) >= 4); //
                expected-warning{{TRUE}}<br>
                  (void)*dst;<br>
                }</span><br>
              ```<br>
              <br>
              One would expect that the '<span style="font-family:monospace">strlen(dst) >= 4</span>'
              is <span style="font-family:monospace">TRUE</span>, but
              it returns <span style="font-family:monospace">UNKOWN</span>.<br>
              <br>
              After doing a little bit of investigation - and debug
              prints - I came up with the following trace.<br>
              <br>
              ---<br>
              <br>
              <span style="font-family:monospace"># In the
                CStringChecker::evalStrcat creates a metadata symbol
                representing the cstring length of the region pointed by
                'src'.<br>
                created metadata symbol
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}'<br>
                <br>
                # After the evalStrcat evaluated the call, the state
                contains the necessary mapping that the 'dst' points to
                a cstring which is 4 + meta$4 long.<br>
                # Note that meta$4 represents the cstring length of the
                region pointed by 'src'.<br>
                # So far so good. We know that resulting cstring length
                precisely.<br>
                strcpy common END State.dump: "program_state": {<br>
                  [...]<br>
                  "checker_messages": [<br>
                    { "checker": "unix.cstring.CStringModeling",
                "messages": [<br>
                      "CString lengths:", <br>
                      "dst: (meta_$2{SymRegion{reg_$0<char *
                src>},unsigned long}) + 4U", <br>
                      "SymRegion{reg_$0<char * src>}:
                meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}", <br>
                      ""<br>
                    ]}<br>
                  ]<br>
                }<br>
                <br>
                # We mark all symbols as live in the cstring length map.<br>
                # At least we think so...<br>
                CStringChecker::checkLiveSymbols marks
                '(meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}) + 4U' in use<br>
                CStringChecker::checkLiveSymbols marks
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}' in use<br>
                CStringChecker::checkLiveSymbols marks
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}' in use<br>
                <br>
                # Even we marked the given symbols in use, we still
                removes them for some reason...<br>
                CStringChecker::checkDeadSymbols finds the
                '(meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}) + 4U' as dead; so removes the mapping from 'dst'<br>
                CStringChecker::checkDeadSymbols finds the
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}' as dead; so removes the mapping from
                'SymRegion{reg_$0<char * src>}'<br>
                <br>
                # Now that state does not contain the cstring length of
                the region pointed by 'dst'.</span><br>
              <br>
              ---<br>
              <br>
              Further investigation showed that even if I would visit
              all the sub SymExprs looking for SymbolMetadata would not
              help.<br>
              SymbolReaper::isDead would still show that the
              '(meta_$2{SymRegion{reg_$0<char * src>},unsigned
              long}) + 4U' SymExpr is dead.<br>
              <br>
              ---<br>
              <br>
              How can we preserve such metadata information?<br>
              <br>
              ---<br>
              <br>
              You can also reproduce this following these steps:<br>
              <br>
              Apply the <span style="font-family:monospace">add-debug-prints.patch</span>
              on top of <span style="font-family:monospace">7f1556f292ccfd80c4ffa986d5b849f915e5cd82
                "Fix typo: s/epomymous/eponymous/ NFC"</span>.<br>
              Analyze the '<span style="font-family:monospace">test.c</span>'
              file using this command:<br>
              <span style="font-family:monospace">./bin/clang -cc1
                -internal-isystem lib/clang/12.0.0/include
                -nostdsysteminc -analyze -analyzer-constraints=range
                -setup-static-analyzer
-analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
                -analyzer-config eagerly-assume=false test.c</span><br>
            </div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">Balázs Benics <<a href="mailto:benicsbalazs@gmail.com" target="_blank">benicsbalazs@gmail.com</a>>
              ezt írta (időpont: 2020. aug. 5., Sze, 10:29):<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">
                <div>Why does the SymbolReaper::markInUse only work on
                  SymbolMetadata symbols?</div>
                <div>
                  <pre style="background-color:rgb(43,43,43);color:rgb(169,183,198);font-family:"JetBrains Mono",monospace;font-size:9.8pt"><span style="color:rgb(204,120,50)">void </span><span style="color:rgb(181,182,227)">SymbolReaper</span>::<span style="color:rgb(255,198,109)">markInUse</span>(<span style="color:rgb(185,188,209)">SymbolRef </span>sym) {
  <span style="color:rgb(204,120,50)">if </span>(isa<<span style="color:rgb(181,182,227)">SymbolMetadata</span>>(sym))
    <span style="color:rgb(147,115,165)">MetadataInUse</span>.insert(sym)<span style="color:rgb(204,120,50)">;
</span>}</pre>
                </div>
                <div>I think it is flawed if the Symbol is a SymIntExpr
                  holding an expression tree referring to SymbolMetadata
                  symbols. In such case, those symbols would not be kept
                  alive - causing some confusion on the checker
                  developers' side and potentially losing some
                  information about the analysis.</div>
                <div><br>
                </div>
                <div>Should we walk the expression tree instead of the
                  current implementation?</div>
                <div>What performance impact should we expect by doing
                  so?</div>
                <div><br>
                </div>
                <div>Any ideas?</div>
                <div><br>
                </div>
                <div>Balazs.<br>
                </div>
              </div>
            </blockquote>
          </div>
          _______________________________________________<br>
          cfe-dev mailing list<br>
          <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
          <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
        </blockquote>
      </div>
      <br>
      <fieldset></fieldset>
      <pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
    </blockquote>
    <br>
  </div>

</blockquote></div>