<div dir="auto"><div>Hi!<div dir="auto"><br></div><div dir="auto">Since the metadata symbol describes a property of the region I think in this example it might make sense to duplicate the metadata symbol for dst. But I can also imagine code using the length of a dead string which would be nice to support. </div><div dir="auto"><br></div><div dir="auto">I also wonder what is the advantage of not using a conjured symbol. </div><div dir="auto"><br></div><div dir="auto">Regards,</div><div dir="auto">Gabor</div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 19, 2020, 3:08 PM Balázs Benics <<a href="mailto:benicsbalazs@gmail.com">benicsbalazs@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>It turns out that if the region of `src` is alive, then the required metadata symbol will be kept alive as well.</div><div>Here is the modified example:</div><div><span style="font-family:monospace">void strcat_symbolic_src_length(char *src) {<br>  char dst[8] = "1234";<br>  strcat(dst, src);</span></div><div><span style="font-family:monospace">  clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}<br>  (void)*dst;<br></span></div><div><span style="font-family:monospace">  <b>(void)*src;</b> // Now we keep the `src` alive, thus any metadata symbols to that region will be alive as well at the eval call.<br></span></div><div><span style="font-family:monospace">}</span><br>```</div><div><br></div><div>It seems slightly confusing to me that depending on that `src` is used later or not, the `clang_analyzer_eval(strlen(dst) >= 4)` will show either `TRUE` or `UNKNOWN`.</div><div>I think it should always give `TRUE` as an answer.<br><br></div><div>Note that a metadata symbol is alive only if marked in use AND its region is also alive.</div><div>Without the `(void)*src;` the region of `src` is dead, thus the symbol ($meta{src} + 4) representing the cstring length of the region `dst` will be dead too.</div><div><br></div><div>Since this problem was caused by the handling of metadata symbols, shouldn't we use conjured ones instead?</div><div>In that way, we would decouple the liveness of the cstring length of a region and the region itself.</div><div><br></div><div>What is the use-case for using a metadata symbol instead of a conjured one?<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank" rel="noreferrer">noqnoqneo@gmail.com</a>> ezt írta (időpont: 2020. aug. 7., P, 0:05):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <br>
    <br>
    <div>On 8/6/20 1:38 PM, Balázs Benics wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>Gábor<br>
        </div>
        <div>> How much work would it be to prototype keeping these
          expressions alive and measuring performance and memory
          implications?</div>
        <div>I'm not sure, since I'm not really experienced in this
          liveness stuff. I will try it.</div>
        <div><br>
        </div>
        <div>Artem<br>
        </div>
        <div>> It sounds to me as if putting metadata symbols into
          the live set directly would have worked just fine.</div>
        <div>Ehm, I' not sure about this.</div>
        <div>If you look my example closely, you can note that the <span style="font-family:monospace">CStringChecker</span> maps
          directly the <span style="font-family:monospace">SymRegion{reg_$0<char
            * src>}</span> to the <span style="font-family:monospace">meta_$2{SymRegion{reg_$0<char
            * src>},unsigned long}</span> symbol.<br>
        </div>
        <div>So the <span style="font-family:monospace">SymbolReaper::markInUse</span>
          will in fact place that <span style="font-family:monospace">meta_$2</span>
          symbol in the Live set.</div>
        <div>However later, when you query the <span style="font-family:monospace">SymbolReaper</span> if the
          mentioned <span style="font-family:monospace">meta_$2</span>
          symbol is dead or not, it will answer you that it is <b>dead</b>.</div>
      </div>
    </blockquote>
    <br>
    That's not how that works.<br>
    <br>
    markInUse() doesn't put anything into the live set, it puts things
    into an auxiliary "metadata-in-use" set:
<a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396" target="_blank" rel="noreferrer">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396</a><br>
    <br>
    On the other hand, isDead()/isLive() has to return true if the
    symbol is present in the live set. In fact, that's the first thing
    it checks for:
<a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437" target="_blank" rel="noreferrer">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437</a><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>I'm quoting the related trace log: <br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div><span style="font-family:monospace">[...]<br>
              <b>CStringChecker::</b><b>checkLiveSymbols marks
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}' in use</b><br>
              <br>
              # Even we marked the given symbols in use, we still
              removes them for some reason...<br>
              CStringChecker::checkDeadSymbols finds the
              '(meta_$2{SymRegion{reg_$0<char * src>},unsigned
              long}) + 4U' as dead; so removes the mapping from 'dst'<br>
              <b>CStringChecker::</b><b>checkDeadSymbols finds the
                'meta_$2{SymRegion{reg_$0<char * src>},unsigned
                long}' as dead; so removes the mapping from
                'SymRegion{reg_$0<char * src>}'</b></span></div>
        </blockquote>
        <div><br>
        </div>
        <div>Why does the <span style="font-family:monospace">ExprEngine</span>
          conjure a return symbol, if an <span style="font-family:monospace">evalCall</span> already
          evaluated the call?</div>
        <div><br>
        </div>
        <div>Artem</div>
        <div>> See how RegionStore does that within
          `ScanReachableSymbols::scan(const SymExpr *)`</div>
        <div>I managed to implement a metadata collector visitor using
          the new SymExprVisitor infrastructure, in just a couple of
          lines of code. I was amazed by that :)<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank" rel="noreferrer">noqnoqneo@gmail.com</a>>
          ezt írta (időpont: 2020. aug. 6., Cs, 19:04):<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div> I- i- i was about to reply to that!<br>
            <br>
            I don't know why metadata-in-use is a thing at all. It
            sounds to me as if putting metadata symbols into the live
            set directly would have worked just fine. If you find any
            interesting counterexamples please let me know.<br>
            <br>
            Apart from that, indeed, the correct way to implement
            checkLiveSymbols when you're tracking arbitrary symbols is
            to iterate over these arbitrary symbols and mark all
            sub-symbols as live. See how RegionStore does that within
            `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the
            following example works correctly and i expect
            CStringChecker to work similarly:<br>
            <br>
            ```<br>
            int conjure();<br>
            <br>
            int foo() {<br>
              int x = conjure();<br>
              clang_analyzer_warnOnDeadSymbol(x);<br>
              return x + 1;<br>
            }<br>
            <br>
            void bar() {<br>
              int y = foo(); // At this point `conj_$2` is no longer
            directly present in the state; only `conj_$2 + 1` is.<br>
              (void)y;<br>
            } // But despite that, `conj_$2` only dies here.<br>
            ```<br>
            <br>
            <br>
            <div>On 8/6/20 11:35 AM, Gábor Horváth via cfe-dev wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div>+Artem</div>
                <div><br>
                </div>
                <div>It would be great if the analyzer could reason
                  about code like that. I think Artem is the most
                  competent in these liveness related problems.</div>
                <div>Aside from performance, I do not see any downside
                  for keeping the whole symbolic expression alive after
                  markInUse was called on it (hopefully Artem corrects
                  me if I'm wrong).</div>
                <div>But mainly due to constraint solver limitations it
                  might not make sense to keep arbitrarily complex
                  expressions alive.</div>
                <div><br>
                  <a class="gmail_plusreply" id="m_-4104045513335271565gmail-m_6490615832270949146gmail-m_-2871203687350369529plusReplyChip-1" href="mailto:benicsbalazs@gmail.com" target="_blank" rel="noreferrer">@Balázs Benics</a></div>
                <div>How much work would it be to prototype keeping
                  these expressions alive and measuring performance and
                  memory implications?</div>
                <div><br>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Wed, 5 Aug 2020 at
                  16:17, Balázs Benics via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank" rel="noreferrer">cfe-dev@lists.llvm.org</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr">If you are really curious, here is some
                    context.<br>
                    <div><br>
                      Imagine the following code (<span style="font-family:monospace">test.c</span>):<br>
                      ```<br>
                      <span style="font-family:monospace">typedef
                        typeof(sizeof(int)) size_t;<br>
                        <br>
                        void clang_analyzer_eval(int);<br>
                        char  *strcat(char *restrict s1, const char
                        *restrict s2);<br>
                        size_t strlen(const char *s);<br>
                        <br>
                        void strcat_symbolic_src_length(char *src) {<br>
                          char dst[8] = "1234";<br>
                          strcat(dst, src);<br>
                          clang_analyzer_eval(strlen(dst) >= 4); //
                        expected-warning{{TRUE}}<br>
                          (void)*dst;<br>
                        }</span><br>
                      ```<br>
                      <br>
                      One would expect that the '<span style="font-family:monospace">strlen(dst) >=
                        4</span>' is <span style="font-family:monospace">TRUE</span>, but
                      it returns <span style="font-family:monospace">UNKOWN</span>.<br>
                      <br>
                      After doing a little bit of investigation - and
                      debug prints - I came up with the following trace.<br>
                      <br>
                      ---<br>
                      <br>
                      <span style="font-family:monospace"># In the
                        CStringChecker::evalStrcat creates a metadata
                        symbol representing the cstring length of the
                        region pointed by 'src'.<br>
                        created metadata symbol
                        'meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}'<br>
                        <br>
                        # After the evalStrcat evaluated the call, the
                        state contains the necessary mapping that the
                        'dst' points to a cstring which is 4 + meta$4
                        long.<br>
                        # Note that meta$4 represents the cstring length
                        of the region pointed by 'src'.<br>
                        # So far so good. We know that resulting cstring
                        length precisely.<br>
                        strcpy common END State.dump: "program_state": {<br>
                          [...]<br>
                          "checker_messages": [<br>
                            { "checker": "unix.cstring.CStringModeling",
                        "messages": [<br>
                              "CString lengths:", <br>
                              "dst: (meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}) + 4U", <br>
                              "SymRegion{reg_$0<char * src>}:
                        meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}", <br>
                              ""<br>
                            ]}<br>
                          ]<br>
                        }<br>
                        <br>
                        # We mark all symbols as live in the cstring
                        length map.<br>
                        # At least we think so...<br>
                        CStringChecker::checkLiveSymbols marks
                        '(meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}) + 4U' in use<br>
                        CStringChecker::checkLiveSymbols marks
                        'meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}' in use<br>
                        CStringChecker::checkLiveSymbols marks
                        'meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}' in use<br>
                        <br>
                        # Even we marked the given symbols in use, we
                        still removes them for some reason...<br>
                        CStringChecker::checkDeadSymbols finds the
                        '(meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}) + 4U' as dead; so
                        removes the mapping from 'dst'<br>
                        CStringChecker::checkDeadSymbols finds the
                        'meta_$2{SymRegion{reg_$0<char *
                        src>},unsigned long}' as dead; so removes the
                        mapping from 'SymRegion{reg_$0<char *
                        src>}'<br>
                        <br>
                        # Now that state does not contain the cstring
                        length of the region pointed by 'dst'.</span><br>
                      <br>
                      ---<br>
                      <br>
                      Further investigation showed that even if I would
                      visit all the sub SymExprs looking for
                      SymbolMetadata would not help.<br>
                      SymbolReaper::isDead would still show that the
                      '(meta_$2{SymRegion{reg_$0<char *
                      src>},unsigned long}) + 4U' SymExpr is dead.<br>
                      <br>
                      ---<br>
                      <br>
                      How can we preserve such metadata information?<br>
                      <br>
                      ---<br>
                      <br>
                      You can also reproduce this following these steps:<br>
                      <br>
                      Apply the <span style="font-family:monospace">add-debug-prints.patch</span>
                      on top of <span style="font-family:monospace">7f1556f292ccfd80c4ffa986d5b849f915e5cd82
                        "Fix typo: s/epomymous/eponymous/ NFC"</span>.<br>
                      Analyze the '<span style="font-family:monospace">test.c</span>'
                      file using this command:<br>
                      <span style="font-family:monospace">./bin/clang
                        -cc1 -internal-isystem lib/clang/12.0.0/include
                        -nostdsysteminc -analyze
                        -analyzer-constraints=range
                        -setup-static-analyzer
-analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
                        -analyzer-config eagerly-assume=false test.c</span><br>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">Balázs Benics <<a href="mailto:benicsbalazs@gmail.com" target="_blank" rel="noreferrer">benicsbalazs@gmail.com</a>>
                      ezt írta (időpont: 2020. aug. 5., Sze, 10:29):<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div>Why does the SymbolReaper::markInUse only
                          work on SymbolMetadata symbols?</div>
                        <div>
                          <pre style="background-color:rgb(43,43,43);color:rgb(169,183,198);font-family:"JetBrains Mono",monospace;font-size:9.8pt"><span style="color:rgb(204,120,50)">void </span><span style="color:rgb(181,182,227)">SymbolReaper</span>::<span style="color:rgb(255,198,109)">markInUse</span>(<span style="color:rgb(185,188,209)">SymbolRef </span>sym) {
  <span style="color:rgb(204,120,50)">if </span>(isa<<span style="color:rgb(181,182,227)">SymbolMetadata</span>>(sym))
    <span style="color:rgb(147,115,165)">MetadataInUse</span>.insert(sym)<span style="color:rgb(204,120,50)">;
</span>}</pre>
                        </div>
                        <div>I think it is flawed if the Symbol is a
                          SymIntExpr holding an expression tree
                          referring to SymbolMetadata symbols. In such
                          case, those symbols would not be kept alive -
                          causing some confusion on the checker
                          developers' side and potentially losing some
                          information about the analysis.</div>
                        <div><br>
                        </div>
                        <div>Should we walk the expression tree instead
                          of the current implementation?</div>
                        <div>What performance impact should we expect by
                          doing so?</div>
                        <div><br>
                        </div>
                        <div>Any ideas?</div>
                        <div><br>
                        </div>
                        <div>Balazs.<br>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  _______________________________________________<br>
                  cfe-dev mailing list<br>
                  <a href="mailto:cfe-dev@lists.llvm.org" target="_blank" rel="noreferrer">cfe-dev@lists.llvm.org</a><br>
                  <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
                </blockquote>
              </div>
              <br>
              <fieldset></fieldset>
              <pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank" rel="noreferrer">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank" rel="noreferrer">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
            </blockquote>
            <br>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>
</blockquote></div></div></div>