<div dir="auto">Thanks for the advice.<div dir="auto">It sounds like a plan, Artem. </div><div dir="auto">I will make some experiments :)</div><div dir="auto"><br></div><div dir="auto">We will see where it goes. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 20, 2020, 00:02 Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div>
    <br>
    <br>
    <div>On 8/19/20 2:18 PM, Artem Dergachev
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      Gabor, we've talked about this two days ago >_> +Nithin
      because we're facing the same decision with the smart pointer
      checker to which raw pointer value would make a lot of sense as
      SymbolMetadata.<br>
      <br>
      The way SymbolMetadata used in CStringChecker always bothered me.
      It *is* used *as if* it's SymbolConjured: it represents an unknown
      value of an expression of a specific type. As such it doesn't do
      much more than preserve its identity for debugging purposes. We
      indeed don't need another SymbolConjured.<br>
      <br>
      I believe we should remove SymbolConjured's identity elements from
      SymbolMetadata and instead make it work more like
      SymbolRegionValue works in RegionStore. Namely:<br>
      <br>
      1. Remove statement, location context, block count from
      SymbolMetadata's identity. Let it solely depend on the region.<br>
      <br>
      2. Get rid of the metadata-in-use set. From now on SymbolMetadata,
      like SymbolRegionValue, is live whenever its region is live.<br>
      <br>
      3. When strlen(R) is used for the first time on a region R,
      produce $meta<size_t R> as the answer, but *do not store* it
      in the string length map. We don't need to change the state
      because the state didn't in fact change. On subsequent strlen(R)
      calls we'll simply return the same symbol (*because* the map will
      remain empty), until the string is changed.<br>
      <br>
      4. If the string is mutated, record its length as usual. But if
      the string is invalidated (or the new length is completely
      unknown), *do not remove* the length from the state; instead, set
      it to SymbolConjured. Only remove it from the map when the region
      dies.<br>
      <br>
      5. Keep string length symbols alive as long as they're in the map.<br>
      <br>
      This model is obviously correct because it mimics RegionStore
      which is obviously correct (i mean, it's obviously broken beyond
      repair, but *this* part seems to be correct). In particular, this
      model doesn't have any obvious flaws that you point out in this
      thread. It also makes sure that the identity of the state
      correctly reflects mutations in the state. I think it's very much
      superior to the existing model. This also more or less matches my
      belief of "everything is a Store" that i wanted to push on a few
      years ago (though that one was more radical; back then i wanted to
      make a "shadow" region for a string and use the actual
      SymbolRegionValue of that shadow region instead of the
      SymbolMetadata-as-suggested-in-this-mail).<br>
      <br>
      Balasz, are you willing to implement something like that?<br>
    </blockquote>
    <br>
    *Balázs.<br>
    Sry!!<br>
    <br>
    <blockquote type="cite"> <br>
      <br>
      <div>On 8/19/20 7:59 AM, Gábor Horváth
        wrote:<br>
      </div>
      <blockquote type="cite">
        
        <div dir="auto">
          <div>Hi!
            <div dir="auto"><br>
            </div>
            <div dir="auto">Since the metadata symbol describes a
              property of the region I think in this example it might
              make sense to duplicate the metadata symbol for dst. But I
              can also imagine code using the length of a dead string
              which would be nice to support. </div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">I also wonder what is the advantage of not
              using a conjured symbol. </div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">Regards,</div>
            <div dir="auto">Gabor</div>
            <br>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Wed, Aug 19, 2020,
                3:08 PM Balázs Benics <<a href="mailto:benicsbalazs@gmail.com" target="_blank" rel="noreferrer">benicsbalazs@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div dir="ltr">
                  <div>It turns out that if the region of `src` is
                    alive, then the required metadata symbol will be
                    kept alive as well.</div>
                  <div>Here is the modified example:</div>
                  <div><span style="font-family:monospace">void
                      strcat_symbolic_src_length(char *src) {<br>
                        char dst[8] = "1234";<br>
                        strcat(dst, src);</span></div>
                  <div><span style="font-family:monospace"> 
                      clang_analyzer_eval(strlen(dst) >= 4); //
                      expected-warning{{TRUE}}<br>
                        (void)*dst;<br>
                    </span></div>
                  <div><span style="font-family:monospace">  <b>(void)*src;</b>
                      // Now we keep the `src` alive, thus any metadata
                      symbols to that region will be alive as well at
                      the eval call.<br>
                    </span></div>
                  <div><span style="font-family:monospace">}</span><br>
                    ```</div>
                  <div><br>
                  </div>
                  <div>It seems slightly confusing to me that depending
                    on that `src` is used later or not, the
                    `clang_analyzer_eval(strlen(dst) >= 4)` will show
                    either `TRUE` or `UNKNOWN`.</div>
                  <div>I think it should always give `TRUE` as an
                    answer.<br>
                    <br>
                  </div>
                  <div>Note that a metadata symbol is alive only if
                    marked in use AND its region is also alive.</div>
                  <div>Without the `(void)*src;` the region of `src` is
                    dead, thus the symbol ($meta{src} + 4) representing
                    the cstring length of the region `dst` will be dead
                    too.</div>
                  <div><br>
                  </div>
                  <div>Since this problem was caused by the handling of
                    metadata symbols, shouldn't we use conjured ones
                    instead?</div>
                  <div>In that way, we would decouple the liveness of
                    the cstring length of a region and the region
                    itself.</div>
                  <div><br>
                  </div>
                  <div>What is the use-case for using a metadata symbol
                    instead of a conjured one?<br>
                  </div>
                </div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
                    ezt írta (időpont: 2020. aug. 7., P, 0:05):<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div> <br>
                      <br>
                      <div>On 8/6/20 1:38 PM, Balázs Benics wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>Gábor<br>
                          </div>
                          <div>> How much work would it be to
                            prototype keeping these expressions alive
                            and measuring performance and memory
                            implications?</div>
                          <div>I'm not sure, since I'm not really
                            experienced in this liveness stuff. I will
                            try it.</div>
                          <div><br>
                          </div>
                          <div>Artem<br>
                          </div>
                          <div>> It sounds to me as if putting
                            metadata symbols into the live set directly
                            would have worked just fine.</div>
                          <div>Ehm, I' not sure about this.</div>
                          <div>If you look my example closely, you can
                            note that the <span style="font-family:monospace">CStringChecker</span>
                            maps directly the <span style="font-family:monospace">SymRegion{reg_$0<char
                              * src>}</span> to the <span style="font-family:monospace">meta_$2{SymRegion{reg_$0<char
                              * src>},unsigned long}</span> symbol.<br>
                          </div>
                          <div>So the <span style="font-family:monospace">SymbolReaper::markInUse</span>
                            will in fact place that <span style="font-family:monospace">meta_$2</span>
                            symbol in the Live set.</div>
                          <div>However later, when you query the <span style="font-family:monospace">SymbolReaper</span>
                            if the mentioned <span style="font-family:monospace">meta_$2</span>
                            symbol is dead or not, it will answer you
                            that it is <b>dead</b>.</div>
                        </div>
                      </blockquote>
                      <br>
                      That's not how that works.<br>
                      <br>
                      markInUse() doesn't put anything into the live
                      set, it puts things into an auxiliary
                      "metadata-in-use" set: <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396</a><br>
                      <br>
                      On the other hand, isDead()/isLive() has to return
                      true if the symbol is present in the live set. In
                      fact, that's the first thing it checks for: <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437</a><br>
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div><br>
                          </div>
                          <div>I'm quoting the related trace log: <br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div><span style="font-family:monospace">[...]<br>
                                <b>CStringChecker::</b><b>checkLiveSymbols
                                  marks
                                  'meta_$2{SymRegion{reg_$0<char *
                                  src>},unsigned long}' in use</b><br>
                                <br>
                                # Even we marked the given symbols in
                                use, we still removes them for some
                                reason...<br>
                                CStringChecker::checkDeadSymbols finds
                                the '(meta_$2{SymRegion{reg_$0<char *
                                src>},unsigned long}) + 4U' as dead;
                                so removes the mapping from 'dst'<br>
                                <b>CStringChecker::</b><b>checkDeadSymbols
                                  finds the
                                  'meta_$2{SymRegion{reg_$0<char *
                                  src>},unsigned long}' as dead; so
                                  removes the mapping from
                                  'SymRegion{reg_$0<char * src>}'</b></span></div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Why does the <span style="font-family:monospace">ExprEngine</span>
                            conjure a return symbol, if an <span style="font-family:monospace">evalCall</span>
                            already evaluated the call?</div>
                          <div><br>
                          </div>
                          <div>Artem</div>
                          <div>> See how RegionStore does that within
                            `ScanReachableSymbols::scan(const SymExpr
                            *)`</div>
                          <div>I managed to implement a metadata
                            collector visitor using the new
                            SymExprVisitor infrastructure, in just a
                            couple of lines of code. I was amazed by
                            that :)<br>
                          </div>
                        </div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">Artem
                            Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
                            ezt írta (időpont: 2020. aug. 6., Cs,
                            19:04):<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div> I- i- i was about to reply to that!<br>
                              <br>
                              I don't know why metadata-in-use is a
                              thing at all. It sounds to me as if
                              putting metadata symbols into the live set
                              directly would have worked just fine. If
                              you find any interesting counterexamples
                              please let me know.<br>
                              <br>
                              Apart from that, indeed, the correct way
                              to implement checkLiveSymbols when you're
                              tracking arbitrary symbols is to iterate
                              over these arbitrary symbols and mark all
                              sub-symbols as live. See how RegionStore
                              does that within
                              `ScanReachableSymbols::scan(const SymExpr
                              *)`. I.e., the following example works
                              correctly and i expect CStringChecker to
                              work similarly:<br>
                              <br>
                              ```<br>
                              int conjure();<br>
                              <br>
                              int foo() {<br>
                                int x = conjure();<br>
                                clang_analyzer_warnOnDeadSymbol(x);<br>
                                return x + 1;<br>
                              }<br>
                              <br>
                              void bar() {<br>
                                int y = foo(); // At this point
                              `conj_$2` is no longer directly present in
                              the state; only `conj_$2 + 1` is.<br>
                                (void)y;<br>
                              } // But despite that, `conj_$2` only dies
                              here.<br>
                              ```<br>
                              <br>
                              <br>
                              <div>On 8/6/20 11:35 AM, Gábor Horváth via
                                cfe-dev wrote:<br>
                              </div>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div>+Artem</div>
                                  <div><br>
                                  </div>
                                  <div>It would be great if the analyzer
                                    could reason about code like that. I
                                    think Artem is the most competent in
                                    these liveness related problems.</div>
                                  <div>Aside from performance, I do not
                                    see any downside for keeping the
                                    whole symbolic expression alive
                                    after markInUse was called on it
                                    (hopefully Artem corrects me if I'm
                                    wrong).</div>
                                  <div>But mainly due to constraint
                                    solver limitations it might not make
                                    sense to keep arbitrarily complex
                                    expressions alive.</div>
                                  <div><br>
                                    <a class="gmail_plusreply" id="m_-5368288223173890362m_-4104045513335271565gmail-m_6490615832270949146gmail-m_-2871203687350369529plusReplyChip-1" href="mailto:benicsbalazs@gmail.com" rel="noreferrer noreferrer" target="_blank">@Balázs
                                      Benics</a></div>
                                  <div>How much work would it be to
                                    prototype keeping these expressions
                                    alive and measuring performance and
                                    memory implications?</div>
                                  <div><br>
                                  </div>
                                </div>
                                <br>
                                <div class="gmail_quote">
                                  <div dir="ltr" class="gmail_attr">On
                                    Wed, 5 Aug 2020 at 16:17, Balázs
                                    Benics via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>>
                                    wrote:<br>
                                  </div>
                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                    <div dir="ltr">If you are really
                                      curious, here is some context.<br>
                                      <div><br>
                                        Imagine the following code (<span style="font-family:monospace">test.c</span>):<br>
                                        ```<br>
                                        <span style="font-family:monospace">typedef
                                          typeof(sizeof(int)) size_t;<br>
                                          <br>
                                          void clang_analyzer_eval(int);<br>
                                          char  *strcat(char *restrict
                                          s1, const char *restrict s2);<br>
                                          size_t strlen(const char *s);<br>
                                          <br>
                                          void
                                          strcat_symbolic_src_length(char
                                          *src) {<br>
                                            char dst[8] = "1234";<br>
                                            strcat(dst, src);<br>
                                           
                                          clang_analyzer_eval(strlen(dst)
                                          >= 4); //
                                          expected-warning{{TRUE}}<br>
                                            (void)*dst;<br>
                                          }</span><br>
                                        ```<br>
                                        <br>
                                        One would expect that the '<span style="font-family:monospace">strlen(dst)
                                          >= 4</span>' is <span style="font-family:monospace">TRUE</span>,
                                        but it returns <span style="font-family:monospace">UNKOWN</span>.<br>
                                        <br>
                                        After doing a little bit of
                                        investigation - and debug prints
                                        - I came up with the following
                                        trace.<br>
                                        <br>
                                        ---<br>
                                        <br>
                                        <span style="font-family:monospace">#
                                          In the
                                          CStringChecker::evalStrcat
                                          creates a metadata symbol
                                          representing the cstring
                                          length of the region pointed
                                          by 'src'.<br>
                                          created metadata symbol
                                          'meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}'<br>
                                          <br>
                                          # After the evalStrcat
                                          evaluated the call, the state
                                          contains the necessary mapping
                                          that the 'dst' points to a
                                          cstring which is 4 + meta$4
                                          long.<br>
                                          # Note that meta$4 represents
                                          the cstring length of the
                                          region pointed by 'src'.<br>
                                          # So far so good. We know that
                                          resulting cstring length
                                          precisely.<br>
                                          strcpy common END State.dump:
                                          "program_state": {<br>
                                            [...]<br>
                                            "checker_messages": [<br>
                                              { "checker":
                                          "unix.cstring.CStringModeling",
                                          "messages": [<br>
                                                "CString lengths:", <br>
                                                "dst:
                                          (meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}) +
                                          4U", <br>
                                               
                                          "SymRegion{reg_$0<char *
                                          src>}:
                                          meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}", <br>
                                                ""<br>
                                              ]}<br>
                                            ]<br>
                                          }<br>
                                          <br>
                                          # We mark all symbols as live
                                          in the cstring length map.<br>
                                          # At least we think so...<br>
CStringChecker::checkLiveSymbols marks
                                          '(meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}) +
                                          4U' in use<br>
CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}' in
                                          use<br>
CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}' in
                                          use<br>
                                          <br>
                                          # Even we marked the given
                                          symbols in use, we still
                                          removes them for some
                                          reason...<br>
CStringChecker::checkDeadSymbols finds the
                                          '(meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}) +
                                          4U' as dead; so removes the
                                          mapping from 'dst'<br>
CStringChecker::checkDeadSymbols finds the
                                          'meta_$2{SymRegion{reg_$0<char
                                          * src>},unsigned long}' as
                                          dead; so removes the mapping
                                          from 'SymRegion{reg_$0<char
                                          * src>}'<br>
                                          <br>
                                          # Now that state does not
                                          contain the cstring length of
                                          the region pointed by 'dst'.</span><br>
                                        <br>
                                        ---<br>
                                        <br>
                                        Further investigation showed
                                        that even if I would visit all
                                        the sub SymExprs looking for
                                        SymbolMetadata would not help.<br>
                                        SymbolReaper::isDead would still
                                        show that the
                                        '(meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}) + 4U'
                                        SymExpr is dead.<br>
                                        <br>
                                        ---<br>
                                        <br>
                                        How can we preserve such
                                        metadata information?<br>
                                        <br>
                                        ---<br>
                                        <br>
                                        You can also reproduce this
                                        following these steps:<br>
                                        <br>
                                        Apply the <span style="font-family:monospace">add-debug-prints.patch</span>
                                        on top of <span style="font-family:monospace">7f1556f292ccfd80c4ffa986d5b849f915e5cd82
                                          "Fix typo:
                                          s/epomymous/eponymous/ NFC"</span>.<br>
                                        Analyze the '<span style="font-family:monospace">test.c</span>'
                                        file using this command:<br>
                                        <span style="font-family:monospace">./bin/clang
                                          -cc1 -internal-isystem
                                          lib/clang/12.0.0/include
                                          -nostdsysteminc -analyze
                                          -analyzer-constraints=range
                                          -setup-static-analyzer
-analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
                                          -analyzer-config
                                          eagerly-assume=false test.c</span><br>
                                      </div>
                                    </div>
                                    <br>
                                    <div class="gmail_quote">
                                      <div dir="ltr" class="gmail_attr">Balázs
                                        Benics <<a href="mailto:benicsbalazs@gmail.com" rel="noreferrer noreferrer" target="_blank">benicsbalazs@gmail.com</a>>
                                        ezt írta (időpont: 2020. aug.
                                        5., Sze, 10:29):<br>
                                      </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div dir="ltr">
                                          <div>Why does the
                                            SymbolReaper::markInUse only
                                            work on SymbolMetadata
                                            symbols?</div>
                                          <div>
                                            <pre style="background-color:rgb(43,43,43);color:rgb(169,183,198);font-family:"JetBrains Mono",monospace;font-size:9.8pt"><span style="color:rgb(204,120,50)">void </span><span style="color:rgb(181,182,227)">SymbolReaper</span>::<span style="color:rgb(255,198,109)">markInUse</span>(<span style="color:rgb(185,188,209)">SymbolRef </span>sym) {
  <span style="color:rgb(204,120,50)">if </span>(isa<<span style="color:rgb(181,182,227)">SymbolMetadata</span>>(sym))
    <span style="color:rgb(147,115,165)">MetadataInUse</span>.insert(sym)<span style="color:rgb(204,120,50)">;
</span>}</pre>
                                          </div>
                                          <div>I think it is flawed if
                                            the Symbol is a SymIntExpr
                                            holding an expression tree
                                            referring to SymbolMetadata
                                            symbols. In such case, those
                                            symbols would not be kept
                                            alive - causing some
                                            confusion on the checker
                                            developers' side and
                                            potentially losing some
                                            information about the
                                            analysis.</div>
                                          <div><br>
                                          </div>
                                          <div>Should we walk the
                                            expression tree instead of
                                            the current implementation?</div>
                                          <div>What performance impact
                                            should we expect by doing
                                            so?</div>
                                          <div><br>
                                          </div>
                                          <div>Any ideas?</div>
                                          <div><br>
                                          </div>
                                          <div>Balazs.<br>
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
_______________________________________________<br>
                                    cfe-dev mailing list<br>
                                    <a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a><br>
                                    <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
                                  </blockquote>
                                </div>
                                <br>
                                <fieldset></fieldset>
                                <pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
                              </blockquote>
                              <br>
                            </div>
                          </blockquote>
                        </div>
                      </blockquote>
                      <br>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </div>

</blockquote></div>