<div dir="ltr"><div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>1. Remove statement, location context, block count from
    SymbolMetadata's identity. Let it solely depend on the region.<br>
    2. Get rid of the metadata-in-use set. From now on SymbolMetadata,
    like SymbolRegionValue, is live whenever its region is live.<br>
    3. When strlen(R) is used for the first time on a region R, produce
    $meta<size_t R> as the answer, but *do not store* it in the
    string length map. We don't need to change the state because the
    state didn't in fact change. On subsequent strlen(R) calls we'll
    simply return the same symbol (*because* the map will remain empty),
    until the string is changed.<br>
    4. If the string is mutated, record its length as usual. But if the
    string is invalidated (or the new length is completely unknown), *do
    not remove* the length from the state; instead, set it to
    SymbolConjured. Only remove it from the map when the region dies.<br>
    5. Keep string length symbols alive as long as they're in the map.</div></blockquote><div>Just for archive, <a href="https://reviews.llvm.org/D86445">[analyzer][RFC] Simplify MetadataSymbol representation and resolve a CStringChecker FIXME</a> implements these steps.<br></div><div><br></div><div>Thank you all for the precious time you put into this.</div><div>I really appreciate it.<br></div><div><br></div><div>; Balazs<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> ezt írta (időpont: 2020. aug. 19., Sze, 21:18):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    Gabor, we've talked about this two days ago >_> +Nithin
    because we're facing the same decision with the smart pointer
    checker to which raw pointer value would make a lot of sense as
    SymbolMetadata.<br>
    <br>
    The way SymbolMetadata used in CStringChecker always bothered me. It
    *is* used *as if* it's SymbolConjured: it represents an unknown
    value of an expression of a specific type. As such it doesn't do
    much more than preserve its identity for debugging purposes. We
    indeed don't need another SymbolConjured.<br>
    <br>
    I believe we should remove SymbolConjured's identity elements from
    SymbolMetadata and instead make it work more like SymbolRegionValue
    works in RegionStore. Namely:<br>
    <br>
    1. Remove statement, location context, block count from
    SymbolMetadata's identity. Let it solely depend on the region.<br>
    <br>
    2. Get rid of the metadata-in-use set. From now on SymbolMetadata,
    like SymbolRegionValue, is live whenever its region is live.<br>
    <br>
    3. When strlen(R) is used for the first time on a region R, produce
    $meta<size_t R> as the answer, but *do not store* it in the
    string length map. We don't need to change the state because the
    state didn't in fact change. On subsequent strlen(R) calls we'll
    simply return the same symbol (*because* the map will remain empty),
    until the string is changed.<br>
    <br>
    4. If the string is mutated, record its length as usual. But if the
    string is invalidated (or the new length is completely unknown), *do
    not remove* the length from the state; instead, set it to
    SymbolConjured. Only remove it from the map when the region dies.<br>
    <br>
    5. Keep string length symbols alive as long as they're in the map.<br>
    <br>
    This model is obviously correct because it mimics RegionStore which
    is obviously correct (i mean, it's obviously broken beyond repair,
    but *this* part seems to be correct). In particular, this model
    doesn't have any obvious flaws that you point out in this thread. It
    also makes sure that the identity of the state correctly reflects
    mutations in the state. I think it's very much superior to the
    existing model. This also more or less matches my belief of
    "everything is a Store" that i wanted to push on a few years ago
    (though that one was more radical; back then i wanted to make a
    "shadow" region for a string and use the actual SymbolRegionValue of
    that shadow region instead of the
    SymbolMetadata-as-suggested-in-this-mail).<br>
    <br>
    Balasz, are you willing to implement something like that?<br>
    <br>
    <br>
    <div>On 8/19/20 7:59 AM, Gábor Horváth
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">
        <div>Hi!
          <div dir="auto"><br>
          </div>
          <div dir="auto">Since the metadata symbol describes a property
            of the region I think in this example it might make sense to
            duplicate the metadata symbol for dst. But I can also
            imagine code using the length of a dead string which would
            be nice to support. </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">I also wonder what is the advantage of not
            using a conjured symbol. </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">Regards,</div>
          <div dir="auto">Gabor</div>
          <br>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Wed, Aug 19, 2020, 3:08
              PM Balázs Benics <<a href="mailto:benicsbalazs@gmail.com" target="_blank">benicsbalazs@gmail.com</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">
                <div>It turns out that if the region of `src` is alive,
                  then the required metadata symbol will be kept alive
                  as well.</div>
                <div>Here is the modified example:</div>
                <div><span style="font-family:monospace">void
                    strcat_symbolic_src_length(char *src) {<br>
                      char dst[8] = "1234";<br>
                      strcat(dst, src);</span></div>
                <div><span style="font-family:monospace"> 
                    clang_analyzer_eval(strlen(dst) >= 4); //
                    expected-warning{{TRUE}}<br>
                      (void)*dst;<br>
                  </span></div>
                <div><span style="font-family:monospace">  <b>(void)*src;</b>
                    // Now we keep the `src` alive, thus any metadata
                    symbols to that region will be alive as well at the
                    eval call.<br>
                  </span></div>
                <div><span style="font-family:monospace">}</span><br>
                  ```</div>
                <div><br>
                </div>
                <div>It seems slightly confusing to me that depending on
                  that `src` is used later or not, the
                  `clang_analyzer_eval(strlen(dst) >= 4)` will show
                  either `TRUE` or `UNKNOWN`.</div>
                <div>I think it should always give `TRUE` as an answer.<br>
                  <br>
                </div>
                <div>Note that a metadata symbol is alive only if marked
                  in use AND its region is also alive.</div>
                <div>Without the `(void)*src;` the region of `src` is
                  dead, thus the symbol ($meta{src} + 4) representing
                  the cstring length of the region `dst` will be dead
                  too.</div>
                <div><br>
                </div>
                <div>Since this problem was caused by the handling of
                  metadata symbols, shouldn't we use conjured ones
                  instead?</div>
                <div>In that way, we would decouple the liveness of the
                  cstring length of a region and the region itself.</div>
                <div><br>
                </div>
                <div>What is the use-case for using a metadata symbol
                  instead of a conjured one?<br>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
                  ezt írta (időpont: 2020. aug. 7., P, 0:05):<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div> <br>
                    <br>
                    <div>On 8/6/20 1:38 PM, Balázs Benics wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div>Gábor<br>
                        </div>
                        <div>> How much work would it be to prototype
                          keeping these expressions alive and measuring
                          performance and memory implications?</div>
                        <div>I'm not sure, since I'm not really
                          experienced in this liveness stuff. I will try
                          it.</div>
                        <div><br>
                        </div>
                        <div>Artem<br>
                        </div>
                        <div>> It sounds to me as if putting metadata
                          symbols into the live set directly would have
                          worked just fine.</div>
                        <div>Ehm, I' not sure about this.</div>
                        <div>If you look my example closely, you can
                          note that the <span style="font-family:monospace">CStringChecker</span>
                          maps directly the <span style="font-family:monospace">SymRegion{reg_$0<char
                            * src>}</span> to the <span style="font-family:monospace">meta_$2{SymRegion{reg_$0<char
                            * src>},unsigned long}</span> symbol.<br>
                        </div>
                        <div>So the <span style="font-family:monospace">SymbolReaper::markInUse</span>
                          will in fact place that <span style="font-family:monospace">meta_$2</span>
                          symbol in the Live set.</div>
                        <div>However later, when you query the <span style="font-family:monospace">SymbolReaper</span>
                          if the mentioned <span style="font-family:monospace">meta_$2</span>
                          symbol is dead or not, it will answer you that
                          it is <b>dead</b>.</div>
                      </div>
                    </blockquote>
                    <br>
                    That's not how that works.<br>
                    <br>
                    markInUse() doesn't put anything into the live set,
                    it puts things into an auxiliary "metadata-in-use"
                    set:
                    <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396</a><br>
                    <br>
                    On the other hand, isDead()/isLive() has to return
                    true if the symbol is present in the live set. In
                    fact, that's the first thing it checks for:
                    <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437</a><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div><br>
                        </div>
                        <div>I'm quoting the related trace log: <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div><span style="font-family:monospace">[...]<br>
                              <b>CStringChecker::</b><b>checkLiveSymbols
                                marks 'meta_$2{SymRegion{reg_$0<char
                                * src>},unsigned long}' in use</b><br>
                              <br>
                              # Even we marked the given symbols in use,
                              we still removes them for some reason...<br>
                              CStringChecker::checkDeadSymbols finds the
                              '(meta_$2{SymRegion{reg_$0<char *
                              src>},unsigned long}) + 4U' as dead; so
                              removes the mapping from 'dst'<br>
                              <b>CStringChecker::</b><b>checkDeadSymbols
                                finds the
                                'meta_$2{SymRegion{reg_$0<char *
                                src>},unsigned long}' as dead; so
                                removes the mapping from
                                'SymRegion{reg_$0<char * src>}'</b></span></div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>Why does the <span style="font-family:monospace">ExprEngine</span>
                          conjure a return symbol, if an <span style="font-family:monospace">evalCall</span>
                          already evaluated the call?</div>
                        <div><br>
                        </div>
                        <div>Artem</div>
                        <div>> See how RegionStore does that within
                          `ScanReachableSymbols::scan(const SymExpr *)`</div>
                        <div>I managed to implement a metadata collector
                          visitor using the new SymExprVisitor
                          infrastructure, in just a couple of lines of
                          code. I was amazed by that :)<br>
                        </div>
                      </div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">Artem
                          Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
                          ezt írta (időpont: 2020. aug. 6., Cs, 19:04):<br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> I- i- i was about to reply to that!<br>
                            <br>
                            I don't know why metadata-in-use is a thing
                            at all. It sounds to me as if putting
                            metadata symbols into the live set directly
                            would have worked just fine. If you find any
                            interesting counterexamples please let me
                            know.<br>
                            <br>
                            Apart from that, indeed, the correct way to
                            implement checkLiveSymbols when you're
                            tracking arbitrary symbols is to iterate
                            over these arbitrary symbols and mark all
                            sub-symbols as live. See how RegionStore
                            does that within
                            `ScanReachableSymbols::scan(const SymExpr
                            *)`. I.e., the following example works
                            correctly and i expect CStringChecker to
                            work similarly:<br>
                            <br>
                            ```<br>
                            int conjure();<br>
                            <br>
                            int foo() {<br>
                              int x = conjure();<br>
                              clang_analyzer_warnOnDeadSymbol(x);<br>
                              return x + 1;<br>
                            }<br>
                            <br>
                            void bar() {<br>
                              int y = foo(); // At this point `conj_$2`
                            is no longer directly present in the state;
                            only `conj_$2 + 1` is.<br>
                              (void)y;<br>
                            } // But despite that, `conj_$2` only dies
                            here.<br>
                            ```<br>
                            <br>
                            <br>
                            <div>On 8/6/20 11:35 AM, Gábor Horváth via
                              cfe-dev wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div>+Artem</div>
                                <div><br>
                                </div>
                                <div>It would be great if the analyzer
                                  could reason about code like that. I
                                  think Artem is the most competent in
                                  these liveness related problems.</div>
                                <div>Aside from performance, I do not
                                  see any downside for keeping the whole
                                  symbolic expression alive after
                                  markInUse was called on it (hopefully
                                  Artem corrects me if I'm wrong).</div>
                                <div>But mainly due to constraint solver
                                  limitations it might not make sense to
                                  keep arbitrarily complex expressions
                                  alive.</div>
                                <div><br>
                                  <a class="gmail_plusreply" id="gmail-m_3076766397599641215m_-4104045513335271565gmail-m_6490615832270949146gmail-m_-2871203687350369529plusReplyChip-1" href="mailto:benicsbalazs@gmail.com" rel="noreferrer" target="_blank">@Balázs
                                    Benics</a></div>
                                <div>How much work would it be to
                                  prototype keeping these expressions
                                  alive and measuring performance and
                                  memory implications?</div>
                                <div><br>
                                </div>
                              </div>
                              <br>
                              <div class="gmail_quote">
                                <div dir="ltr" class="gmail_attr">On
                                  Wed, 5 Aug 2020 at 16:17, Balázs
                                  Benics via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>>
                                  wrote:<br>
                                </div>
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                  <div dir="ltr">If you are really
                                    curious, here is some context.<br>
                                    <div><br>
                                      Imagine the following code (<span style="font-family:monospace">test.c</span>):<br>
                                      ```<br>
                                      <span style="font-family:monospace">typedef
                                        typeof(sizeof(int)) size_t;<br>
                                        <br>
                                        void clang_analyzer_eval(int);<br>
                                        char  *strcat(char *restrict s1,
                                        const char *restrict s2);<br>
                                        size_t strlen(const char *s);<br>
                                        <br>
                                        void
                                        strcat_symbolic_src_length(char
                                        *src) {<br>
                                          char dst[8] = "1234";<br>
                                          strcat(dst, src);<br>
                                         
                                        clang_analyzer_eval(strlen(dst)
                                        >= 4); //
                                        expected-warning{{TRUE}}<br>
                                          (void)*dst;<br>
                                        }</span><br>
                                      ```<br>
                                      <br>
                                      One would expect that the '<span style="font-family:monospace">strlen(dst)
                                        >= 4</span>' is <span style="font-family:monospace">TRUE</span>,
                                      but it returns <span style="font-family:monospace">UNKOWN</span>.<br>
                                      <br>
                                      After doing a little bit of
                                      investigation - and debug prints -
                                      I came up with the following
                                      trace.<br>
                                      <br>
                                      ---<br>
                                      <br>
                                      <span style="font-family:monospace">#
                                        In the
                                        CStringChecker::evalStrcat
                                        creates a metadata symbol
                                        representing the cstring length
                                        of the region pointed by 'src'.<br>
                                        created metadata symbol
                                        'meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}'<br>
                                        <br>
                                        # After the evalStrcat evaluated
                                        the call, the state contains the
                                        necessary mapping that the 'dst'
                                        points to a cstring which is 4 +
                                        meta$4 long.<br>
                                        # Note that meta$4 represents
                                        the cstring length of the region
                                        pointed by 'src'.<br>
                                        # So far so good. We know that
                                        resulting cstring length
                                        precisely.<br>
                                        strcpy common END State.dump:
                                        "program_state": {<br>
                                          [...]<br>
                                          "checker_messages": [<br>
                                            { "checker":
                                        "unix.cstring.CStringModeling",
                                        "messages": [<br>
                                              "CString lengths:", <br>
                                              "dst:
                                        (meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}) +
                                        4U", <br>
                                              "SymRegion{reg_$0<char
                                        * src>}:
                                        meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}", <br>
                                              ""<br>
                                            ]}<br>
                                          ]<br>
                                        }<br>
                                        <br>
                                        # We mark all symbols as live in
                                        the cstring length map.<br>
                                        # At least we think so...<br>
                                        CStringChecker::checkLiveSymbols
                                        marks
                                        '(meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}) + 4U'
                                        in use<br>
                                        CStringChecker::checkLiveSymbols
                                        marks
                                        'meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}' in
                                        use<br>
                                        CStringChecker::checkLiveSymbols
                                        marks
                                        'meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}' in
                                        use<br>
                                        <br>
                                        # Even we marked the given
                                        symbols in use, we still removes
                                        them for some reason...<br>
                                        CStringChecker::checkDeadSymbols
                                        finds the
                                        '(meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}) + 4U'
                                        as dead; so removes the mapping
                                        from 'dst'<br>
                                        CStringChecker::checkDeadSymbols
                                        finds the
                                        'meta_$2{SymRegion{reg_$0<char
                                        * src>},unsigned long}' as
                                        dead; so removes the mapping
                                        from 'SymRegion{reg_$0<char *
                                        src>}'<br>
                                        <br>
                                        # Now that state does not
                                        contain the cstring length of
                                        the region pointed by 'dst'.</span><br>
                                      <br>
                                      ---<br>
                                      <br>
                                      Further investigation showed that
                                      even if I would visit all the sub
                                      SymExprs looking for
                                      SymbolMetadata would not help.<br>
                                      SymbolReaper::isDead would still
                                      show that the
                                      '(meta_$2{SymRegion{reg_$0<char
                                      * src>},unsigned long}) + 4U'
                                      SymExpr is dead.<br>
                                      <br>
                                      ---<br>
                                      <br>
                                      How can we preserve such metadata
                                      information?<br>
                                      <br>
                                      ---<br>
                                      <br>
                                      You can also reproduce this
                                      following these steps:<br>
                                      <br>
                                      Apply the <span style="font-family:monospace">add-debug-prints.patch</span>
                                      on top of <span style="font-family:monospace">7f1556f292ccfd80c4ffa986d5b849f915e5cd82
                                        "Fix typo:
                                        s/epomymous/eponymous/ NFC"</span>.<br>
                                      Analyze the '<span style="font-family:monospace">test.c</span>'
                                      file using this command:<br>
                                      <span style="font-family:monospace">./bin/clang
                                        -cc1 -internal-isystem
                                        lib/clang/12.0.0/include
                                        -nostdsysteminc -analyze
                                        -analyzer-constraints=range
                                        -setup-static-analyzer
-analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
                                        -analyzer-config
                                        eagerly-assume=false test.c</span><br>
                                    </div>
                                  </div>
                                  <br>
                                  <div class="gmail_quote">
                                    <div dir="ltr" class="gmail_attr">Balázs
                                      Benics <<a href="mailto:benicsbalazs@gmail.com" rel="noreferrer" target="_blank">benicsbalazs@gmail.com</a>>
                                      ezt írta (időpont: 2020. aug. 5.,
                                      Sze, 10:29):<br>
                                    </div>
                                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                      <div dir="ltr">
                                        <div>Why does the
                                          SymbolReaper::markInUse only
                                          work on SymbolMetadata
                                          symbols?</div>
                                        <div>
                                          <pre style="background-color:rgb(43,43,43);color:rgb(169,183,198);font-family:"JetBrains Mono",monospace;font-size:9.8pt"><span style="color:rgb(204,120,50)">void </span><span style="color:rgb(181,182,227)">SymbolReaper</span>::<span style="color:rgb(255,198,109)">markInUse</span>(<span style="color:rgb(185,188,209)">SymbolRef </span>sym) {
  <span style="color:rgb(204,120,50)">if </span>(isa<<span style="color:rgb(181,182,227)">SymbolMetadata</span>>(sym))
    <span style="color:rgb(147,115,165)">MetadataInUse</span>.insert(sym)<span style="color:rgb(204,120,50)">;
</span>}</pre>
                                        </div>
                                        <div>I think it is flawed if the
                                          Symbol is a SymIntExpr holding
                                          an expression tree referring
                                          to SymbolMetadata symbols. In
                                          such case, those symbols would
                                          not be kept alive - causing
                                          some confusion on the checker
                                          developers' side and
                                          potentially losing some
                                          information about the
                                          analysis.</div>
                                        <div><br>
                                        </div>
                                        <div>Should we walk the
                                          expression tree instead of the
                                          current implementation?</div>
                                        <div>What performance impact
                                          should we expect by doing so?</div>
                                        <div><br>
                                        </div>
                                        <div>Any ideas?</div>
                                        <div><br>
                                        </div>
                                        <div>Balazs.<br>
                                        </div>
                                      </div>
                                    </blockquote>
                                  </div>
_______________________________________________<br>
                                  cfe-dev mailing list<br>
                                  <a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer" target="_blank">cfe-dev@lists.llvm.org</a><br>
                                  <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
                                </blockquote>
                              </div>
                              <br>
                              <fieldset></fieldset>
                              <pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
                            </blockquote>
                            <br>
                          </div>
                        </blockquote>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>