<div dir="auto">Thanks for the advice.<div dir="auto">It sounds like a plan, Artem. </div><div dir="auto">I will make some experiments :)</div><div dir="auto"><br></div><div dir="auto">We will see where it goes. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 20, 2020, 00:02 Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<br>
<br>
<div>On 8/19/20 2:18 PM, Artem Dergachev
wrote:<br>
</div>
<blockquote type="cite">
Gabor, we've talked about this two days ago >_> +Nithin
because we're facing the same decision with the smart pointer
checker to which raw pointer value would make a lot of sense as
SymbolMetadata.<br>
<br>
The way SymbolMetadata used in CStringChecker always bothered me.
It *is* used *as if* it's SymbolConjured: it represents an unknown
value of an expression of a specific type. As such it doesn't do
much more than preserve its identity for debugging purposes. We
indeed don't need another SymbolConjured.<br>
<br>
I believe we should remove SymbolConjured's identity elements from
SymbolMetadata and instead make it work more like
SymbolRegionValue works in RegionStore. Namely:<br>
<br>
1. Remove statement, location context, block count from
SymbolMetadata's identity. Let it solely depend on the region.<br>
<br>
2. Get rid of the metadata-in-use set. From now on SymbolMetadata,
like SymbolRegionValue, is live whenever its region is live.<br>
<br>
3. When strlen(R) is used for the first time on a region R,
produce $meta<size_t R> as the answer, but *do not store* it
in the string length map. We don't need to change the state
because the state didn't in fact change. On subsequent strlen(R)
calls we'll simply return the same symbol (*because* the map will
remain empty), until the string is changed.<br>
<br>
4. If the string is mutated, record its length as usual. But if
the string is invalidated (or the new length is completely
unknown), *do not remove* the length from the state; instead, set
it to SymbolConjured. Only remove it from the map when the region
dies.<br>
<br>
5. Keep string length symbols alive as long as they're in the map.<br>
<br>
This model is obviously correct because it mimics RegionStore
which is obviously correct (i mean, it's obviously broken beyond
repair, but *this* part seems to be correct). In particular, this
model doesn't have any obvious flaws that you point out in this
thread. It also makes sure that the identity of the state
correctly reflects mutations in the state. I think it's very much
superior to the existing model. This also more or less matches my
belief of "everything is a Store" that i wanted to push on a few
years ago (though that one was more radical; back then i wanted to
make a "shadow" region for a string and use the actual
SymbolRegionValue of that shadow region instead of the
SymbolMetadata-as-suggested-in-this-mail).<br>
<br>
Balasz, are you willing to implement something like that?<br>
</blockquote>
<br>
*Balázs.<br>
Sry!!<br>
<br>
<blockquote type="cite"> <br>
<br>
<div>On 8/19/20 7:59 AM, Gábor Horváth
wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>Hi!
<div dir="auto"><br>
</div>
<div dir="auto">Since the metadata symbol describes a
property of the region I think in this example it might
make sense to duplicate the metadata symbol for dst. But I
can also imagine code using the length of a dead string
which would be nice to support. </div>
<div dir="auto"><br>
</div>
<div dir="auto">I also wonder what is the advantage of not
using a conjured symbol. </div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards,</div>
<div dir="auto">Gabor</div>
<br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Aug 19, 2020,
3:08 PM Balázs Benics <<a href="mailto:benicsbalazs@gmail.com" target="_blank" rel="noreferrer">benicsbalazs@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>It turns out that if the region of `src` is
alive, then the required metadata symbol will be
kept alive as well.</div>
<div>Here is the modified example:</div>
<div><span style="font-family:monospace">void
strcat_symbolic_src_length(char *src) {<br>
char dst[8] = "1234";<br>
strcat(dst, src);</span></div>
<div><span style="font-family:monospace">
clang_analyzer_eval(strlen(dst) >= 4); //
expected-warning{{TRUE}}<br>
(void)*dst;<br>
</span></div>
<div><span style="font-family:monospace"> <b>(void)*src;</b>
// Now we keep the `src` alive, thus any metadata
symbols to that region will be alive as well at
the eval call.<br>
</span></div>
<div><span style="font-family:monospace">}</span><br>
```</div>
<div><br>
</div>
<div>It seems slightly confusing to me that depending
on that `src` is used later or not, the
`clang_analyzer_eval(strlen(dst) >= 4)` will show
either `TRUE` or `UNKNOWN`.</div>
<div>I think it should always give `TRUE` as an
answer.<br>
<br>
</div>
<div>Note that a metadata symbol is alive only if
marked in use AND its region is also alive.</div>
<div>Without the `(void)*src;` the region of `src` is
dead, thus the symbol ($meta{src} + 4) representing
the cstring length of the region `dst` will be dead
too.</div>
<div><br>
</div>
<div>Since this problem was caused by the handling of
metadata symbols, shouldn't we use conjured ones
instead?</div>
<div>In that way, we would decouple the liveness of
the cstring length of a region and the region
itself.</div>
<div><br>
</div>
<div>What is the use-case for using a metadata symbol
instead of a conjured one?<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
ezt írta (időpont: 2020. aug. 7., P, 0:05):<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <br>
<br>
<div>On 8/6/20 1:38 PM, Balázs Benics wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Gábor<br>
</div>
<div>> How much work would it be to
prototype keeping these expressions alive
and measuring performance and memory
implications?</div>
<div>I'm not sure, since I'm not really
experienced in this liveness stuff. I will
try it.</div>
<div><br>
</div>
<div>Artem<br>
</div>
<div>> It sounds to me as if putting
metadata symbols into the live set directly
would have worked just fine.</div>
<div>Ehm, I' not sure about this.</div>
<div>If you look my example closely, you can
note that the <span style="font-family:monospace">CStringChecker</span>
maps directly the <span style="font-family:monospace">SymRegion{reg_$0<char
* src>}</span> to the <span style="font-family:monospace">meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}</span> symbol.<br>
</div>
<div>So the <span style="font-family:monospace">SymbolReaper::markInUse</span>
will in fact place that <span style="font-family:monospace">meta_$2</span>
symbol in the Live set.</div>
<div>However later, when you query the <span style="font-family:monospace">SymbolReaper</span>
if the mentioned <span style="font-family:monospace">meta_$2</span>
symbol is dead or not, it will answer you
that it is <b>dead</b>.</div>
</div>
</blockquote>
<br>
That's not how that works.<br>
<br>
markInUse() doesn't put anything into the live
set, it puts things into an auxiliary
"metadata-in-use" set: <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396</a><br>
<br>
On the other hand, isDead()/isLive() has to return
true if the symbol is present in the live set. In
fact, that's the first thing it checks for: <a href="https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437</a><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I'm quoting the related trace log: <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><span style="font-family:monospace">[...]<br>
<b>CStringChecker::</b><b>checkLiveSymbols
marks
'meta_$2{SymRegion{reg_$0<char *
src>},unsigned long}' in use</b><br>
<br>
# Even we marked the given symbols in
use, we still removes them for some
reason...<br>
CStringChecker::checkDeadSymbols finds
the '(meta_$2{SymRegion{reg_$0<char *
src>},unsigned long}) + 4U' as dead;
so removes the mapping from 'dst'<br>
<b>CStringChecker::</b><b>checkDeadSymbols
finds the
'meta_$2{SymRegion{reg_$0<char *
src>},unsigned long}' as dead; so
removes the mapping from
'SymRegion{reg_$0<char * src>}'</b></span></div>
</blockquote>
<div><br>
</div>
<div>Why does the <span style="font-family:monospace">ExprEngine</span>
conjure a return symbol, if an <span style="font-family:monospace">evalCall</span>
already evaluated the call?</div>
<div><br>
</div>
<div>Artem</div>
<div>> See how RegionStore does that within
`ScanReachableSymbols::scan(const SymExpr
*)`</div>
<div>I managed to implement a metadata
collector visitor using the new
SymExprVisitor infrastructure, in just a
couple of lines of code. I was amazed by
that :)<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Artem
Dergachev <<a href="mailto:noqnoqneo@gmail.com" rel="noreferrer noreferrer" target="_blank">noqnoqneo@gmail.com</a>>
ezt írta (időpont: 2020. aug. 6., Cs,
19:04):<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> I- i- i was about to reply to that!<br>
<br>
I don't know why metadata-in-use is a
thing at all. It sounds to me as if
putting metadata symbols into the live set
directly would have worked just fine. If
you find any interesting counterexamples
please let me know.<br>
<br>
Apart from that, indeed, the correct way
to implement checkLiveSymbols when you're
tracking arbitrary symbols is to iterate
over these arbitrary symbols and mark all
sub-symbols as live. See how RegionStore
does that within
`ScanReachableSymbols::scan(const SymExpr
*)`. I.e., the following example works
correctly and i expect CStringChecker to
work similarly:<br>
<br>
```<br>
int conjure();<br>
<br>
int foo() {<br>
int x = conjure();<br>
clang_analyzer_warnOnDeadSymbol(x);<br>
return x + 1;<br>
}<br>
<br>
void bar() {<br>
int y = foo(); // At this point
`conj_$2` is no longer directly present in
the state; only `conj_$2 + 1` is.<br>
(void)y;<br>
} // But despite that, `conj_$2` only dies
here.<br>
```<br>
<br>
<br>
<div>On 8/6/20 11:35 AM, Gábor Horváth via
cfe-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>+Artem</div>
<div><br>
</div>
<div>It would be great if the analyzer
could reason about code like that. I
think Artem is the most competent in
these liveness related problems.</div>
<div>Aside from performance, I do not
see any downside for keeping the
whole symbolic expression alive
after markInUse was called on it
(hopefully Artem corrects me if I'm
wrong).</div>
<div>But mainly due to constraint
solver limitations it might not make
sense to keep arbitrarily complex
expressions alive.</div>
<div><br>
<a class="gmail_plusreply" id="m_-5368288223173890362m_-4104045513335271565gmail-m_6490615832270949146gmail-m_-2871203687350369529plusReplyChip-1" href="mailto:benicsbalazs@gmail.com" rel="noreferrer noreferrer" target="_blank">@Balázs
Benics</a></div>
<div>How much work would it be to
prototype keeping these expressions
alive and measuring performance and
memory implications?</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Wed, 5 Aug 2020 at 16:17, Balázs
Benics via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">If you are really
curious, here is some context.<br>
<div><br>
Imagine the following code (<span style="font-family:monospace">test.c</span>):<br>
```<br>
<span style="font-family:monospace">typedef
typeof(sizeof(int)) size_t;<br>
<br>
void clang_analyzer_eval(int);<br>
char *strcat(char *restrict
s1, const char *restrict s2);<br>
size_t strlen(const char *s);<br>
<br>
void
strcat_symbolic_src_length(char
*src) {<br>
char dst[8] = "1234";<br>
strcat(dst, src);<br>
clang_analyzer_eval(strlen(dst)
>= 4); //
expected-warning{{TRUE}}<br>
(void)*dst;<br>
}</span><br>
```<br>
<br>
One would expect that the '<span style="font-family:monospace">strlen(dst)
>= 4</span>' is <span style="font-family:monospace">TRUE</span>,
but it returns <span style="font-family:monospace">UNKOWN</span>.<br>
<br>
After doing a little bit of
investigation - and debug prints
- I came up with the following
trace.<br>
<br>
---<br>
<br>
<span style="font-family:monospace">#
In the
CStringChecker::evalStrcat
creates a metadata symbol
representing the cstring
length of the region pointed
by 'src'.<br>
created metadata symbol
'meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}'<br>
<br>
# After the evalStrcat
evaluated the call, the state
contains the necessary mapping
that the 'dst' points to a
cstring which is 4 + meta$4
long.<br>
# Note that meta$4 represents
the cstring length of the
region pointed by 'src'.<br>
# So far so good. We know that
resulting cstring length
precisely.<br>
strcpy common END State.dump:
"program_state": {<br>
[...]<br>
"checker_messages": [<br>
{ "checker":
"unix.cstring.CStringModeling",
"messages": [<br>
"CString lengths:", <br>
"dst:
(meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}) +
4U", <br>
"SymRegion{reg_$0<char *
src>}:
meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}", <br>
""<br>
]}<br>
]<br>
}<br>
<br>
# We mark all symbols as live
in the cstring length map.<br>
# At least we think so...<br>
CStringChecker::checkLiveSymbols marks
'(meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}) +
4U' in use<br>
CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}' in
use<br>
CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}' in
use<br>
<br>
# Even we marked the given
symbols in use, we still
removes them for some
reason...<br>
CStringChecker::checkDeadSymbols finds the
'(meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}) +
4U' as dead; so removes the
mapping from 'dst'<br>
CStringChecker::checkDeadSymbols finds the
'meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}' as
dead; so removes the mapping
from 'SymRegion{reg_$0<char
* src>}'<br>
<br>
# Now that state does not
contain the cstring length of
the region pointed by 'dst'.</span><br>
<br>
---<br>
<br>
Further investigation showed
that even if I would visit all
the sub SymExprs looking for
SymbolMetadata would not help.<br>
SymbolReaper::isDead would still
show that the
'(meta_$2{SymRegion{reg_$0<char
* src>},unsigned long}) + 4U'
SymExpr is dead.<br>
<br>
---<br>
<br>
How can we preserve such
metadata information?<br>
<br>
---<br>
<br>
You can also reproduce this
following these steps:<br>
<br>
Apply the <span style="font-family:monospace">add-debug-prints.patch</span>
on top of <span style="font-family:monospace">7f1556f292ccfd80c4ffa986d5b849f915e5cd82
"Fix typo:
s/epomymous/eponymous/ NFC"</span>.<br>
Analyze the '<span style="font-family:monospace">test.c</span>'
file using this command:<br>
<span style="font-family:monospace">./bin/clang
-cc1 -internal-isystem
lib/clang/12.0.0/include
-nostdsysteminc -analyze
-analyzer-constraints=range
-setup-static-analyzer
-analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
-analyzer-config
eagerly-assume=false test.c</span><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Balázs
Benics <<a href="mailto:benicsbalazs@gmail.com" rel="noreferrer noreferrer" target="_blank">benicsbalazs@gmail.com</a>>
ezt írta (időpont: 2020. aug.
5., Sze, 10:29):<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Why does the
SymbolReaper::markInUse only
work on SymbolMetadata
symbols?</div>
<div>
<pre style="background-color:rgb(43,43,43);color:rgb(169,183,198);font-family:"JetBrains Mono",monospace;font-size:9.8pt"><span style="color:rgb(204,120,50)">void </span><span style="color:rgb(181,182,227)">SymbolReaper</span>::<span style="color:rgb(255,198,109)">markInUse</span>(<span style="color:rgb(185,188,209)">SymbolRef </span>sym) {
<span style="color:rgb(204,120,50)">if </span>(isa<<span style="color:rgb(181,182,227)">SymbolMetadata</span>>(sym))
<span style="color:rgb(147,115,165)">MetadataInUse</span>.insert(sym)<span style="color:rgb(204,120,50)">;
</span>}</pre>
</div>
<div>I think it is flawed if
the Symbol is a SymIntExpr
holding an expression tree
referring to SymbolMetadata
symbols. In such case, those
symbols would not be kept
alive - causing some
confusion on the checker
developers' side and
potentially losing some
information about the
analysis.</div>
<div><br>
</div>
<div>Should we walk the
expression tree instead of
the current implementation?</div>
<div>What performance impact
should we expect by doing
so?</div>
<div><br>
</div>
<div>Any ideas?</div>
<div><br>
</div>
<div>Balazs.<br>
</div>
</div>
</blockquote>
</div>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote></div>