[cfe-dev] [analyzer] Calling SymbolReaper::markInUse on BinarySymExprs are ignored

Balázs Benics via cfe-dev cfe-dev at lists.llvm.org
Wed Aug 19 15:19:12 PDT 2020


Thanks for the advice.
It sounds like a plan, Artem.
I will make some experiments :)

We will see where it goes.

On Thu, Aug 20, 2020, 00:02 Artem Dergachev <noqnoqneo at gmail.com> wrote:

>
>
> On 8/19/20 2:18 PM, Artem Dergachev wrote:
>
> Gabor, we've talked about this two days ago >_> +Nithin because we're
> facing the same decision with the smart pointer checker to which raw
> pointer value would make a lot of sense as SymbolMetadata.
>
> The way SymbolMetadata used in CStringChecker always bothered me. It *is*
> used *as if* it's SymbolConjured: it represents an unknown value of an
> expression of a specific type. As such it doesn't do much more than
> preserve its identity for debugging purposes. We indeed don't need another
> SymbolConjured.
>
> I believe we should remove SymbolConjured's identity elements from
> SymbolMetadata and instead make it work more like SymbolRegionValue works
> in RegionStore. Namely:
>
> 1. Remove statement, location context, block count from SymbolMetadata's
> identity. Let it solely depend on the region.
>
> 2. Get rid of the metadata-in-use set. From now on SymbolMetadata, like
> SymbolRegionValue, is live whenever its region is live.
>
> 3. When strlen(R) is used for the first time on a region R, produce
> $meta<size_t R> as the answer, but *do not store* it in the string length
> map. We don't need to change the state because the state didn't in fact
> change. On subsequent strlen(R) calls we'll simply return the same symbol
> (*because* the map will remain empty), until the string is changed.
>
> 4. If the string is mutated, record its length as usual. But if the string
> is invalidated (or the new length is completely unknown), *do not remove*
> the length from the state; instead, set it to SymbolConjured. Only remove
> it from the map when the region dies.
>
> 5. Keep string length symbols alive as long as they're in the map.
>
> This model is obviously correct because it mimics RegionStore which is
> obviously correct (i mean, it's obviously broken beyond repair, but *this*
> part seems to be correct). In particular, this model doesn't have any
> obvious flaws that you point out in this thread. It also makes sure that
> the identity of the state correctly reflects mutations in the state. I
> think it's very much superior to the existing model. This also more or less
> matches my belief of "everything is a Store" that i wanted to push on a few
> years ago (though that one was more radical; back then i wanted to make a
> "shadow" region for a string and use the actual SymbolRegionValue of that
> shadow region instead of the SymbolMetadata-as-suggested-in-this-mail).
>
> Balasz, are you willing to implement something like that?
>
>
> *Balázs.
> Sry!!
>
>
>
> On 8/19/20 7:59 AM, Gábor Horváth wrote:
>
> Hi!
>
> Since the metadata symbol describes a property of the region I think in
> this example it might make sense to duplicate the metadata symbol for dst.
> But I can also imagine code using the length of a dead string which would
> be nice to support.
>
> I also wonder what is the advantage of not using a conjured symbol.
>
> Regards,
> Gabor
>
>
> On Wed, Aug 19, 2020, 3:08 PM Balázs Benics <benicsbalazs at gmail.com>
> wrote:
>
>> It turns out that if the region of `src` is alive, then the required
>> metadata symbol will be kept alive as well.
>> Here is the modified example:
>> void strcat_symbolic_src_length(char *src) {
>>   char dst[8] = "1234";
>>   strcat(dst, src);
>>   clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
>>   (void)*dst;
>>   *(void)*src;* // Now we keep the `src` alive, thus any metadata
>> symbols to that region will be alive as well at the eval call.
>> }
>> ```
>>
>> It seems slightly confusing to me that depending on that `src` is used
>> later or not, the `clang_analyzer_eval(strlen(dst) >= 4)` will show either
>> `TRUE` or `UNKNOWN`.
>> I think it should always give `TRUE` as an answer.
>>
>> Note that a metadata symbol is alive only if marked in use AND its region
>> is also alive.
>> Without the `(void)*src;` the region of `src` is dead, thus the symbol
>> ($meta{src} + 4) representing the cstring length of the region `dst` will
>> be dead too.
>>
>> Since this problem was caused by the handling of metadata symbols,
>> shouldn't we use conjured ones instead?
>> In that way, we would decouple the liveness of the cstring length of a
>> region and the region itself.
>>
>> What is the use-case for using a metadata symbol instead of a conjured
>> one?
>>
>> Artem Dergachev <noqnoqneo at gmail.com> ezt írta (időpont: 2020. aug. 7.,
>> P, 0:05):
>>
>>>
>>>
>>> On 8/6/20 1:38 PM, Balázs Benics wrote:
>>>
>>> Gábor
>>> > How much work would it be to prototype keeping these expressions alive
>>> and measuring performance and memory implications?
>>> I'm not sure, since I'm not really experienced in this liveness stuff. I
>>> will try it.
>>>
>>> Artem
>>> > It sounds to me as if putting metadata symbols into the live set
>>> directly would have worked just fine.
>>> Ehm, I' not sure about this.
>>> If you look my example closely, you can note that the CStringChecker
>>> maps directly the SymRegion{reg_$0<char * src>} to the meta_$2{SymRegion{reg_$0<char
>>> * src>},unsigned long} symbol.
>>> So the SymbolReaper::markInUse will in fact place that meta_$2 symbol
>>> in the Live set.
>>> However later, when you query the SymbolReaper if the mentioned meta_$2
>>> symbol is dead or not, it will answer you that it is *dead*.
>>>
>>>
>>> That's not how that works.
>>>
>>> markInUse() doesn't put anything into the live set, it puts things into
>>> an auxiliary "metadata-in-use" set:
>>> https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L396
>>>
>>> On the other hand, isDead()/isLive() has to return true if the symbol is
>>> present in the live set. In fact, that's the first thing it checks for:
>>> https://github.com/llvm/llvm-project/blob/llvmorg-11.0.0-rc1/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp#L437
>>>
>>>
>>> I'm quoting the related trace log:
>>>
>>>> [...]
>>>> *CStringChecker::**checkLiveSymbols marks
>>>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' in use*
>>>>
>>>> # Even we marked the given symbols in use, we still removes them for
>>>> some reason...
>>>> CStringChecker::checkDeadSymbols finds the
>>>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as dead; so
>>>> removes the mapping from 'dst'
>>>> *CStringChecker::**checkDeadSymbols finds the
>>>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as dead; so removes
>>>> the mapping from 'SymRegion{reg_$0<char * src>}'*
>>>>
>>>
>>> Why does the ExprEngine conjure a return symbol, if an evalCall already
>>> evaluated the call?
>>>
>>> Artem
>>> > See how RegionStore does that within `ScanReachableSymbols::scan(const
>>> SymExpr *)`
>>> I managed to implement a metadata collector visitor using the new
>>> SymExprVisitor infrastructure, in just a couple of lines of code. I was
>>> amazed by that :)
>>>
>>> Artem Dergachev <noqnoqneo at gmail.com> ezt írta (időpont: 2020. aug. 6.,
>>> Cs, 19:04):
>>>
>>>> I- i- i was about to reply to that!
>>>>
>>>> I don't know why metadata-in-use is a thing at all. It sounds to me as
>>>> if putting metadata symbols into the live set directly would have worked
>>>> just fine. If you find any interesting counterexamples please let me know.
>>>>
>>>> Apart from that, indeed, the correct way to implement checkLiveSymbols
>>>> when you're tracking arbitrary symbols is to iterate over these arbitrary
>>>> symbols and mark all sub-symbols as live. See how RegionStore does that
>>>> within `ScanReachableSymbols::scan(const SymExpr *)`. I.e., the following
>>>> example works correctly and i expect CStringChecker to work similarly:
>>>>
>>>> ```
>>>> int conjure();
>>>>
>>>> int foo() {
>>>>   int x = conjure();
>>>>   clang_analyzer_warnOnDeadSymbol(x);
>>>>   return x + 1;
>>>> }
>>>>
>>>> void bar() {
>>>>   int y = foo(); // At this point `conj_$2` is no longer directly
>>>> present in the state; only `conj_$2 + 1` is.
>>>>   (void)y;
>>>> } // But despite that, `conj_$2` only dies here.
>>>> ```
>>>>
>>>>
>>>> On 8/6/20 11:35 AM, Gábor Horváth via cfe-dev wrote:
>>>>
>>>> +Artem
>>>>
>>>> It would be great if the analyzer could reason about code like that. I
>>>> think Artem is the most competent in these liveness related problems.
>>>> Aside from performance, I do not see any downside for keeping the whole
>>>> symbolic expression alive after markInUse was called on it (hopefully Artem
>>>> corrects me if I'm wrong).
>>>> But mainly due to constraint solver limitations it might not make sense
>>>> to keep arbitrarily complex expressions alive.
>>>>
>>>> @Balázs Benics <benicsbalazs at gmail.com>
>>>> How much work would it be to prototype keeping these expressions alive
>>>> and measuring performance and memory implications?
>>>>
>>>>
>>>> On Wed, 5 Aug 2020 at 16:17, Balázs Benics via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> If you are really curious, here is some context.
>>>>>
>>>>> Imagine the following code (test.c):
>>>>> ```
>>>>> typedef typeof(sizeof(int)) size_t;
>>>>>
>>>>> void clang_analyzer_eval(int);
>>>>> char  *strcat(char *restrict s1, const char *restrict s2);
>>>>> size_t strlen(const char *s);
>>>>>
>>>>> void strcat_symbolic_src_length(char *src) {
>>>>>   char dst[8] = "1234";
>>>>>   strcat(dst, src);
>>>>>   clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
>>>>>   (void)*dst;
>>>>> }
>>>>> ```
>>>>>
>>>>> One would expect that the 'strlen(dst) >= 4' is TRUE, but it returns
>>>>> UNKOWN.
>>>>>
>>>>> After doing a little bit of investigation - and debug prints - I came
>>>>> up with the following trace.
>>>>>
>>>>> ---
>>>>>
>>>>> # In the CStringChecker::evalStrcat creates a metadata symbol
>>>>> representing the cstring length of the region pointed by 'src'.
>>>>> created metadata symbol 'meta_$2{SymRegion{reg_$0<char *
>>>>> src>},unsigned long}'
>>>>>
>>>>> # After the evalStrcat evaluated the call, the state contains the
>>>>> necessary mapping that the 'dst' points to a cstring which is 4 + meta$4
>>>>> long.
>>>>> # Note that meta$4 represents the cstring length of the region pointed
>>>>> by 'src'.
>>>>> # So far so good. We know that resulting cstring length precisely.
>>>>> strcpy common END State.dump: "program_state": {
>>>>>   [...]
>>>>>   "checker_messages": [
>>>>>     { "checker": "unix.cstring.CStringModeling", "messages": [
>>>>>       "CString lengths:",
>>>>>       "dst: (meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) +
>>>>> 4U",
>>>>>       "SymRegion{reg_$0<char * src>}: meta_$2{SymRegion{reg_$0<char *
>>>>> src>},unsigned long}",
>>>>>       ""
>>>>>     ]}
>>>>>   ]
>>>>> }
>>>>>
>>>>> # We mark all symbols as live in the cstring length map.
>>>>> # At least we think so...
>>>>> CStringChecker::checkLiveSymbols marks '(meta_$2{SymRegion{reg_$0<char
>>>>> * src>},unsigned long}) + 4U' in use
>>>>> CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
>>>>> * src>},unsigned long}' in use
>>>>> CStringChecker::checkLiveSymbols marks 'meta_$2{SymRegion{reg_$0<char
>>>>> * src>},unsigned long}' in use
>>>>>
>>>>> # Even we marked the given symbols in use, we still removes them for
>>>>> some reason...
>>>>> CStringChecker::checkDeadSymbols finds the
>>>>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' as dead; so
>>>>> removes the mapping from 'dst'
>>>>> CStringChecker::checkDeadSymbols finds the
>>>>> 'meta_$2{SymRegion{reg_$0<char * src>},unsigned long}' as dead; so removes
>>>>> the mapping from 'SymRegion{reg_$0<char * src>}'
>>>>>
>>>>> # Now that state does not contain the cstring length of the region
>>>>> pointed by 'dst'.
>>>>>
>>>>> ---
>>>>>
>>>>> Further investigation showed that even if I would visit all the sub
>>>>> SymExprs looking for SymbolMetadata would not help.
>>>>> SymbolReaper::isDead would still show that the
>>>>> '(meta_$2{SymRegion{reg_$0<char * src>},unsigned long}) + 4U' SymExpr is
>>>>> dead.
>>>>>
>>>>> ---
>>>>>
>>>>> How can we preserve such metadata information?
>>>>>
>>>>> ---
>>>>>
>>>>> You can also reproduce this following these steps:
>>>>>
>>>>> Apply the add-debug-prints.patch on top of 7f1556f292ccfd80c4ffa986d5b849f915e5cd82
>>>>> "Fix typo: s/epomymous/eponymous/ NFC".
>>>>> Analyze the 'test.c' file using this command:
>>>>> ./bin/clang -cc1 -internal-isystem lib/clang/12.0.0/include
>>>>> -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer
>>>>> -analyzer-checker=unix.cstring,alpha.unix.cstring,alpha.security.taint,debug.ExprInspection
>>>>> -analyzer-config eagerly-assume=false test.c
>>>>>
>>>>> Balázs Benics <benicsbalazs at gmail.com> ezt írta (időpont: 2020. aug.
>>>>> 5., Sze, 10:29):
>>>>>
>>>>>> Why does the SymbolReaper::markInUse only work on SymbolMetadata
>>>>>> symbols?
>>>>>>
>>>>>> void SymbolReaper::markInUse(SymbolRef sym) {
>>>>>>   if (isa<SymbolMetadata>(sym))
>>>>>>     MetadataInUse.insert(sym);}
>>>>>>
>>>>>> I think it is flawed if the Symbol is a SymIntExpr holding an
>>>>>> expression tree referring to SymbolMetadata symbols. In such case, those
>>>>>> symbols would not be kept alive - causing some confusion on the checker
>>>>>> developers' side and potentially losing some information about the analysis.
>>>>>>
>>>>>> Should we walk the expression tree instead of the current
>>>>>> implementation?
>>>>>> What performance impact should we expect by doing so?
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Balazs.
>>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200820/0dcd6fa2/attachment-0001.html>


More information about the cfe-dev mailing list