[cfe-dev] Memory modeling, RegionStore Questions
Jakob Kalchik via cfe-dev
cfe-dev at lists.llvm.org
Mon May 18 04:38:55 PDT 2020
Hello, I’ve been studying the RegionStore implementation of the static
analyzer in an attempt to debug some false positives I’ve come across. I
found the paper “Memory Model for the Static Analysis of C Programs” and
Artem’s workbook (“Clang Static Analyzer – A Checker Developer’s Guide”) ,
and read through many past articles. These are excellent references to get
started, now I'd like to enhance my comprehension by jumping in and fixing
a few issues that may need to be fixed. I apologize in advance if I’m
missing some obvious bridges to comprehension, but I’m still early in the
process of learning this. I’d like to contribute to improving the static
analyzer and learn best through debugging and implementing. Which brings me
to the discussion questions and asks for a few tips on next steps. Thanks
in advance for tips and suggestions for how to contribute.
Question 1:
Constructing a case to understanding how data and pointers to data are
mapped, I came across some unexpected behavior. In the example below, I’d
expect the UNKNOWN values to evaluate to true. Is this a reasonable
expectation for the clang static analyzer?
Digging into this problem a bit more with the debugger, it seems at least a
small portion of RegionStore is not fully implemented. I debugged this down
to the code snippet shown in the data for Q1. ‘a’ comes back as 0x11223344
into V (line 1723), V->getAsSymbol() comes back as NULL (line 1724), V is
known, so UnknownVal() is returned – matching the test case results.
There’s also a comment in the code at that point – “// Other cases: give
up. We are indexing into a larger object that has some value, but we don't
know how to handle that yet.”.
Seems it’s possible to construct concrete values for these cases, if I’m
not missing anything – but I’m not sure how to do this. I see the return
type, and the offset into the base type, so it seems as “easy” as bit
twiddling to return the expected byte or short values. Is there a review
that demonstrates how to accomplish this, perhaps some tips?
Question 2:
Looking through CStringChecker.cpp, I see that memset is implemented only
for the case of an initialization value of 0, and the sizeof arg and buffer
extents are equal (see example in data below, line 1076, also a FIXME in
the comments). This seems reasonable for most use cases, but is also
interesting since an implementation that allows for arbitrary
initialization values could open up possibilities for other improvements
(see Question 3). ProgramState.cpp has a method named
bindDefaultInitialValue that seems to be perfect for this application, but
asserts in RegionStore’s BindDefaultInitial -> “Double initialization!”
(approx line 452). Would repurposing this method (bindDefaultInitial) be a
reasonable approach to solve this problem, or is there another way that
would be a better approach?
Question 3:
Looking through CStringChecker.cpp, I see that memcpy does not perfectly
model the memory copy from source to destination memory. Memcpy looks to be
implemented as a blank invalidation of the source and destination buffers.
I had first looked at this problem, and was quite puzzled since I had
assumed memory copies should be modeled (at least for common cases). The
suggested approach in the comment to address this issue is using LCVs. Is
there an example of such an approach, or perhaps an outline to start with?
Case data for Question 1:
void clang_analyzer_eval(int);
void clang_analyzer_dump(int);
void clang_analyzer_printState();
int foo(void) {
unsigned int a = 0x11223344;
unsigned char *p = (unsigned char*)&a;
unsigned short *pu = (unsigned short *)&a;
clang_analyzer_printState();
clang_analyzer_eval(p[0] == 0x44); // expected-warning{{TRUE}} -- OK
clang_analyzer_eval(p[1] == 0x33); // expected-warning{{TRUE}} -- UNKNOWN
clang_analyzer_eval(p[2] == 0x22); // expected-warning{{TRUE}} -- UNKNOWN
clang_analyzer_eval(p[3] == 0x11); // expected-warning{{TRUE}} -- UNKNOWN
clang_analyzer_eval(pu[0] == 0x3344); // expected-warning{{TRUE}} -- OK
clang_analyzer_eval(pu[1] == 0x1122); // expected-warning{{TRUE}} --
UNKNOWN
clang_analyzer_dump(&p[0]); -- &Element{a,0 S64b,unsigned char} [as 32
bit integer]
clang_analyzer_dump(&p[1]); -- &Element{a,1 S64b,unsigned char} [as 32
bit integer]
clang_analyzer_dump(&pu[0]); -- &Element{a,0 S64b,unsigned short} [as 32
bit integer]
clang_analyzer_dump(&pu[1]); -- &Element{a,0 S64b,unsigned short} [as 32
bit integer]
}
The program state from the decls in the sample are shown here.
"program_state": {
"store": { "pointer": "0x76787f8", "items": [
{ "cluster": "a", "pointer": "0x7677fc0", "items": [
{ "kind": "Direct", "offset": 0, "value": "287454020 U32b" }
]},
{ "cluster": "p", "pointer": "0x7678728", "items": [
{ "kind": "Direct", "offset": 0, "value": "&Element{a,0 S64b,unsigned
char}" }
]},
{ "cluster": "pu", "pointer": "0x767d9a8", "items": [
{ "kind": "Direct", "offset": 0, "value": "&Element{a,0 S64b,unsigned
short}" }
]}
]},
"environment": { "pointer": "0x7607d50", "items": [
{ "lctx_id": 1, "location_context": "#0 Call", "calling": "foo",
"location": null, "items": [
{ "stmt_id": 805, "pretty": "clang_analyzer_printState", "value":
"&code{clang_analyzer_printState}" }
]}
]},
"constraints": null,
"dynamic_types": null,
"constructing_objects": null,
"checker_messages": null
}"program_state": {
"store": null,
"environment": { "pointer": "0x7607d50", "items": [
{ "lctx_id": 1, "location_context": "#0 Call", "calling": "foo",
"location": null, "items": [
{ "stmt_id": 1201, "pretty": "clang_analyzer_printState", "value":
"&code{clang_analyzer_printState}" }
]}
]},
"constraints": null,
"dynamic_types": null,
"constructing_objects": null,
"checker_messages": null
Code snippet from RegionStore.cpp, Function getBindingForElement, line #s
approximate.
1716 if (const TypedValueRegion *baseR =
1717 dyn_cast_or_null<TypedValueRegion>(O.getRegion())) {
1718 QualType baseT = baseR->getValueType();
1719 if (baseT->isScalarType()) {
1720 QualType elemT = R->getElementType();
1721 if (elemT->isScalarType()) {
1722 if (Ctx.getTypeSizeInChars(baseT) >=
Ctx.getTypeSizeInChars(elemT)) {
1723 if (const Optional<SVal> &V = B.getDirectBinding(superR)) {
(gdb) p V->dump()
287454020 U32b$2 = void
(gdb) p R->dump()
Element{a,1 S64b,unsigned char}$3 = void
1724 if (SymbolRef parentSym = V->getAsSymbol())
1725 return
svalBuilder.getDerivedRegionValueSymbolVal(parentSym, R);
1726
1727 if (V->isUnknownOrUndef())
1728 return *V;
1729 // Other cases: give up. We are indexing into a larger
object
1730 // that has some value, but we don't know how to handle
that yet.
1731 return UnknownVal();
1732 }
1733 }
1734 }
Question 2 – CStringChecker.cpp, method memsetAux
1071 if (StateWholeReg && !StateNotWholeReg && StateNullChar &&
1072 !StateNonNullChar) {
1073 // If the 'memset()' acts on the whole region of destination
buffer and
1074 // the value of the second argument of 'memset()' is zero, bind
the second
1075 // argument's value to the destination buffer with 'default
binding'.
1076 // FIXME: Since there is no perfect way to bind the non-zero
character, we
1077 // can only deal with zero value here. In the future, we need to
deal with
1078 // the binding of non-zero value in the case of whole region.
1079 State = State->bindDefaultZero(svalBuilder.makeLoc(BR),
1080 C.getLocationContext());
Question 3 – CStringChecker.cpp, method evalCopyCommon
1201 // Invalidate the destination (regular invalidation without
pointer-escaping
1202 // the address of the top-level region).
1203 // FIXME: Even if we can't perfectly model the copy, we should see
if we
1204 // can use LazyCompoundVals to copy the source values into the
destination.
1205 // This would probably remove any existing bindings past the end
of the
1206 // copied region, but that's still an improvement over blank
invalidation.
1207 state =
1208 InvalidateBuffer(C, state, Dest.Expression,
C.getSVal(Dest.Expression),
1209 /*IsSourceBuffer*/ false, Size.Expression);
1210
1211 // Invalidate the source (const-invalidation without
const-pointer-escaping
1212 // the address of the top-level region).
1213 state = InvalidateBuffer(C, state, Source.Expression,
1214 C.getSVal(Source.Expression),
1215 /*IsSourceBuffer*/ true, nullptr);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200518/24c52254/attachment-0001.html>
More information about the cfe-dev
mailing list