[cfe-dev] Memory modeling, RegionStore Questions

Jakob Kalchik via cfe-dev cfe-dev at lists.llvm.org
Mon May 18 04:38:55 PDT 2020


Hello, I’ve been studying the RegionStore implementation of the static
analyzer in an attempt to debug some false positives I’ve come across. I
found the paper “Memory Model for the Static Analysis of C Programs” and
Artem’s workbook (“Clang Static Analyzer – A Checker Developer’s Guide”) ,
and read through many past articles. These are excellent references to get
started, now I'd like to enhance my comprehension by jumping in and fixing
a few issues that may need to be fixed. I apologize in advance if I’m
missing some obvious bridges to comprehension, but I’m still early in the
process of learning this. I’d like to contribute to improving the static
analyzer and learn best through debugging and implementing. Which brings me
to the discussion questions and asks for a few tips on next steps. Thanks
in advance for tips and suggestions for how to contribute.

Question 1:

Constructing a case to understanding how data and pointers to data are
mapped, I came across some unexpected behavior. In the example below, I’d
expect the UNKNOWN values to evaluate to true. Is this a reasonable
expectation for the clang static analyzer?

Digging into this problem a bit more with the debugger, it seems at least a
small portion of RegionStore is not fully implemented. I debugged this down
to the code snippet shown in the data for Q1. ‘a’ comes back as 0x11223344
into V (line 1723), V->getAsSymbol() comes back as NULL (line 1724), V is
known, so UnknownVal() is returned – matching the test case results.
There’s also a comment in the code at that point – “// Other cases: give
up.  We are indexing into a larger object that has some value, but we don't
know how to handle that yet.”.

Seems it’s possible to construct concrete values for these cases, if I’m
not missing anything – but I’m not sure how to do this. I see the return
type, and the offset into the base type, so it seems as “easy” as bit
twiddling to return the expected byte or short values. Is there a review
that demonstrates how to accomplish this, perhaps some tips?

Question 2:

Looking through CStringChecker.cpp, I see that memset is implemented only
for the case of an initialization value of 0, and the sizeof arg and buffer
extents are equal (see example in data below, line 1076, also a FIXME in
the comments). This seems reasonable for most use cases, but is also
interesting since an implementation that allows for arbitrary
initialization values could open up possibilities for other improvements
(see Question 3). ProgramState.cpp has a method named
bindDefaultInitialValue that seems to be perfect for this application, but
asserts in RegionStore’s BindDefaultInitial -> “Double initialization!”
(approx line 452). Would repurposing this method (bindDefaultInitial) be a
reasonable approach to solve this problem, or is there another way that
would be a better approach?

Question 3:

Looking through CStringChecker.cpp, I see that memcpy does not perfectly
model the memory copy from source to destination memory. Memcpy looks to be
implemented as a blank invalidation of the source and destination buffers.
I had first looked at this problem, and was quite puzzled since I had
assumed memory copies should be modeled (at least for common cases). The
suggested approach in the comment to address this issue is using LCVs. Is
there an example of such an approach, or perhaps an outline to start with?



Case data for Question 1:

void clang_analyzer_eval(int);

void clang_analyzer_dump(int);

void clang_analyzer_printState();



int foo(void) {

  unsigned int a = 0x11223344;

  unsigned char *p = (unsigned char*)&a;

  unsigned short *pu = (unsigned short *)&a;



  clang_analyzer_printState();

  clang_analyzer_eval(p[0] == 0x44); // expected-warning{{TRUE}} -- OK

  clang_analyzer_eval(p[1] == 0x33); // expected-warning{{TRUE}} -- UNKNOWN

  clang_analyzer_eval(p[2] == 0x22); // expected-warning{{TRUE}} -- UNKNOWN

  clang_analyzer_eval(p[3] == 0x11); // expected-warning{{TRUE}} -- UNKNOWN



  clang_analyzer_eval(pu[0] == 0x3344); // expected-warning{{TRUE}} -- OK

  clang_analyzer_eval(pu[1] == 0x1122); // expected-warning{{TRUE}} --
UNKNOWN



  clang_analyzer_dump(&p[0]); -- &Element{a,0 S64b,unsigned char} [as 32
bit integer]

  clang_analyzer_dump(&p[1]); -- &Element{a,1 S64b,unsigned char} [as 32
bit integer]



  clang_analyzer_dump(&pu[0]); -- &Element{a,0 S64b,unsigned short} [as 32
bit integer]

  clang_analyzer_dump(&pu[1]); -- &Element{a,0 S64b,unsigned short} [as 32
bit integer]



}



The program state from the decls in the sample are shown here.



"program_state": {

  "store": { "pointer": "0x76787f8", "items": [

    { "cluster": "a", "pointer": "0x7677fc0", "items": [

      { "kind": "Direct", "offset": 0, "value": "287454020 U32b" }

    ]},

    { "cluster": "p", "pointer": "0x7678728", "items": [

      { "kind": "Direct", "offset": 0, "value": "&Element{a,0 S64b,unsigned
char}" }

    ]},

    { "cluster": "pu", "pointer": "0x767d9a8", "items": [

      { "kind": "Direct", "offset": 0, "value": "&Element{a,0 S64b,unsigned
short}" }

    ]}

  ]},

  "environment": { "pointer": "0x7607d50", "items": [

    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo",
"location": null, "items": [

      { "stmt_id": 805, "pretty": "clang_analyzer_printState", "value":
"&code{clang_analyzer_printState}" }

    ]}

  ]},

  "constraints": null,

  "dynamic_types": null,

  "constructing_objects": null,

  "checker_messages": null

}"program_state": {

  "store": null,

  "environment": { "pointer": "0x7607d50", "items": [

    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo",
"location": null, "items": [

      { "stmt_id": 1201, "pretty": "clang_analyzer_printState", "value":
"&code{clang_analyzer_printState}" }

    ]}

  ]},

  "constraints": null,

  "dynamic_types": null,

  "constructing_objects": null,

  "checker_messages": null



Code snippet from RegionStore.cpp, Function getBindingForElement, line #s
approximate.



1716   if (const TypedValueRegion *baseR =

1717         dyn_cast_or_null<TypedValueRegion>(O.getRegion())) {

1718     QualType baseT = baseR->getValueType();

1719     if (baseT->isScalarType()) {

1720       QualType elemT = R->getElementType();

1721       if (elemT->isScalarType()) {

1722         if (Ctx.getTypeSizeInChars(baseT) >=
Ctx.getTypeSizeInChars(elemT)) {

1723           if (const Optional<SVal> &V = B.getDirectBinding(superR)) {



(gdb) p V->dump()

287454020 U32b$2 = void

(gdb) p R->dump()

Element{a,1 S64b,unsigned char}$3 = void



1724             if (SymbolRef parentSym = V->getAsSymbol())

1725               return
svalBuilder.getDerivedRegionValueSymbolVal(parentSym, R);

1726

1727             if (V->isUnknownOrUndef())

1728               return *V;

1729             // Other cases: give up.  We are indexing into a larger
object

1730             // that has some value, but we don't know how to handle
that yet.

1731             return UnknownVal();

1732           }

1733         }

1734       }



Question 2 – CStringChecker.cpp, method memsetAux



1071     if (StateWholeReg && !StateNotWholeReg && StateNullChar &&

1072         !StateNonNullChar) {

1073       // If the 'memset()' acts on the whole region of destination
buffer and

1074       // the value of the second argument of 'memset()' is zero, bind
the second

1075       // argument's value to the destination buffer with 'default
binding'.

1076       // FIXME: Since there is no perfect way to bind the non-zero
character, we

1077       // can only deal with zero value here. In the future, we need to
deal with

1078       // the binding of non-zero value in the case of whole region.

1079       State = State->bindDefaultZero(svalBuilder.makeLoc(BR),

1080                                      C.getLocationContext());





Question 3 – CStringChecker.cpp, method evalCopyCommon



1201     // Invalidate the destination (regular invalidation without
pointer-escaping

1202     // the address of the top-level region).

1203     // FIXME: Even if we can't perfectly model the copy, we should see
if we

1204     // can use LazyCompoundVals to copy the source values into the
destination.

1205     // This would probably remove any existing bindings past the end
of the

1206     // copied region, but that's still an improvement over blank
invalidation.

1207     state =

1208         InvalidateBuffer(C, state, Dest.Expression,
C.getSVal(Dest.Expression),

1209                          /*IsSourceBuffer*/ false, Size.Expression);

1210

1211     // Invalidate the source (const-invalidation without
const-pointer-escaping

1212     // the address of the top-level region).

1213     state = InvalidateBuffer(C, state, Source.Expression,

1214                              C.getSVal(Source.Expression),

1215                              /*IsSourceBuffer*/ true, nullptr);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200518/24c52254/attachment-0001.html>


More information about the cfe-dev mailing list