[cfe-dev] [analyzer] Pointer cast representation problems

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Sun Dec 13 21:55:05 PST 2020

Ok, first of all, the example false positive you're debugging is not an 
example of the bug you're trying to look into, but an example of a 
completely different long-standing bug: 
https://bugs.llvm.org/show_bug.cgi?id=44114. Both bugzilla entries 
contain a large number of examples to choose from. None of those are 
beginner bugs though, the reason why they're not fixed is because they 
require a relatively large amount of work and a relatively good 
understanding of the subject.

For your specific example, no, you didn't identify the buggy transition 
correctly, so you're debugging the part that has nothing to do with the 
false positive. I strongly recommend finding the buggy transition in 
your ExplodedGraph dump *before* jumping into the debugger. This usually 
goes from bottom to top, exactly like you're expected to read a static 
analyzer report. Let me explain this procedure on your example.

I attached a screenshot of the tail of the ExplodedGraph dump for your 
code (i hope this works fine with the mailing list). That's just three 
last nodes of the graph but they're enough to identify the problem.

Node 53 is the bug node, as stated in red; this is the node that was 
produced by generateErrorNode() and fed into the BugReport object. The 
bug report says "Branch condition evaluates to a garbage value". The AST 
expression p[1] (which acts as the branch condition which we can confirm 
by looking at the source code) has symbolic value "Undefined". Therefore 
we can conclude that the bug report was emitted correctly and the our 
problem is not in the checker callback that emitted the bug report; it 
was simply fed an incorrect State 1337. Since the only reason why the 
bug report was emitted was that the value was Undefined, we have to find 
a node in which "Undefined" was written into the state.

Node 52 doesn't update the state, so we can skip it. In fact it's nicely 
grouped with node 51 so we barely even notice it.

Node 51 is the nearest node on which the AST expression p[1] was first 
evaluated into Undefined. Whatever code was responsible for evaluating 
node 51 is directly responsible for assigning Undefined value to the 
expression - which ultimately led to the false positive in node 53. It 
is easy to see what was supposed to happen at this node: this is a 
memory load (which is also known in C and C++ formalism as "implicit 
lvalue-to-rvalue conversion" - the operation that converts 
object-as-in-"location" into object-as-in-"data"). The operation does 
not change contents of memory, it only reads the existing memory.

Static analyzer represents the knowledge of existing memory in a data 
structure known as the Store, aka "Region Store". Straightforwardly, it 
maps "memory regions" (symbolic representations of locations) into 
arbitrary SVals (symbolic representations of data). No, Balazs, it 
doesn't map "values to expressions" =/ The Store is written down in the 
dump and labeled accordingly. In our case there are three entries in the 

(1) Variable "myvar" has 16-bit unsigned integer 4386 written into it at 
byte offset 0;
(2) Variable "p" has a pointer to the first char (hence index 0) of 
variable "myvar" written into it at byte offset 0.
(3) Variable "x" has 32-bit signed integer 1 written into it at byte 
offset 0.

The location we're loading from is already evaluated: it's the value of 
AST expression p[1] before the load. The Expressions section (aka the 
Environment) contains two entries for p[1] because implicit 
lvalue-to-rvalue casts do not have any visual representation in C, 
however by matching statement identifiers (S780 and S784) with the 
respective program point dumps (S780: ArraySubscriptExpr, S784: 
ImplicitCastExpr (LValueToRValue)) you can easily see that S780 is the 
sub-expression of the load and S784 is the load itself. Therefore the 
location which we're loading from is the value of AST expression p[1], 
namely the *second* char of variable "myvar".

Knowing these facts about the memory of our program, we need to decide 
whether the load was performed correctly. If the load was performed 
incorrectly, our job is done: we've found the root cause. If the load 
was performed correctly that'd have meant that the state is incorrect 
and we have to go further up in order to find it.

It's fairly obvious that the load was not performed correctly. Looking 
at Store entry (1), it is easy to see that the second character of 
variable "myvar" has value 17 rather than "Undefined".

Therefore our problem consists entirely in incorrect construction of 
node 51. Now's the good time to jump into the debugger, set conditional 
breakpoint on evalLoad with condition "predecessor node has identifier 
50" and debug from there.

The pointer cast that you were trying to debug was not at fault. We can 
still discuss it separately but it's not what really causes this 
specific false positive to show up.

This entire debugging procedure is very straightforward and requires 
almost no thinking. You simply match static analyzer's simulation of the 
program's behavior to the actual runtime behavior of the program and 
check whether the former is a correct abstraction over the latter. 
Static analyzer is just an interpreter (cf. "abstract interpretation") 
so it's very easy to reason about the correctness of every transition as 
long as you know the language it's trying to interpret (in this case, 
C). Which is why i made this 2018 conference talk in order to explain 
it. The only non-trivial part is knowing the structure of the program 
state (store vs environment, what are regions and symbols and how to 
read them) which is why i did this whole workbook thing on that subject 
in 2016. I wish i did it properly so that all the documentation was in 
one place but for now that's where we are i guess.

On 12/7/20 7:40 AM, Balázs Benics via cfe-dev wrote:
> Ok, you are saying that we should know that `p[1]` points to an object 
> which was initialized. And according to the target's endianness, 
> resolve the DeclRefExpr to the appropriate constant value.
> The dumps are probably from the line `char *p = &myvar;` if I'm right.
> That is the reason why the right-hand side is evaluated to the 
> location of `myvar`, but wrapped into an ElementRegion of type char 
> representing the reinterpret cast.
> IMO, it's the correct behavior to this point.
> You should probably dig around the evaluation of the `DeclRefExpr` of 
> the expression `p[1]`. That should return your concrete value.
> AFAIK, the store maps the values to expressions and the problem 
> probably resides there.
> But it's just a wide guess :D
> Unfortunately, that is all I can say about this right now :(
> Balazs.
> Vince Bridgers via cfe-dev <cfe-dev at lists.llvm.org 
> <mailto:cfe-dev at lists.llvm.org>> ezt írta (időpont: 2020. dec. 7., H, 
> 11:54):
>     Hi all, I thought I'd look at this problem
>     https://bugs.llvm.org/show_bug.cgi?id=43364
>     <https://bugs.llvm.org/show_bug.cgi?id=43364> (Pointer cast
>     representation problems). maybe use it as an opportunity to dig
>     into the inner workings of the analyzer and maybe even solve a
>     concrete problem :)   But it seems I need guidance about possible
>     paths to pursue.
>     Starting with a concrete case, a simple reproducer is below. I
>     debugged this down to SimpleSValBuilder.cpp:evalCastFromLoc(). val
>     is an SVal and castTy is a Loc, and this code path attempts to
>     extract a concrete integer from the SVal (did I get this right?).
>     So I think a solution to this specific case would be to dig into
>     the SVal to see if casted data is concrete, and extract that data.
>     Seems to me this would be the location for that (evalCastFromLoc
>     seems appropropriate enough). If that's true, can someone point me
>     to an example that's similar? I'll keep digging, but thought I'd
>     ask in case this is easy for someone to drop a few helpful hints.
>     I've included some select dumps, state and a bt below in case this
>     is helpful.
>     Best
>     (gdb) p val.dump()
>     &Element{myvar,0 S64b,char}$2 = void
>     (gdb) p castTy.dump()
>     PointerType 0xf931740 'char *'
>     `-BuiltinType 0xf930b50 'char'
>     $3 = void
>     Invocation: clang -cc1 -analyze -analyzer-checker=core case.c
>     struct mystruct {
>       unsigned short _u16;
>     };
>     int main(void) {
>       struct mystruct myvar = { 0x1122 };
>       char *p = &myvar;
>       int x = 0;
>       if (p[0])
>         x+=1;
>       if (p[1]) // Branch condition evaluates to a garbage value
>     [core.uninitialized.Branch]
>         x+=1;
>       return x;
>     }
>     (gdb) p state
>     $4 = {Obj = 0xf9b81b8}
>     (gdb) p state->dump()
>     "program_state": {
>       "store": { "pointer": "0xf9b6fa2", "items": [
>         { "cluster": "myvar", "pointer": "0xf9b2250", "items": [
>           { "kind": "Direct", "offset": 0, "value": "4386 U16b" }
>         ]},
>         { "cluster": "p", "pointer": "0xf9b2aa8", "items": [
>           { "kind": "Direct", "offset": 0, "value": "&Element{myvar,0
>     S64b,char}" }
>         ]},
>         { "cluster": "x", "pointer": "0xf9b6ed0", "items": [
>           { "kind": "Direct", "offset": 0, "value": "1 S32b" }
>         ]}
>       ]},
>       "environment": { "pointer": "0xf9b16e0", "items": [
>         { "lctx_id": 1, "location_context": "#0 Call", "calling":
>     "main", "location": null, "items": [
>           { "stmt_id": 898, "pretty": "p", "value": "&p" }
>         ]}
>       ]},
>       "constraints": null,
>       "dynamic_types": null,
>       "dynamic_casts": null,
>       "constructing_objects": null,
>       "checker_messages": null
>     }$5 = void
>     (gdb) bt
>     #0  (anonymous namespace)::SimpleSValBuilder::evalCastFromLoc
>     (this=0xf9aedf0, val=..., castTy=...) at
>     ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:136
>     #1  0x0000000007ec6ecb in (anonymous
>     namespace)::SimpleSValBuilder::dispatchCast (this=0xf9aedf0,
>     Val=..., CastTy=...) at
>     ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:70
>     #2  0x0000000007edd75e in
>     clang::ento::StoreManager::CastRetrievedVal (this=0xf9b02a0,
>     V=..., R=0xf9b2aa8, castTy=...) at
>     ../../clang/lib/StaticAnalyzer/Core/Store.cpp:438
>     #3  0x0000000007ea0480 in (anonymous
>     namespace)::RegionStoreManager::getBinding (this=0xf9b02a0, B=...,
>     L=..., T=...) at
>     ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1511
>     #4  0x0000000007e9d059 in (anonymous
>     namespace)::RegionStoreManager::getBinding (this=0xf9b02a0,
>     S=0xf9b6fa2, L=..., T=...) at
>     ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:551
>     #5  0x0000000007753856 in clang::ento::ProgramState::getRawSVal
>     (this=0xf9b81b8, LV=..., T=...) at
>     ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h:785
>     #6  0x0000000007e64d61 in clang::ento::ProgramState::getSVal
>     (this=0xf9b81b8, location=..., T=...) at
>     ../../clang/lib/StaticAnalyzer/Core/ProgramState.cpp:267
>     #7  0x0000000007df0d77 in clang::ento::ExprEngine::evalLoad
>     (this=0x7fffffffa828, Dst=..., NodeEx=0xf985e00,
>     BoundEx=0xf985e00, Pred=0xf9b8230, state=..., location=...,
>     tag=0x0, LoadTy=...)
>         at ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:2982
>     #8  0x0000000007e10402 in clang::ento::ExprEngine::VisitCast
>     (this=0x7fffffffa828, CastE=0xf985e00, Ex=0xf985dc0,
>     Pred=0xf9b8230, Dst=...) at
>     ../../clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:313
>     #9  0x0000000007de88a8 in clang::ento::ExprEngine::Visit
>     (this=0x7fffffffa828, S=0xf985e00, Pred=0xf9b8230, DstTop=...) at
>     ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1729
>     #10 0x0000000007de4d9c in clang::ento::ExprEngine::ProcessStmt
>     (this=0x7fffffffa828, currStmt=0xf985e00, Pred=0xf9b8230) at
>     ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:791
>     #11 0x0000000007de4a89 in
>     clang::ento::ExprEngine::processCFGElement (this=0x7fffffffa828,
>     E=..., Pred=0xf9b8230, StmtIdx=1, Ctx=0x7fffffffa338) at
>     ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:637
>     #12 0x0000000007db6fa9 in clang::ento::CoreEngine::HandlePostStmt
>     (this=0x7fffffffa848, B=0xf9a8860, StmtIdx=1, Pred=0xf9b8230) at
>     ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:466
>     #13 0x0000000007db6717 in
>     clang::ento::CoreEngine::dispatchWorkItem (this=0x7fffffffa848,
>     Pred=0xf9b8230, Loc=..., WU=...) at
>     ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:191
>     #14 0x0000000007db6323 in clang::ento::CoreEngine::ExecuteWorkList
>     (this=0x7fffffffa848, L=0xf9b16e0, Steps=224975, InitState=...) at
>     ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:147
>     #15 0x000000000734dc74 in clang::ento::ExprEngine::ExecuteWorkList
>     (this=0x7fffffffa828, L=0xf9b16e0, Steps=225000) at
>     ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:192
>     _______________________________________________
>     cfe-dev mailing list
>     cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201213/8ac831e2/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot.png
Type: image/png
Size: 204172 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201213/8ac831e2/attachment-0001.png>

More information about the cfe-dev mailing list