[cfe-dev] [analyzer] Pointer cast representation problems

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Dec 15 13:17:27 PST 2020


Np!

A few self-corrections below.

On 12/14/20 6:23 AM, Vince Bridgers wrote:
> Hi Artem, thank you for taking the time to explain the background how 
> to chase this down. I'll carefully read through the materials and 
> references provided and continue to build up knowledge on how to work 
> on this part of the static analyzer.
>
> Best - Vince
>
> On Sun, Dec 13, 2020 at 11:55 PM Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>> wrote:
>
>     Ok, first of all, the example false positive you're debugging is
>     not an example of the bug you're trying to look into, but an
>     example of a completely different long-standing bug:
>     https://bugs.llvm.org/show_bug.cgi?id=44114
>     <https://bugs.llvm.org/show_bug.cgi?id=44114>. Both bugzilla
>     entries contain a large number of examples to choose from. None of
>     those are beginner bugs though, the reason why they're not fixed
>     is because they require a relatively large amount of work and a
>     relatively good understanding of the subject.
>
>     For your specific example, no, you didn't identify the buggy
>     transition correctly, so you're debugging the part that has
>     nothing to do with the false positive. I strongly recommend
>     finding the buggy transition in your ExplodedGraph dump *before*
>     jumping into the debugger. This usually goes from bottom to top,
>     exactly like you're expected to read a static analyzer report. Let
>     me explain this procedure on your example.
>
>     I attached a screenshot of the tail of the ExplodedGraph dump for
>     your code (i hope this works fine with the mailing list).
>

Narrator: "It didn't". For future generations: it was 
https://i.imgur.com/Q0ErgPe.png

>     That's just three last nodes of the graph but they're enough to
>     identify the problem.
>
>     Node 53 is the bug node, as stated in red; this is the node that
>     was produced by generateErrorNode() and fed into the BugReport
>     object. The bug report says "Branch condition evaluates to a
>     garbage value". The AST expression p[1] (which acts as the branch
>     condition which we can confirm by looking at the source code) has
>     symbolic value "Undefined". Therefore we can conclude that the bug
>     report was emitted correctly and the our problem is not in the
>     checker callback that emitted the bug report; it was simply fed an
>     incorrect State 1337. Since the only reason why the bug report was
>     emitted was that the value was Undefined, we have to find a node
>     in which "Undefined" was written into the state.
>
>     Node 52 doesn't update the state, so we can skip it. In fact it's
>     nicely grouped with node 51 so we barely even notice it.
>
>     Node 51 is the nearest node on which the AST expression p[1] was
>     first evaluated into Undefined. Whatever code was responsible for
>     evaluating node 51 is directly responsible for assigning Undefined
>     value to the expression - which ultimately led to the false
>     positive in node 53. It is easy to see what was supposed to happen
>     at this node: this is a memory load (which is also known in C and
>     C++ formalism as "implicit lvalue-to-rvalue conversion" - the
>     operation that converts object-as-in-"location" into
>     object-as-in-"data"). The operation does not change contents of
>     memory, it only reads the existing memory.
>
>     Static analyzer represents the knowledge of existing memory in a
>     data structure known as the Store, aka "Region Store".
>     Straightforwardly, it maps "memory regions" (symbolic
>     representations of locations) into arbitrary SVals (symbolic
>     representations of data). No, Balazs, it doesn't map "values to
>     expressions" =/ The Store is written down in the dump and labeled
>     accordingly. In our case there are three entries in the Store:
>
>     (1) Variable "myvar" has 16-bit unsigned integer 4386 written into
>     it at byte offset 0;
>     (2) Variable "p" has a pointer to the first char (hence index 0)
>     of variable "myvar" written into it at byte offset 0.
>     (3) Variable "x" has 32-bit signed integer 1 written into it at
>     byte offset 0.
>
>     The location we're loading from is already evaluated: it's the
>     value of AST expression p[1] before the load. The Expressions
>     section (aka the Environment) contains two entries for p[1]
>     because implicit lvalue-to-rvalue casts do not have any visual
>     representation in C, however by matching statement identifiers
>     (S780 and S784) with the respective program point dumps (S780:
>     ArraySubscriptExpr, S784: ImplicitCastExpr (LValueToRValue)) you
>     can easily see that S780 is the sub-expression of the load and
>     S784 is the load itself. Therefore the location which we're
>     loading from is the value of AST expression p[1], namely the
>     *second* char of variable "myvar".
>
>     Knowing these facts about the memory of our program, we need to
>     decide whether the load was performed correctly. If the load was
>     performed incorrectly, our job is done: we've found the root
>     cause. If the load was performed correctly that'd have meant that
>     the state is incorrect and we have to go further up in order to
>     find it.
>
>     It's fairly obvious that the load was not performed correctly.
>     Looking at Store entry (1), it is easy to see that the second
>     character of variable "myvar" has value 17 rather than "Undefined".
>
>     Therefore our problem consists entirely in incorrect construction
>     of node 51. Now's the good time to jump into the debugger, set
>     conditional breakpoint on evalLoad with condition "predecessor
>     node has identifier 50" and debug from there.
>
>     The pointer cast that you were trying to debug was not at fault.
>     We can still discuss it separately but it's not what really causes
>     this specific false positive to show up.
>
>     This entire debugging procedure is very straightforward and
>     requires almost no thinking. You simply match static analyzer's
>     simulation of the program's behavior to the actual runtime
>     behavior of the program and check whether the former is a correct
>     abstraction over the latter. Static analyzer is just an
>     interpreter (cf. "abstract interpretation") so it's very easy to
>     reason about the correctness of every transition as long as you
>     know the language it's trying to interpret (in this case, C).
>     Which is why i made this 2018 conference talk in order to explain it.
>

*2019. The link is https://www.youtube.com/watch?v=g0Mqx1niUi0
The 2018 one is not particularly relevant here.

>     The only non-trivial part is knowing the structure of the program
>     state (store vs environment, what are regions and symbols and how
>     to read them) which is why i did this whole workbook thing on that
>     subject in 2016. I wish i did it properly so that all the
>     documentation was in one place but for now that's where we are i
>     guess.
>
>     On 12/7/20 7:40 AM, Balázs Benics via cfe-dev wrote:
>>     Ok, you are saying that we should know that `p[1]` points to an
>>     object which was initialized. And according to the target's
>>     endianness, resolve the DeclRefExpr to the appropriate constant
>>     value.
>>
>>     The dumps are probably from the line `char *p = &myvar;` if I'm
>>     right.
>>     That is the reason why the right-hand side is evaluated to the
>>     location of `myvar`, but wrapped into an ElementRegion of type
>>     char representing the reinterpret cast.
>>     IMO, it's the correct behavior to this point.
>>
>>     You should probably dig around the evaluation of the
>>     `DeclRefExpr` of the expression `p[1]`. That should return your
>>     concrete value.
>>     AFAIK, the store maps the values to expressions and the problem
>>     probably resides there.
>>     But it's just a wide guess :D
>>
>>     Unfortunately, that is all I can say about this right now :(
>>
>>     Balazs.
>>
>>     Vince Bridgers via cfe-dev <cfe-dev at lists.llvm.org
>>     <mailto:cfe-dev at lists.llvm.org>> ezt írta (időpont: 2020. dec.
>>     7., H, 11:54):
>>
>>         Hi all, I thought I'd look at this problem
>>         https://bugs.llvm.org/show_bug.cgi?id=43364
>>         <https://bugs.llvm.org/show_bug.cgi?id=43364> (Pointer cast
>>         representation problems). maybe use it as an opportunity to
>>         dig into the inner workings of the analyzer and maybe even
>>         solve a concrete problem :)   But it seems I need guidance
>>         about possible paths to pursue.
>>
>>         Starting with a concrete case, a simple reproducer is below.
>>         I debugged this down to
>>         SimpleSValBuilder.cpp:evalCastFromLoc(). val is an SVal and
>>         castTy is a Loc, and this code path attempts to extract a
>>         concrete integer from the SVal (did I get this right?). So I
>>         think a solution to this specific case would be to dig into
>>         the SVal to see if casted data is concrete, and extract that
>>         data. Seems to me this would be the location for that
>>         (evalCastFromLoc seems appropropriate enough). If that's
>>         true, can someone point me to an example that's similar? I'll
>>         keep digging, but thought I'd ask in case this is easy for
>>         someone to drop a few helpful hints.
>>
>>         I've included some select dumps, state and a bt below in case
>>         this is helpful.
>>
>>         Best
>>
>>         (gdb) p val.dump()
>>         &Element{myvar,0 S64b,char}$2 = void
>>
>>         (gdb) p castTy.dump()
>>         PointerType 0xf931740 'char *'
>>         `-BuiltinType 0xf930b50 'char'
>>         $3 = void
>>
>>
>>         Invocation: clang -cc1 -analyze -analyzer-checker=core case.c
>>
>>         struct mystruct {
>>           unsigned short _u16;
>>         };
>>
>>         int main(void) {
>>           struct mystruct myvar = { 0x1122 };
>>
>>           char *p = &myvar;
>>           int x = 0;
>>           if (p[0])
>>             x+=1;
>>           if (p[1]) // Branch condition evaluates to a garbage value
>>         [core.uninitialized.Branch]
>>             x+=1;
>>           return x;
>>         }
>>
>>         (gdb) p state
>>         $4 = {Obj = 0xf9b81b8}
>>         (gdb) p state->dump()
>>         "program_state": {
>>           "store": { "pointer": "0xf9b6fa2", "items": [
>>             { "cluster": "myvar", "pointer": "0xf9b2250", "items": [
>>               { "kind": "Direct", "offset": 0, "value": "4386 U16b" }
>>             ]},
>>             { "cluster": "p", "pointer": "0xf9b2aa8", "items": [
>>               { "kind": "Direct", "offset": 0, "value":
>>         "&Element{myvar,0 S64b,char}" }
>>             ]},
>>             { "cluster": "x", "pointer": "0xf9b6ed0", "items": [
>>               { "kind": "Direct", "offset": 0, "value": "1 S32b" }
>>             ]}
>>           ]},
>>           "environment": { "pointer": "0xf9b16e0", "items": [
>>             { "lctx_id": 1, "location_context": "#0 Call", "calling":
>>         "main", "location": null, "items": [
>>               { "stmt_id": 898, "pretty": "p", "value": "&p" }
>>             ]}
>>           ]},
>>           "constraints": null,
>>           "dynamic_types": null,
>>           "dynamic_casts": null,
>>           "constructing_objects": null,
>>           "checker_messages": null
>>         }$5 = void
>>
>>
>>         (gdb) bt
>>         #0  (anonymous namespace)::SimpleSValBuilder::evalCastFromLoc
>>         (this=0xf9aedf0, val=..., castTy=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:136
>>         #1  0x0000000007ec6ecb in (anonymous
>>         namespace)::SimpleSValBuilder::dispatchCast (this=0xf9aedf0,
>>         Val=..., CastTy=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:70
>>         #2  0x0000000007edd75e in
>>         clang::ento::StoreManager::CastRetrievedVal (this=0xf9b02a0,
>>         V=..., R=0xf9b2aa8, castTy=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/Store.cpp:438
>>         #3  0x0000000007ea0480 in (anonymous
>>         namespace)::RegionStoreManager::getBinding (this=0xf9b02a0,
>>         B=..., L=..., T=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1511
>>         #4  0x0000000007e9d059 in (anonymous
>>         namespace)::RegionStoreManager::getBinding (this=0xf9b02a0,
>>         S=0xf9b6fa2, L=..., T=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:551
>>         #5  0x0000000007753856 in
>>         clang::ento::ProgramState::getRawSVal (this=0xf9b81b8,
>>         LV=..., T=...) at
>>         ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h:785
>>         #6  0x0000000007e64d61 in clang::ento::ProgramState::getSVal
>>         (this=0xf9b81b8, location=..., T=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/ProgramState.cpp:267
>>         #7  0x0000000007df0d77 in clang::ento::ExprEngine::evalLoad
>>         (this=0x7fffffffa828, Dst=..., NodeEx=0xf985e00,
>>         BoundEx=0xf985e00, Pred=0xf9b8230, state=..., location=...,
>>         tag=0x0, LoadTy=...)
>>             at ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:2982
>>         #8  0x0000000007e10402 in clang::ento::ExprEngine::VisitCast
>>         (this=0x7fffffffa828, CastE=0xf985e00, Ex=0xf985dc0,
>>         Pred=0xf9b8230, Dst=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:313
>>         #9  0x0000000007de88a8 in clang::ento::ExprEngine::Visit
>>         (this=0x7fffffffa828, S=0xf985e00, Pred=0xf9b8230,
>>         DstTop=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1729
>>         #10 0x0000000007de4d9c in
>>         clang::ento::ExprEngine::ProcessStmt (this=0x7fffffffa828,
>>         currStmt=0xf985e00, Pred=0xf9b8230) at
>>         ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:791
>>         #11 0x0000000007de4a89 in
>>         clang::ento::ExprEngine::processCFGElement
>>         (this=0x7fffffffa828, E=..., Pred=0xf9b8230, StmtIdx=1,
>>         Ctx=0x7fffffffa338) at
>>         ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:637
>>         #12 0x0000000007db6fa9 in
>>         clang::ento::CoreEngine::HandlePostStmt (this=0x7fffffffa848,
>>         B=0xf9a8860, StmtIdx=1, Pred=0xf9b8230) at
>>         ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:466
>>         #13 0x0000000007db6717 in
>>         clang::ento::CoreEngine::dispatchWorkItem
>>         (this=0x7fffffffa848, Pred=0xf9b8230, Loc=..., WU=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:191
>>         #14 0x0000000007db6323 in
>>         clang::ento::CoreEngine::ExecuteWorkList
>>         (this=0x7fffffffa848, L=0xf9b16e0, Steps=224975,
>>         InitState=...) at
>>         ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:147
>>         #15 0x000000000734dc74 in
>>         clang::ento::ExprEngine::ExecuteWorkList
>>         (this=0x7fffffffa828, L=0xf9b16e0, Steps=225000) at
>>         ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:192
>>         _______________________________________________
>>         cfe-dev mailing list
>>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>         <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>>     _______________________________________________
>>     cfe-dev mailing list
>>     cfe-dev at lists.llvm.org  <mailto:cfe-dev at lists.llvm.org>
>>     https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev  <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201215/e7e943cf/attachment-0001.html>


More information about the cfe-dev mailing list