[cfe-dev] [analyzer] Pointer cast representation problems
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue Dec 15 13:17:27 PST 2020
Np!
A few self-corrections below.
On 12/14/20 6:23 AM, Vince Bridgers wrote:
> Hi Artem, thank you for taking the time to explain the background how
> to chase this down. I'll carefully read through the materials and
> references provided and continue to build up knowledge on how to work
> on this part of the static analyzer.
>
> Best - Vince
>
> On Sun, Dec 13, 2020 at 11:55 PM Artem Dergachev <noqnoqneo at gmail.com
> <mailto:noqnoqneo at gmail.com>> wrote:
>
> Ok, first of all, the example false positive you're debugging is
> not an example of the bug you're trying to look into, but an
> example of a completely different long-standing bug:
> https://bugs.llvm.org/show_bug.cgi?id=44114
> <https://bugs.llvm.org/show_bug.cgi?id=44114>. Both bugzilla
> entries contain a large number of examples to choose from. None of
> those are beginner bugs though, the reason why they're not fixed
> is because they require a relatively large amount of work and a
> relatively good understanding of the subject.
>
> For your specific example, no, you didn't identify the buggy
> transition correctly, so you're debugging the part that has
> nothing to do with the false positive. I strongly recommend
> finding the buggy transition in your ExplodedGraph dump *before*
> jumping into the debugger. This usually goes from bottom to top,
> exactly like you're expected to read a static analyzer report. Let
> me explain this procedure on your example.
>
> I attached a screenshot of the tail of the ExplodedGraph dump for
> your code (i hope this works fine with the mailing list).
>
Narrator: "It didn't". For future generations: it was
https://i.imgur.com/Q0ErgPe.png
> That's just three last nodes of the graph but they're enough to
> identify the problem.
>
> Node 53 is the bug node, as stated in red; this is the node that
> was produced by generateErrorNode() and fed into the BugReport
> object. The bug report says "Branch condition evaluates to a
> garbage value". The AST expression p[1] (which acts as the branch
> condition which we can confirm by looking at the source code) has
> symbolic value "Undefined". Therefore we can conclude that the bug
> report was emitted correctly and the our problem is not in the
> checker callback that emitted the bug report; it was simply fed an
> incorrect State 1337. Since the only reason why the bug report was
> emitted was that the value was Undefined, we have to find a node
> in which "Undefined" was written into the state.
>
> Node 52 doesn't update the state, so we can skip it. In fact it's
> nicely grouped with node 51 so we barely even notice it.
>
> Node 51 is the nearest node on which the AST expression p[1] was
> first evaluated into Undefined. Whatever code was responsible for
> evaluating node 51 is directly responsible for assigning Undefined
> value to the expression - which ultimately led to the false
> positive in node 53. It is easy to see what was supposed to happen
> at this node: this is a memory load (which is also known in C and
> C++ formalism as "implicit lvalue-to-rvalue conversion" - the
> operation that converts object-as-in-"location" into
> object-as-in-"data"). The operation does not change contents of
> memory, it only reads the existing memory.
>
> Static analyzer represents the knowledge of existing memory in a
> data structure known as the Store, aka "Region Store".
> Straightforwardly, it maps "memory regions" (symbolic
> representations of locations) into arbitrary SVals (symbolic
> representations of data). No, Balazs, it doesn't map "values to
> expressions" =/ The Store is written down in the dump and labeled
> accordingly. In our case there are three entries in the Store:
>
> (1) Variable "myvar" has 16-bit unsigned integer 4386 written into
> it at byte offset 0;
> (2) Variable "p" has a pointer to the first char (hence index 0)
> of variable "myvar" written into it at byte offset 0.
> (3) Variable "x" has 32-bit signed integer 1 written into it at
> byte offset 0.
>
> The location we're loading from is already evaluated: it's the
> value of AST expression p[1] before the load. The Expressions
> section (aka the Environment) contains two entries for p[1]
> because implicit lvalue-to-rvalue casts do not have any visual
> representation in C, however by matching statement identifiers
> (S780 and S784) with the respective program point dumps (S780:
> ArraySubscriptExpr, S784: ImplicitCastExpr (LValueToRValue)) you
> can easily see that S780 is the sub-expression of the load and
> S784 is the load itself. Therefore the location which we're
> loading from is the value of AST expression p[1], namely the
> *second* char of variable "myvar".
>
> Knowing these facts about the memory of our program, we need to
> decide whether the load was performed correctly. If the load was
> performed incorrectly, our job is done: we've found the root
> cause. If the load was performed correctly that'd have meant that
> the state is incorrect and we have to go further up in order to
> find it.
>
> It's fairly obvious that the load was not performed correctly.
> Looking at Store entry (1), it is easy to see that the second
> character of variable "myvar" has value 17 rather than "Undefined".
>
> Therefore our problem consists entirely in incorrect construction
> of node 51. Now's the good time to jump into the debugger, set
> conditional breakpoint on evalLoad with condition "predecessor
> node has identifier 50" and debug from there.
>
> The pointer cast that you were trying to debug was not at fault.
> We can still discuss it separately but it's not what really causes
> this specific false positive to show up.
>
> This entire debugging procedure is very straightforward and
> requires almost no thinking. You simply match static analyzer's
> simulation of the program's behavior to the actual runtime
> behavior of the program and check whether the former is a correct
> abstraction over the latter. Static analyzer is just an
> interpreter (cf. "abstract interpretation") so it's very easy to
> reason about the correctness of every transition as long as you
> know the language it's trying to interpret (in this case, C).
> Which is why i made this 2018 conference talk in order to explain it.
>
*2019. The link is https://www.youtube.com/watch?v=g0Mqx1niUi0
The 2018 one is not particularly relevant here.
> The only non-trivial part is knowing the structure of the program
> state (store vs environment, what are regions and symbols and how
> to read them) which is why i did this whole workbook thing on that
> subject in 2016. I wish i did it properly so that all the
> documentation was in one place but for now that's where we are i
> guess.
>
> On 12/7/20 7:40 AM, Balázs Benics via cfe-dev wrote:
>> Ok, you are saying that we should know that `p[1]` points to an
>> object which was initialized. And according to the target's
>> endianness, resolve the DeclRefExpr to the appropriate constant
>> value.
>>
>> The dumps are probably from the line `char *p = &myvar;` if I'm
>> right.
>> That is the reason why the right-hand side is evaluated to the
>> location of `myvar`, but wrapped into an ElementRegion of type
>> char representing the reinterpret cast.
>> IMO, it's the correct behavior to this point.
>>
>> You should probably dig around the evaluation of the
>> `DeclRefExpr` of the expression `p[1]`. That should return your
>> concrete value.
>> AFAIK, the store maps the values to expressions and the problem
>> probably resides there.
>> But it's just a wide guess :D
>>
>> Unfortunately, that is all I can say about this right now :(
>>
>> Balazs.
>>
>> Vince Bridgers via cfe-dev <cfe-dev at lists.llvm.org
>> <mailto:cfe-dev at lists.llvm.org>> ezt írta (időpont: 2020. dec.
>> 7., H, 11:54):
>>
>> Hi all, I thought I'd look at this problem
>> https://bugs.llvm.org/show_bug.cgi?id=43364
>> <https://bugs.llvm.org/show_bug.cgi?id=43364> (Pointer cast
>> representation problems). maybe use it as an opportunity to
>> dig into the inner workings of the analyzer and maybe even
>> solve a concrete problem :) But it seems I need guidance
>> about possible paths to pursue.
>>
>> Starting with a concrete case, a simple reproducer is below.
>> I debugged this down to
>> SimpleSValBuilder.cpp:evalCastFromLoc(). val is an SVal and
>> castTy is a Loc, and this code path attempts to extract a
>> concrete integer from the SVal (did I get this right?). So I
>> think a solution to this specific case would be to dig into
>> the SVal to see if casted data is concrete, and extract that
>> data. Seems to me this would be the location for that
>> (evalCastFromLoc seems appropropriate enough). If that's
>> true, can someone point me to an example that's similar? I'll
>> keep digging, but thought I'd ask in case this is easy for
>> someone to drop a few helpful hints.
>>
>> I've included some select dumps, state and a bt below in case
>> this is helpful.
>>
>> Best
>>
>> (gdb) p val.dump()
>> &Element{myvar,0 S64b,char}$2 = void
>>
>> (gdb) p castTy.dump()
>> PointerType 0xf931740 'char *'
>> `-BuiltinType 0xf930b50 'char'
>> $3 = void
>>
>>
>> Invocation: clang -cc1 -analyze -analyzer-checker=core case.c
>>
>> struct mystruct {
>> unsigned short _u16;
>> };
>>
>> int main(void) {
>> struct mystruct myvar = { 0x1122 };
>>
>> char *p = &myvar;
>> int x = 0;
>> if (p[0])
>> x+=1;
>> if (p[1]) // Branch condition evaluates to a garbage value
>> [core.uninitialized.Branch]
>> x+=1;
>> return x;
>> }
>>
>> (gdb) p state
>> $4 = {Obj = 0xf9b81b8}
>> (gdb) p state->dump()
>> "program_state": {
>> "store": { "pointer": "0xf9b6fa2", "items": [
>> { "cluster": "myvar", "pointer": "0xf9b2250", "items": [
>> { "kind": "Direct", "offset": 0, "value": "4386 U16b" }
>> ]},
>> { "cluster": "p", "pointer": "0xf9b2aa8", "items": [
>> { "kind": "Direct", "offset": 0, "value":
>> "&Element{myvar,0 S64b,char}" }
>> ]},
>> { "cluster": "x", "pointer": "0xf9b6ed0", "items": [
>> { "kind": "Direct", "offset": 0, "value": "1 S32b" }
>> ]}
>> ]},
>> "environment": { "pointer": "0xf9b16e0", "items": [
>> { "lctx_id": 1, "location_context": "#0 Call", "calling":
>> "main", "location": null, "items": [
>> { "stmt_id": 898, "pretty": "p", "value": "&p" }
>> ]}
>> ]},
>> "constraints": null,
>> "dynamic_types": null,
>> "dynamic_casts": null,
>> "constructing_objects": null,
>> "checker_messages": null
>> }$5 = void
>>
>>
>> (gdb) bt
>> #0 (anonymous namespace)::SimpleSValBuilder::evalCastFromLoc
>> (this=0xf9aedf0, val=..., castTy=...) at
>> ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:136
>> #1 0x0000000007ec6ecb in (anonymous
>> namespace)::SimpleSValBuilder::dispatchCast (this=0xf9aedf0,
>> Val=..., CastTy=...) at
>> ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:70
>> #2 0x0000000007edd75e in
>> clang::ento::StoreManager::CastRetrievedVal (this=0xf9b02a0,
>> V=..., R=0xf9b2aa8, castTy=...) at
>> ../../clang/lib/StaticAnalyzer/Core/Store.cpp:438
>> #3 0x0000000007ea0480 in (anonymous
>> namespace)::RegionStoreManager::getBinding (this=0xf9b02a0,
>> B=..., L=..., T=...) at
>> ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1511
>> #4 0x0000000007e9d059 in (anonymous
>> namespace)::RegionStoreManager::getBinding (this=0xf9b02a0,
>> S=0xf9b6fa2, L=..., T=...) at
>> ../../clang/lib/StaticAnalyzer/Core/RegionStore.cpp:551
>> #5 0x0000000007753856 in
>> clang::ento::ProgramState::getRawSVal (this=0xf9b81b8,
>> LV=..., T=...) at
>> ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h:785
>> #6 0x0000000007e64d61 in clang::ento::ProgramState::getSVal
>> (this=0xf9b81b8, location=..., T=...) at
>> ../../clang/lib/StaticAnalyzer/Core/ProgramState.cpp:267
>> #7 0x0000000007df0d77 in clang::ento::ExprEngine::evalLoad
>> (this=0x7fffffffa828, Dst=..., NodeEx=0xf985e00,
>> BoundEx=0xf985e00, Pred=0xf9b8230, state=..., location=...,
>> tag=0x0, LoadTy=...)
>> at ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:2982
>> #8 0x0000000007e10402 in clang::ento::ExprEngine::VisitCast
>> (this=0x7fffffffa828, CastE=0xf985e00, Ex=0xf985dc0,
>> Pred=0xf9b8230, Dst=...) at
>> ../../clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:313
>> #9 0x0000000007de88a8 in clang::ento::ExprEngine::Visit
>> (this=0x7fffffffa828, S=0xf985e00, Pred=0xf9b8230,
>> DstTop=...) at
>> ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1729
>> #10 0x0000000007de4d9c in
>> clang::ento::ExprEngine::ProcessStmt (this=0x7fffffffa828,
>> currStmt=0xf985e00, Pred=0xf9b8230) at
>> ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:791
>> #11 0x0000000007de4a89 in
>> clang::ento::ExprEngine::processCFGElement
>> (this=0x7fffffffa828, E=..., Pred=0xf9b8230, StmtIdx=1,
>> Ctx=0x7fffffffa338) at
>> ../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:637
>> #12 0x0000000007db6fa9 in
>> clang::ento::CoreEngine::HandlePostStmt (this=0x7fffffffa848,
>> B=0xf9a8860, StmtIdx=1, Pred=0xf9b8230) at
>> ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:466
>> #13 0x0000000007db6717 in
>> clang::ento::CoreEngine::dispatchWorkItem
>> (this=0x7fffffffa848, Pred=0xf9b8230, Loc=..., WU=...) at
>> ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:191
>> #14 0x0000000007db6323 in
>> clang::ento::CoreEngine::ExecuteWorkList
>> (this=0x7fffffffa848, L=0xf9b16e0, Steps=224975,
>> InitState=...) at
>> ../../clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:147
>> #15 0x000000000734dc74 in
>> clang::ento::ExprEngine::ExecuteWorkList
>> (this=0x7fffffffa828, L=0xf9b16e0, Steps=225000) at
>> ../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:192
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201215/e7e943cf/attachment-0001.html>
More information about the cfe-dev
mailing list