[cfe-dev] Question on inspecting information across program paths

Tue Feb 14 03:50:45 PST 2017

Hi Venugopal,

There is a way to do it. The simplest way, as I guess, is to create a 
chain of ExplodedNodes with your states. The code will look similar to this:

ProgramStateRef InitialState = C.getState();
ProgramStateRef MarkedState = InitialState->set<...>(...);
if (ExplodedNode *MarkedNode = 
C.generate[NonFatalError?]Node(MarkedState)) {
   ExplodedNode *NonMarkedNode = 
C.generate[NonFatalError?]Node(InitialState, MarkedNode))
}

So, the chain will look like this:
<InitialState, InitialNode> => <MarkedState, MarkedNode> => 
<InitialState, NonMarkedNode>

But if you want to just add some mark to a newly-created node, you can 
not use state at all. Instead, you can just create a node with a 
specified tag.

static SimpleProgramPointTag CustomTag("My marked node!");
ExplodedNode *NodeWithSpecialTag = 
C.generate[NonFatalError?]Node(NewState, C.getPredecessor(), &CustomTag);

13.02.2017 15:57, Venugopal Raghavan пишет:
> Hi Aleksei,
>
> Thanks, I have re-written the checker based on your suggestion.
>
> The custom program state that a checker adds to a program state gets 
> propagated to succeeding states also unless they are explicitly 
> removed. Is there a way to write some data into the custom state which 
> is valid only for that state and does not get propagated to succeeding 
> states? I did something like this:
>
> ProgramStateRef state = C.getState();   // C is CheckerContext
> const LocationContext *lctx = C.getLocationContext();
> SVal val = state->getSVal(S, lctx);
> SymbolRef Sym = val.getAsSymbol();  // Symbol I want to be added to 
> this state and to this state alone
>
> // Remove existing custom state data
> const CustomStateTy &Syms = state->get<CustomState>();
>  for (auto I = Syms.begin(), E = Syms.end(); I != E; ++I) {
>      SymbolRef Sym1 = *I;
>     state = state->remove<CustomState>(Sym1);
>  }
> state = state->add<CustomState>(Sym);  // Add the symbol that is 
> needed for the current state
>
> The idea was to remove custom state information from a state and then 
> write the information (symbol) corresponding to that state. Thus I 
> write the symbol to a state S and that creates a new state S1 with 
> that symbol. S itself is not modified since states are immutable. Now, 
> the state which succeeds S1 (which I call S2) also gets the same 
> symbol through state propagation which should however not be 
> associated with that state. Hence I try remove the symbol from S2 but 
> it does not actually remove it from that S2 but creates a new state S3 
> with the symbol removed (due to the immutable nature of states).
>
> Finally, I would still be left with states such as S2 which has 
> information which do not belong to them. Can I fix this in some way?
>
> Not sure if this is a requirement outside the support provided by the 
> analyzer or I have created a requirement which is unnatural.
>
> Thanks.
>
> Regards,
> Venu.
>
> On Thu, Feb 9, 2017 at 8:54 PM, Aleksei Sidorin <a.sidorin at samsung.com 
> <mailto:a.sidorin at samsung.com>> wrote:
>
>     Hello Venugopal,
>
>     Unfortunately, there is no a simple way to get "symbol diff". Your
>     approach with sequential iteration seems right to me in general.
>     About comparison of symbols. If you want just ensure that some
>     symbols are same or different (like reg_$0<x> and reg_$1<y> which
>     are different), you can just compare their pointers (SymbolRefs).
>     Symbols are not modifiable, every symbol is allocated in the pool
>     and lives here, SymbolRefs just point on it. If these pointers are
>     different, symbols are different too, and equal otherwise.
>
>
>     09.02.2017 18:20, Aleksei Sidorin пишет:
>>     Hello Venugopal,
>>
>>     If you want just ensure that some symbols are same (like
>>     reg_$0<x> and reg_$1<y>), you can just compare their pointers
>>     (SymbolRefs). Symbols are not modifiable, every symbol is
>>     allocated in the pool and lives here, SymbolRefs just point on
>>     it. If these pointers are different, symbols are different too,
>>     and equal otherwise.
>>
>>
>>     09.02.2017 13:03, Venugopal Raghavan пишет:
>>>     Hi,
>>>
>>>     Apologies, but, I am back again.
>>>
>>>     I have two SymbolRef pointers (SymExpr *). One SymbolRef
>>>     is (reg_$0<x>) + 3$117. The other SymbolRef is (reg_$0<x>) +
>>>     2$130. x is a variable in my program and these SymbolRefs
>>>     respresent an assignment of x + y to a variable "a". y has the
>>>     value 3 along one path to the program point where "a" is
>>>     assigned and it has the value 2 along a second path to the same
>>>     point, hence the two SymbolRefs differ in the constants.
>>>
>>>     I want to compare these two SymbolRefs and find out where they
>>>     differ. In this case, I want to be able to say that they differ
>>>     in the constant value. I thought of using symbol_iterator to
>>>     iterate over each SymbolRef or SymExpr * in lock-step fashion
>>>     and find the first point where it differs. But these is no
>>>     operator== defined in the SymExpr class and hence I am unable to
>>>     check the equality or otherwise of two symbols in the SymbolRefs.
>>>
>>>     Is there a way to compare two SymbolRefs and check for the
>>>     differences? Otherwise, is there a better way to check whether
>>>     the symbolic value at the same program point but in two
>>>     different program states are the same or not, and if they are
>>>     different, is there a way to identify the difference? Maybe, my
>>>     approach of comparing two symbolic expressions at a program
>>>     point is not the right approach in the first place.
>>>
>>>     Thanks.
>>>
>>>     Regards,
>>>     Venu.
>>>
>>>     On Mon, Feb 6, 2017 at 2:42 PM, Venugopal Raghavan
>>>     <venur2005 at gmail.com <mailto:venur2005 at gmail.com>> wrote:
>>>
>>>         Hi Aleksei,
>>>
>>>         Thanks. That seems to work.
>>>
>>>         Regards,
>>>         Venu.
>>>
>>>         On Mon, Feb 6, 2017 at 2:05 PM, Aleksei Sidorin
>>>         <a.sidorin at samsung.com <mailto:a.sidorin at samsung.com>> wrote:
>>>
>>>             Hello,
>>>
>>>             On the point where PreStmt callbacks are called, the
>>>             expression value is still not computed so the result of
>>>             getSVal() is always UnknownVal and not a symbol. You
>>>             should probably use PostStmt callback instead.
>>>
>>>
>>>             06.02.2017 08:21, Venugopal Raghavan пишет:
>>>>             Hi Aleksei,
>>>>
>>>>             Thanks for the reply. Unfortunately, I am having
>>>>             further trouble with my checker implementation. I have
>>>>             the following code in my checker (called ConstantChecker):
>>>>
>>>>              void ConstantChecker::checkPreStmt(const Stmt *S,
>>>>             CheckerContext &C) const {
>>>>             ProgramStateRef state = C.getState();
>>>>                     const LocationContext *lctx =
>>>>             C.getLocationContext();
>>>>                     SVal val = state->getSVal(S, lctx);
>>>>                     SymbolRef Sym = val.getAsSymbol();
>>>>                     // Stmt 1
>>>>               .
>>>>               .
>>>>               .
>>>>             }
>>>>             My input test case for the checker is test1.c and I run
>>>>             the command clang -cc1 -analyze
>>>>             -analyzer-checker=optin.performance.PathSpecificConstants
>>>>             test1.c where PathSpecificConstants is the name of my
>>>>             checker:
>>>>
>>>>             test(int x) {
>>>>                int y;
>>>>                int a;
>>>>                if (x > 0) {
>>>>                  y = 2;
>>>>                }
>>>>                else {
>>>>                  y = 3;
>>>>                }
>>>>                a = x + y;
>>>>                printf("a = %d\n", a);
>>>>             }
>>>>
>>>>             When I print "Sym" in gdb at the break point Stmt 1, I
>>>>             always get Sym as 0x0. This probably means that val is
>>>>             never a symbol for my test case and hence
>>>>             val.getAsSymbol() fails. I do not understand this. I
>>>>             would have thought that at least the expression x + y
>>>>             would be represented by a symbolic value such as $0. I
>>>>             tried doing const MemRegion *mem = val.getAsRegion()
>>>>             also at that  point and tried to print out mem in gdb.
>>>>             That also has a value 0x0 except for the printf
>>>>             statement. I cannot understand this too. I would have
>>>>             expected variables such as "x", "y" and "a" to be
>>>>             associated with a memory region and hence mem to have a
>>>>             non-null value at least at the points where y and a are
>>>>             being assigned to.
>>>>
>>>>             My plan was to use Sym as the index into a custom map
>>>>             that I have created where I am storing the values of
>>>>             constants seen in the test case. I am unable to get
>>>>             this plan working since Sym is always 0x0.
>>>>
>>>>             I have gone through the Checker Developer Manual. I am
>>>>             sorry to be asking specific questions about my code in
>>>>             this fashion, but, unfortunately, I am stuck at this point.
>>>>
>>>>             Thanks.
>>>>
>>>>             Regards,
>>>>             Venugopal Raghavan.
>>>>
>>>>             On Wed, Feb 1, 2017 at 11:13 PM, Aleksei Sidorin
>>>>             <a.sidorin at samsung.com <mailto:a.sidorin at samsung.com>>
>>>>             wrote:
>>>>
>>>>                 Hello Venugopal,
>>>>
>>>>                 You will need to extract all the information from
>>>>                 ExplodedNodes.
>>>>                 For example, state is a part of ExplodedNode. So,
>>>>                 you can iterate over ExplodedNodes of graph using
>>>>                 smth like:
>>>>                 for (auto Iter = G.nodes_begin(), E =
>>>>                 G.nodes_end(); I != E; ++I) {
>>>>                 ProgramStateRef State = I->getState();
>>>>                   // Do the stuff with State
>>>>                 }
>>>>
>>>>
>>>>
>>>>
>>>>                 01.02.2017 12:19, Venugopal Raghavan пишет:
>>>>>                 Hi,
>>>>>
>>>>>                 I am registering my own map in the program state
>>>>>                 using the the following:
>>>>>
>>>>>                 REGISTER_MAP_WITH_PROGRAMSTATE(ConstantMap,
>>>>>                 SymbolRef, VariableConstVal)
>>>>>
>>>>>                 where VariableConstVal is the structure containing
>>>>>                 the information I want in this map.
>>>>>
>>>>>                 I am populating this map in the
>>>>>                 callback checkPostStmt() using state =
>>>>>                 state->set<ConstantMap>(sym, ...) while the state
>>>>>                 graph is being constructed.
>>>>>
>>>>>                 At the end of analysis, I want to now read the
>>>>>                 information stored in this map in the
>>>>>                 callback checkEndAnalysis(ExplodedGraph &G,
>>>>>                 BugReporter &B, ExprEngine &Eng). However, I am
>>>>>                 not sure how to get the SymbolRef associated with
>>>>>                 an ExplodedNode which is what I get when I iterate
>>>>>                 over the ExplodedGraph at the end of the analysis.
>>>>>
>>>>>                 How can I get the SymbolRef that I can use to
>>>>>                 index into my map? In checkPostStmt(stmt* S,
>>>>>                 CheckerContext &C), I used the following code to
>>>>>                 get the SymbolRef that I can use to store the
>>>>>                 information in the map:
>>>>>
>>>>>                   SVal val = state->getSVal(S, lctx);
>>>>>                 SymbolRef Sym = val.getAsSymbol();
>>>>>
>>>>>                 but in checkEndAnalysis(ExplodedGraph &G,
>>>>>                 BugReporter &B, ExprEngine &Eng) I do not seem to
>>>>>                 have a handle to stmt* to get the SVal.
>>>>>
>>>>>                 I guess my question is probably not phrased very
>>>>>                 well, but I hope some one can make some sense out
>>>>>                 of it. Also, I guess I have a long way to go
>>>>>                 before I understand the data structures in the
>>>>>                 analyzer well. I checked the other checkers, but
>>>>>                 none of them seem to exhibit exactly my requirement.
>>>>>
>>>>>                 Thanks.
>>>>>
>>>>>                 Regards,
>>>>>                 Venugopal Raghavan.
>>>>>
>>>>>                 On Wed, Jan 25, 2017 at 3:19 PM, Venugopal
>>>>>                 Raghavan <venur2005 at gmail.com
>>>>>                 <mailto:venur2005 at gmail.com>> wrote:
>>>>>
>>>>>                     Hi Aleksei,
>>>>>
>>>>>                     Thanks. I will check this.
>>>>>
>>>>>                     Regards,
>>>>>                     Venu.
>>>>>
>>>>>                     On Wed, Jan 25, 2017 at 3:15 PM, Aleksei
>>>>>                     Sidorin <a.sidorin at samsung.com
>>>>>                     <mailto:a.sidorin at samsung.com>> wrote:
>>>>>
>>>>>                         Hello Venugopal,
>>>>>
>>>>>                         During analysis, you have access to a
>>>>>                         single path only. But after the analysis
>>>>>                         is done, you can summarize information
>>>>>                         across different paths using
>>>>>                         checkEndAnalysis() callback. You will get
>>>>>                         full ExplodedGraph built for the function
>>>>>                         so you will be able to look into its every
>>>>>                         node.
>>>>>
>>>>>
>>>>>                         25.01.2017 09:25, Venugopal Raghavan via
>>>>>                         cfe-dev пишет:
>>>>>>                         Hi,
>>>>>>
>>>>>>                         Firstly, apologies for the long email below.
>>>>>>
>>>>>>                         I am new to the Clang Static Analyzer
>>>>>>                         (CLA). My interest in CLA was more to see
>>>>>>                         if I can solve quickly path-sensitive
>>>>>>                         data-flow analysis problems than as a
>>>>>>                         vehicle to build bug-finding checkers. I
>>>>>>                         needed the solutions to such data flow
>>>>>>                         problems to extract some behavioral
>>>>>>                         properties from some test cases I have.
>>>>>>                         For example, consider the following piece
>>>>>>                         of code:
>>>>>>
>>>>>>                         if (cond) {
>>>>>>                            x = 4;
>>>>>>                         }
>>>>>>                         else {
>>>>>>                            x = 5;
>>>>>>                         }
>>>>>>
>>>>>>                         L: .... = ...x..; // Use of variable x
>>>>>>
>>>>>>                         I want to identify program locations such
>>>>>>                         as "L" in the code above where x is not a
>>>>>>                         constant if you aggregate data-flow
>>>>>>                         information across paths, but, on the
>>>>>>                         other hand, is actually a constant if you
>>>>>>                         have path-specific data flow information.
>>>>>>                         Since CFA does path-specific analysis, I
>>>>>>                         was curious to know if it would help me
>>>>>>                         with such tasks. The "bug-report" I want
>>>>>>                         at location L is that x has a value 4,
>>>>>>                         given the path constraint for the path
>>>>>>                         including the "then-path" of the if
>>>>>>                         statement and that x has a value 5 along
>>>>>>                         the else-path.
>>>>>>
>>>>>>                         I started writing a checker to accomplish
>>>>>>                         the above task, but was quickly blocked
>>>>>>                         by some basic doubts. My rough plan for
>>>>>>                         the checker was to maintain a map in
>>>>>>                         ProgramState which maps ProgramPoints
>>>>>>                         (or, maybe program symbols) to constants
>>>>>>                         when those variables indeed have constant
>>>>>>                         values at the ProgramPoint. For example,
>>>>>>                         when CFA expands states along the
>>>>>>                         then-path and reaches "L", my map would
>>>>>>                         say that at ProgramPoint "L", variable x
>>>>>>                         has a constant value 4. Then, when CFA
>>>>>>                         expands nodes along the else-path, I
>>>>>>                         guess it would again create a state
>>>>>>                         corresponding to L which now says that x
>>>>>>                         has the constant value 5. I want to be
>>>>>>                         able to catch this scenario where the
>>>>>>                         same variable at the same ProgramPoint
>>>>>>                         has two different constant values
>>>>>>                         depending on the path taken.
>>>>>>
>>>>>>                         However, the issue is that, when the
>>>>>>                         state graph along the else-path is
>>>>>>                         expanded, it would no longer have any of
>>>>>>                         the map contents that was built along the
>>>>>>                         disjoint then-path. How then can I get
>>>>>>                         access to the then-path information when
>>>>>>                         I am expanding the path along the
>>>>>>                         else-path? Since the checker is also
>>>>>>                         required to be stateless, I cannot
>>>>>>                         maintain auxiliary information in the
>>>>>>                         checker that is persistent across
>>>>>>                         multiple calls to the call-back function.
>>>>>>
>>>>>>                         Can you help me with this question?
>>>>>>                         Maybe, I am trying to use CFA in ways it
>>>>>>                         was not meant to be used viz. as a
>>>>>>                         vehicle to solve data flow problems which
>>>>>>                         consults information cutting across
>>>>>>                         paths. Maybe, the CFA is meant to handle
>>>>>>                         problems which require the examination of
>>>>>>                         information restricted to single paths.
>>>>>>                         Is that the case? Or, am I missing something?
>>>>>>
>>>>>>                         Thanks.
>>>>>>
>>>>>>                         Regards,
>>>>>>                         Venugopal Raghavan.
>>>>>>
>>>>>>
>>>>>>                         _______________________________________________
>>>>>>                         cfe-dev mailing list
>>>>>>                         cfe-dev at lists.llvm.org
>>>>>>                         <mailto:cfe-dev at lists.llvm.org>
>>>>>>                         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>                         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>>>
>>>>>                         -- 
>>>>>                         Best regards,
>>>>>                         Aleksei Sidorin,
>>>>>                         SRR, Samsung Electronics
>>>>>
>>>>                 -- 
>>>>                 Best regards,
>>>>                 Aleksei Sidorin,
>>>>                 SRR, Samsung Electronics
>>>>
>>>             -- 
>>>             Best regards,
>>>             Aleksei Sidorin,
>>>             SRR, Samsung Electronics
>>>
>>     -- 
>>     Best regards,
>>     Aleksei Sidorin,
>>     SRR, Samsung Electronics
>
>     -- 
>     Best regards,
>     Aleksei Sidorin,
>     SRR, Samsung Electronics
>
-- 
Best regards,
Aleksei Sidorin,
SRR, Samsung Electronics
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170214/a5531441/attachment.html>