[cfe-dev] How to generate constraints on input to reach a specific program statement/location in execution?

Mon Oct 24 06:36:47 PDT 2016

Hello James,

While you might prefer a concolic tool like KLEE for this purpose, the 
Analyzer's framework should be more or less suitable for such task as 
well, even though it was never really developed in that direction.

You've got symbols (SymExpr objects), which are an algebraic notation 
for values that are unknown during the analysis, such as user inputs, 
returns of external functions, or values that were already set before 
the modeling starts.

On atomic symbols (expressions of SymbolData class, as opposed to 
results of operations), the analyzer maintains constraints managed by 
RangeConstraintManager. For instance, statement "if (x);", where "x" is 
a variable of type "int", whose value is unknown and described as symbol 
"$x", splits the analysis in two branches that differ only by range 
constraints: one has "$x: [0, 0]", another has "$x: [-2147483648, -1] U 
[1, 2147483647]".

Additionally, by looking at the symbols, most of the time you can 
understand where it's coming from. For example, if there's a global 
variable "y", and we're taking its value before assigning, then its 
value, which has been there before the analysis has started, is denoted 
by symbol "reg_$0<y>" (of class SymbolRegionValue), which refers to "y". 
Presence of constraint "reg_$0<y>: [2, 5]" in the program state would 
mean that the global variable should have been set to 2 or 3 or 4 or 5 
in order to reach the current moment. Similarly, return values of 
external functions are represented as SymbolConjured; information of the 
call expression is stored within them. It'd also be a bit tricky with 
structural symbols and SymbolDerived objects, ask me if you ever get 
that far :)

With constraints being tracked, you do not need to observe branch 
conditions; i recommend against that. Instead, at any moment, all the 
info you may need about branches is in the constraints.

One thing you'd notice is that there's currently no API to retrieve 
constraints as-is; instead, constraint managers are designed to be 
replace-able, and actual looks of constraints are treated as an 
implementation detail. You might want to break this abstraction in some way.

Another thing you'd encounter is that constraints are garbage-collected 
during analysis. If a certain symbol is certainly lost from the program 
state (no longer stored in any variable etc.), then the analyzer no 
longer needs to track constraints imposed on it. You'd need to track 
them though. Checkers are notified of garbage collection in 
checkDeadSymbols callback. The same callback is also used to find leaks 
(eg. memory leaks or file descriptors that are never closed - they're 
merely dead symbols).

So i think this might work.

On 10/19/16 11:58 PM, James Huang via cfe-dev wrote:
> Hi,
>
> I'd like to generate the constraints on input to a program that can
> cause the execution of the program to reach a certain
> statement/location. For this purpose, I'm interested in developing a
> checker using the symbolic execution provided by clang's static
> analyzer. Looking through the static analyzer's documentation, my
> initial thought was to retrieve the constraints for each conditional
> branch statement (e.g. if, for, while, switch) along the execution
> path. However, there are a few challenges.
>
> 1) Does the static analyzer keep track of constraints for each
> variables? If so, how to retrieve the constraints for a specific
> variable? The documentation mentions a generic data map associated
> with a program state, but it seems that it is the responsibility of
> the developer of a checker to maintain it.
>
> 2) How to retrieve information on the source location of a statement
> without generating a bug report? We need to know how a statement is
> control dependent on a conditional branch statement. For example,
> whether a statement belongs to the then block or else block of a if
> statement. We find that the IfStmt class has information on the range
> of source locations of its then block and else block, which can be
> used to answer this question. But the source locations seems to be
> available only through a SourceManager class that can be used only
> when generating a bug report?
>
> 3) How to get information on a conditional branch statement before its
> execution? The static analyzer doesn't allow pre-processing call back
> to any control flow transfer statement, but provides a
> checkConditionalBranch call back for conditional branch statement.
> However, it only provides the condition used in the branch statement
> instead of the branch statement itself.
>
> Thanks.
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev