[cfe-dev] How to generate constraints on input to reach a specific program statement/location in execution?
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Oct 24 06:36:47 PDT 2016
Hello James,
While you might prefer a concolic tool like KLEE for this purpose, the
Analyzer's framework should be more or less suitable for such task as
well, even though it was never really developed in that direction.
You've got symbols (SymExpr objects), which are an algebraic notation
for values that are unknown during the analysis, such as user inputs,
returns of external functions, or values that were already set before
the modeling starts.
On atomic symbols (expressions of SymbolData class, as opposed to
results of operations), the analyzer maintains constraints managed by
RangeConstraintManager. For instance, statement "if (x);", where "x" is
a variable of type "int", whose value is unknown and described as symbol
"$x", splits the analysis in two branches that differ only by range
constraints: one has "$x: [0, 0]", another has "$x: [-2147483648, -1] U
[1, 2147483647]".
Additionally, by looking at the symbols, most of the time you can
understand where it's coming from. For example, if there's a global
variable "y", and we're taking its value before assigning, then its
value, which has been there before the analysis has started, is denoted
by symbol "reg_$0<y>" (of class SymbolRegionValue), which refers to "y".
Presence of constraint "reg_$0<y>: [2, 5]" in the program state would
mean that the global variable should have been set to 2 or 3 or 4 or 5
in order to reach the current moment. Similarly, return values of
external functions are represented as SymbolConjured; information of the
call expression is stored within them. It'd also be a bit tricky with
structural symbols and SymbolDerived objects, ask me if you ever get
that far :)
With constraints being tracked, you do not need to observe branch
conditions; i recommend against that. Instead, at any moment, all the
info you may need about branches is in the constraints.
One thing you'd notice is that there's currently no API to retrieve
constraints as-is; instead, constraint managers are designed to be
replace-able, and actual looks of constraints are treated as an
implementation detail. You might want to break this abstraction in some way.
Another thing you'd encounter is that constraints are garbage-collected
during analysis. If a certain symbol is certainly lost from the program
state (no longer stored in any variable etc.), then the analyzer no
longer needs to track constraints imposed on it. You'd need to track
them though. Checkers are notified of garbage collection in
checkDeadSymbols callback. The same callback is also used to find leaks
(eg. memory leaks or file descriptors that are never closed - they're
merely dead symbols).
So i think this might work.
On 10/19/16 11:58 PM, James Huang via cfe-dev wrote:
> Hi,
>
> I'd like to generate the constraints on input to a program that can
> cause the execution of the program to reach a certain
> statement/location. For this purpose, I'm interested in developing a
> checker using the symbolic execution provided by clang's static
> analyzer. Looking through the static analyzer's documentation, my
> initial thought was to retrieve the constraints for each conditional
> branch statement (e.g. if, for, while, switch) along the execution
> path. However, there are a few challenges.
>
> 1) Does the static analyzer keep track of constraints for each
> variables? If so, how to retrieve the constraints for a specific
> variable? The documentation mentions a generic data map associated
> with a program state, but it seems that it is the responsibility of
> the developer of a checker to maintain it.
>
> 2) How to retrieve information on the source location of a statement
> without generating a bug report? We need to know how a statement is
> control dependent on a conditional branch statement. For example,
> whether a statement belongs to the then block or else block of a if
> statement. We find that the IfStmt class has information on the range
> of source locations of its then block and else block, which can be
> used to answer this question. But the source locations seems to be
> available only through a SourceManager class that can be used only
> when generating a bug report?
>
> 3) How to get information on a conditional branch statement before its
> execution? The static analyzer doesn't allow pre-processing call back
> to any control flow transfer statement, but provides a
> checkConditionalBranch call back for conditional branch statement.
> However, it only provides the condition used in the branch statement
> instead of the branch statement itself.
>
> Thanks.
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list