<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Gabor,<br>
<br>
Thank you for the reply! You can find my answers inline.<br>
<br>
17.05.2018 15:28, Gábor Horváth пишет:<br>
</div>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>Hi Alexey,</div>
<div><br>
</div>
<div>In general, I like the idea of having a more declarative
way to define new</div>
<div>checks.<br>
</div>
<div><br>
</div>
<div>
<div class="gmail_quote">
<div dir="ltr">On Thu, 17 May 2018 at 01:37, Alexey Sidorin
<<a href="mailto:alexey.v.sidorin@ya.ru"
moz-do-not-send="true">alexey.v.sidorin@ya.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hello
everyone,<br>
<br>
I'd like to share some results of my investigation
targeted on <br>
improvement of Clang Static Analyzer checker API. You can
find some <br>
previous conversation on this topic here: <br>
<a
href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.
<br>
In my investigation, I tried to solve a particular problem
of building a <br>
checker without generating new nodes.<br>
</blockquote>
<div><br>
</div>
<div>Why do you focus on such checks that does not have
traits, does not generate new nodes. <br>
</div>
<div>Do you find this to be the majority of the checks you
need to implement? <br>
</div>
</div>
</div>
</div>
</blockquote>
Allowing checkers to generate new nodes and sinks leads to coverage
issues: the coverage changes if we disable or enable checkers. The
ability to modify state (Environment and Store) also makes us think
about how checkers will interact with each other.<br>
Regarding traits - path matchers are not completely trait-less. They
just use other GDM and can only store DynTypedNodes (extended by
path-sensitive nodes). The bindings that matchers have added while
matching separate graph nodes are propagated across ExplodedGraph.
Actually, path bindings act very similar to checker traits, and they
offer a subset of GDM abilities.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
--------- Introduction and design goals ---------<br>
<br>
In brief, I tried to use matchers-like API to make CSA
checkers look <br>
like this:<br>
<br>
StatementMatcher NotChdir =<br>
callExpr(unless(callee(functionDecl(hasName("::chdir")))));<br>
Finder.addMatcher(<br>
hasSequence(<br>
postStmt(hasStatement(<br>
callExpr(callee(functionDecl(hasName("::chroot")))))),<br>
unless(stmtPoint(hasStatement(callExpr(<br>
callee(functionDecl(hasName("::chdir"))),<br>
hasArgument(0,
hasValue(stringRegion(refersString("/")))))))),<br>
explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>
callEnter(hasCallExpr(NotChdir))))<br>
.bind("bug_node")),<br>
&Callback);<br>
Finder.match(G);<br>
</blockquote>
<div><br>
</div>
<div>I think, a common (design) pitfall of writing checks is
to try to match against</div>
<div>the AST when a symbolic execution callback should be
used instead.</div>
<div>I am a bit afraid having an API like this would make
this pitfall more common.</div>
<div>Maybe a separation between the path sensitive, flow
sensitive and</div>
<div>AST based checks is good for the checker writers and
new users and</div>
<div>I am not sure that the same abstraction fits all of
these cases.</div>
<div><br>
</div>
<div>In case of path sensitive checks I wonder if a state
machine like abstraction would</div>
<div>fit the use cases better (and also cover checks that
are using traits)</div>
<div>where the guarding conditions of the state transitions
can include AST matcher like checks. <br>
</div>
</div>
</div>
</div>
</blockquote>
In my experience, nearly 2/3 path-sensitive checkers we wrote follow
state machine abstraction. But it is possible to implement other
matcher operator that will use another matching strategy
(grammar-like, for example), and declare a path matcher based on it.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
and I have managed to make some simple working examples.<br>
<br>
The entire diff can be found here: <br>
<a
href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>
The code itself: <a
href="https://github.com/a-sid/clang/tree/a4"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/tree/a4</a><br>
<br>
There are several reasons why I have tried this approach.<br>
<br>
1. AST Matchers are already extensively used API for AST
checking. They <br>
are available both in Clang-Tidy and CSA. And I wanted to
use existing <br>
functionality as much as possible. So, I decided to extend
an existing <br>
API to make its usage seamless across different kinds of
checks: <br>
path-sensitive, AST-based and CFG-based.<br>
</blockquote>
<div><br>
</div>
<div>I am not sure if this is a good idea. I think the
checker author supposed to</div>
<div>be aware if the check is AST based, flow sensitive, or
path sensitive. <br>
</div>
<div>And this should also be easy to tell by reading the
code. I am also not</div>
<div>sure whether there is an abstraction that fits all.</div>
</div>
</div>
</div>
</blockquote>
Yes, there is never ideal solution. But we can just use descriptive
matcher names to notify users that they are going to use
path-sensitive analysis. Even in this small PoC, path matchers are
placed into a separate header and into a separate namespace (and are
even part of clangStaticAnalyzer, not clangASTMatchers), so one
should really understand what he does to make path matchers work.<br>
Also, we can often meet some hand-written AST matching inside
checker callbacks. We can just make it more consise.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div><br>
</div>
<div>I think the reason why this idea works well for checks
that only inspect the</div>
<div>exploded graph, because in this case we are still doing
pattern matching on</div>
<div>graphs in a similar manner as doing pattern matching on
the AST.</div>
<div><br>
</div>
<div>But does this generalize later on to stateful checks? <br>
</div>
</div>
</div>
</div>
</blockquote>
I'm not sure I understood you correctly. If you mean that stateful
checkers are checkers that need to modify state, then I think we
need to put state modification out of checkers if possible to get
reproducible coverage independently on checkers enabled or disabled.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
2. AST matchers effectively help clients to avoid a lot of
checking of <br>
dyn_cast results. This feature not only makes them more
convenient but <br>
also more safe because, in my experience, forgetting a
nullptr/None <br>
check is a pretty common mistake for checker writers.<br>
</blockquote>
<div><br>
</div>
<div>
<div>I think if something cannot be null we should return
references, otherwise</div>
<div>the client need to check for the pointer beeing null
(unless there are some</div>
<div>dynamic invariant that would make the check
redundant). If an API does</div>
<div>not match this philosophy, maybe we should fix the
API first. </div>
<div><br>
</div>
<div>Regardless of fixing the API, I agree that it would
be great to have higher</div>
<div>level APIs for checker authors.</div>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
3. Testing of AST matchers don't require writing C++ code
- it can be <br>
done interactively with clang-query tool. And I believe
that we need <br>
similar functionality for CSA as well.<br>
</blockquote>
<div><br>
</div>
Having a query tool for exploded graphs could be very
helpful, I agree.</div>
<div class="gmail_quote">These graphs tend to be very large
and sometimes it is not trivial to</div>
<div class="gmail_quote">find the relevant parts during
debugging.<br>
</div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I didn't want to replace existing checker API. Instead, I
tried to make <br>
new possibilities usable independently or together with
existing.<br>
<br>
--------- Design and implementation ---------<br>
<br>
The implementation consisted of a number of steps.<br>
<br>
1. Allow matching nodes of path-sensitive graph like usual
AST nodes. To <br>
make this possible, three actions were needed:<br>
1.1. ASTTypeTraits and DynTypedNode were extended to
support <br>
path-sensitive nodes: ExplodedNode, ProgramState, SVal,
SymExpr, etc. <br>
</blockquote>
<div><br>
</div>
<div>I wonder, if in general it is a good idea to pattern
match in checks on SymExpr.</div>
<div>A more general approach here would be to use the
constraint solver for queries</div>
<div>in a similar manner how ProgramState::assume works.<br>
</div>
</div>
</div>
</div>
</blockquote>
It is not only possible, but is already implemented - partially, of
course. You can take a look at canBeZero() matcher in
TestAfterDivZeroV2.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
The implementation with graph node support is moved into a
separate <br>
class (ASTGraphTypeTraits) in ento namespace to resolve
cyclic <br>
dependencies (they are still exist, unfortunately, but are
header-only, <br>
so we can build the PoC).<br>
1.2. Some additions to AST matchers were made to add
support for new <br>
kinds of nodes.<br>
1.3. To make MatchFinder able to query specific options
not supported <br>
by pure AST, it was augmented with "Contexts". A matcher
that needs to <br>
query the path-sensitive engine asks the Finder for the
required Context <br>
which provides specific helper functions.<br>
<br>
As the result of this step, we are able to write matchers
like <br>
expr(hasArgument(0,
hasValue(stringRegion(refersString("/"))))).<br>
</blockquote>
<div><br>
</div>
<div>I think this is the part of the proposal that I like
the most, it would be</div>
<div>a very concise way to write down guarding conditions.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
2. Create an engine for graph exploration and matching.<br>
Unlike normal AST matchers, hasSequence is a
path-sensitive matcher. <br>
It is launched against ExplodedGraph. These matchers are
handled by <br>
GraphMatchFinder object. While searching a graph, it
collects matches. <br>
Each match contains a pointer to the corresponding matcher
and State ID <br>
of this match. The way how matches are translated from one
state ID to <br>
another is determined by matcher operators.<br>
</blockquote>
<div><br>
</div>
<div>Is this matching done after the exploded graph already
built? If so,</div>
<div>these checks will be unable to generate sink nodes. I
think having sink</div>
<div>nodes sometimes desirable even if the check itself does
not have a trait.</div>
</div>
</div>
</div>
</blockquote>
In the implemented checkers - yes, the matching is done on the
ExplodedGraph already built. But nothing prevents GraphMatchFinder
to be used as a part of CheckerManager or ExprEngine. It has a
method called advance() that performs a single transition to the new
node not processed before, so it can be fed with single nodes that
engine creates during graph building. The matching on-the-fly is
possible, just was not implemented.<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
For example, SequenceVariadicOperator, which is the
base of <br>
hasSequence() matcher, has "positive" and "negative"
sub-matches. Each <br>
positive matcher has its corresponding State ID. In order
to advance to <br>
the next State ID, a node being matched should match all
"negative" <br>
matchers before the next "positive" matchers and the next
"positive" <br>
matcher itself. Failure in matching "negative" matcher
discards the match.<br>
<br>
The role of GraphMatchFinder is similar to MatchFinder:
it is only <br>
responsible for graph exploration and keeping bound nodes
and matchers.<br>
<br>
3. Manage bound nodes.<br>
While matching a single graph node, BoundNodes from AST
MatchFinder <br>
are used. AST matchers for path-sensitive nodes support
bindings <br>
out-of-the-box. However, the situation with graph matching
is a bit <br>
different. For graph matching, we have another system of
bound nodes. <br>
Each graph node has a related map of bounds aka GDMTy
(yes, the name is <br>
not a coincidence). GDMTy is a mapping from match ID to
BoundNodesMap <br>
which, in part, is effectively a map from std::string to
DynTypedNodes. <br>
This look pretty much like how GDM is organized in
ExplodedGraph, just <br>
without one level of indirection (it can be added,
though).<br>
<br>
MatchFinder contexts should allow support of their own
bindings. This <br>
is how equalsBoundNode() matcher works for path-sensitive
nodes: it just <br>
queries all available contexts for the binding.<br>
<br>
Finally, I have managed to make two checkers work in this
way: <br>
ChrootChecker (which is present in the introduction) and <br>
TestAfterDivZeroChecker. Both them can be found in
ChrootCheckerV2.cpp <br>
and TestAfterDivZeroCheckerV2.cpp correspondingly. They
act like normal <br>
checkers: produce warnings and use visitors. The main
difference is that <br>
they cannot generate sinks and use checkEndAnalysis
callback. The code <br>
of these checkers can be found here:<br>
<br>
<a
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a><br>
<a
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a><br>
<br>
<br>
-------- Some features --------<br>
<br>
1.The design of bindings has an interesting consequence:
it enables the <br>
dynamic introspection of GDM which was pretty hard before.
(Hello alpha <br>
renaming?)<br>
2. Nothing prevents matchers to be used with existing
checker API for <br>
simplifying conditional checking inside callbacks. The
matchers are not <br>
planned as the replacement for the current API, but look
like a nice <br>
complementary part.<br>
3. Implicit conversion of Matcher<ProgramPoint> to
Matcher<ExplodedNode> <br>
and Matcher<SymExpr || MemRegion> to
Matcher<SVal> for writing shorter code.<br>
<br>
-------- Not implemented/not checked yet --------<br>
I tried to keep the PoC as minimal as possible. As the
result, some <br>
features are not implemented yet, but I believe they can
be added.<br>
1. Usage of matchers inside checker callbacks<br>
This is not exactly unimplemented, but is still
untested.<br>
2. Dynamic matchers (clang-query interface)<br>
In addition to work for supporting dynamic nodes (I
don't know how <br>
many efforts it can take), we need to enable matching
against AST nodes, <br>
not graph. But it doesn't look as a problem, we can just
make matchers <br>
polymorphic.<br>
3. Binding to successfully matched paths is not
implemented yet - only <br>
bindings to separate nodes. I wonder if we need path
bindings at all.<br>
4. Some corner cases are still FIXMEs: single-node
sequences, for example.<br>
5. GDM is not based on immutable data structures like it
is done in CSA <br>
- it is just an STL map. Its performance can be poor (full
copy on every <br>
new node), but I don't think that changing it to use
immutable <br>
structures is hard.<br>
6. Matching on-the-fly<br>
GraphMatchFinder should support on-the-fly matching
during <br>
ExplodedGraph building. For this support, we should just
call its <br>
advance() method each time a new node is created. However,
this <br>
possibility wasn't checked yet.<br>
7. Matching CFG and CallGraph isn't implemented because it
was <br>
considered to be far out of simple PoC.<br>
8. Only sequential matching is available now, and I didn't
try to <br>
implement other operators yet. So, implementing a checker
similar to <br>
PthreadLock can be tricky or even impossible for now.<br>
<br>
-------- Known and potential issues --------<br>
From matchers' side:<br>
1. Some tests don't pass because they rely on the checker
ability to <br>
generate sink nodes. Our matchers cannot do it by design
so tests don't <br>
pass.<br>
2. Representing checker bindings as a map can be less
effective than <br>
storing data in structures. I didn't measure the overhead,
so I cannot <br>
give any numbers.<br>
3. Matchers are called on every node independently of its
type. This is <br>
not what CheckerManager does. I wonder if this detail can
affect <br>
performance as well.<br>
4. Problems with cyclic dependencies. clangStaticAnalyzer
has a <br>
dependency on clangASTMatchers, therefore,
clangASTMatchers cannot <br>
depend on clangStaticAnalyzer. However, if we want
ASTMatchers to <br>
support static analyzer data structures, there should be a
backward <br>
dependency. Now this dependency is header-only.<br>
5. Code duplication. This is mostly fine for a sandbox but
needs to be <br>
reduced.<br>
<br>
From analyzer's side:<br>
1. Many events are not reflected in the final
ExplodedGraph. For <br>
example, checkers can receive PointerEscape event, but the
event itself <br>
will not be recorded in the ExplodedGraph - only changes
made by <br>
checkers will. That's also true for Stmt nodes: I noticed
same issues <br>
with PostCondition. This makes matching a bit harder.
Should we add some <br>
information into ExplodedGraph?<br>
2. (Minor) Some static analyzer data structures don't
support <br>
traditional LLVM casting methods (dyn_cast, isa) because
they lack <br>
classof() method. I have fixed this internally and will
put a patch on <br>
review soon.<br>
<br>
-------- Conclusion --------<br>
Finally, there is a graph matching engine supporting basic
functionality <br>
and two checkers using it. I apologize for not starting
the discussion <br>
earlier, but I just wasn't sure that the approach will
work. Is anybody <br>
interested to have this stuff in upstream (maybe, changed
or improved)? <br>
If yes, I'll be happy to contribute this work
patch-by-patch and <br>
continue its improvement. If no - well, I had enough fun
playing with it.<br>
</blockquote>
<div><br>
</div>
<div>Thanks for sharing this results. Regardless of being
upstreamed as is or</div>
<div>in a separate form or not at all, this is an
interesting experiment for the</div>
<div>community. <br>
</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>