<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Necro-updating the status since some
demand for this or similar CSA feature was discovered on EuroLLVM
BoF. Unfortunately, I had few time to develop path-sensitive
matchers last year so the change list is not very large.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">1. On-the-fly matching while building
ExplodedGraph. Both offline and online matching is available. The
way how a checker handles graphs depends on how it was registered
in CheckerManager: there is a separate method
registerPathMatcher() that adds matcher to the set of online
matchers.<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">2. Path-sensitive matchers gained the
ability to generate new nodes and sinks in the callbacks. There
are two limitations, however.</div>
<div class="moz-cite-prefix">a) AST Matchers don't allow calling
callbacks until matching is complete, so I haven't implemented
this ability for path-sensitive matchers too.</div>
<div class="moz-cite-prefix">b) Only on-the-fly matchers are allowed
to add nodes (because the ability to add nodes to a complete graph
seems useless).<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">3. The primary advancement for this PoC
is that path-sensitive matchers now can work as dynamic matchers
so they can be used with clang-query tool. Here is an example
clang-query script:</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">let NotChdir
callExpr(unless(callee(functionDecl(hasName("::chdir"))))).bind("bad_action")<br>
<br>
match<br>
functionDecl(<br>
hasPath(hasSequence(<br>
postStmt(hasStatement(callExpr(callee(functionDecl(hasName("::chroot")))).bind("chroot"))),<br>
unless(<br>
stmtPoint(hasStatement(callExpr(callee(functionDecl(hasName("::chdir"))),<br>
hasArgument(0,
hasValue(stringRegion(refersString("/")))))))),<br>
explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>
callEnter(hasCallExpr(NotChdir)))).bind("bug_node"))))<br>
<br>
whose output on chroot.c is:</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">
<div class="de1">$ bin<span class="sy0">/</span>clang-query <span
class="sy0"></span>llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c
<span class="re5">-f</span>=debug-cmd.query</div>
<div class="de1"> </div>
<div class="de1">Match <span class="co0">#1:</span></div>
<div class="de1"> </div>
<div class="de2">llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">12</span>:<span class="nu0">3</span>: note: <span
class="st0">"bad_action"</span> binds here</div>
<div class="de1"> foo<span class="br0">(</span><span class="br0">)</span>;
<span class="sy0">//</span> expected-warning <span class="br0">{</span><span
class="br0">{</span>No call of chdir<span class="br0">(</span><span
class="st0">"/"</span><span class="br0">)</span> immediately
after <span class="kw2">chroot</span><span class="br0">}</span><span
class="br0">}</span></div>
<div class="de1"> ^~~~~</div>
<div class="de1">llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">11</span>:<span class="nu0">3</span>: note: <span
class="st0">"chroot"</span> binds here</div>
<div class="de1"> <span class="kw2">chroot</span><span
class="br0">(</span><span class="st0">"/usr/local"</span><span
class="br0">)</span>; <span class="sy0">//</span> root
changed.</div>
<div class="de2"> ^~~~~~~~~~~~~~~~~~~~</div>
<div class="de1">llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">10</span>:<span class="nu0">1</span>: note: <span
class="st0">"root"</span> binds here</div>
<div class="de1">void f1<span class="br0">(</span>void<span
class="br0">)</span> <span class="br0">{</span></div>
<div class="de1">^~~~~~~~~~~~~~~</div>
<div class="de1"> </div>
<div class="de2">Match <span class="co0">#2:</span></div>
<div class="de1"> </div>
<div class="de1"><span class="sy0"></span>llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">24</span>:<span class="nu0">3</span>: note: <span
class="st0">"bad_action"</span> binds here</div>
<div class="de1"> foo<span class="br0">(</span><span class="br0">)</span>;
<span class="sy0">//</span> expected-warning <span class="br0">{</span><span
class="br0">{</span>No call of chdir<span class="br0">(</span><span
class="st0">"/"</span><span class="br0">)</span> immediately
after <span class="kw2">chroot</span><span class="br0">}</span><span
class="br0">}</span></div>
<div class="de1"> ^~~~~</div>
<div class="de2">llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">22</span>:<span class="nu0">3</span>: note: <span
class="st0">"chroot"</span> binds here</div>
<div class="de1"> <span class="kw2">chroot</span><span
class="br0">(</span><span class="st0">"/usr/local"</span><span
class="br0">)</span>; <span class="sy0">//</span> root
changed.</div>
<div class="de1"> ^~~~~~~~~~~~~~~~~~~~</div>
<div class="de1">llvm<span class="sy0">/</span>tools<span
class="sy0">/</span>clang<span class="sy0">/</span>test<span
class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span
class="nu0">21</span>:<span class="nu0">1</span>: note: <span
class="st0">"root"</span> binds here</div>
<div class="de1">void f3<span class="br0">(</span>void<span
class="br0">)</span> <span class="br0">{</span></div>
<div class="de2">^~~~~~~~~~~~~~~</div>
<div class="de1"><span class="nu0">2</span> matches.</div>
<div class="de1"><br>
</div>
This can allow us to prototype checkers without re-compiling them
after every change. <br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The code can be found on my Github:</div>
<div class="moz-cite-prefix">clang: <a
class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers">https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers</a><br>
</div>
<div class="moz-cite-prefix">clang-tools-extra: <a
class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang-tools-extra/tree/a4-support">https://github.com/a-sid/clang-tools-extra/tree/a4-support</a></div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">If there is still a need this feature,
I'll start the discussion on its design because it requires some
intrusive changes into AST Matchers code.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">17.05.2018 2:37, Alexey Sidorin via
cfe-dev пишет:<br>
</div>
<blockquote type="cite"
cite="mid:64673e0b-3e74-15fd-ae2f-0fb176476eaf@ya.ru">Hello
everyone, <br>
<br>
I'd like to share some results of my investigation targeted on
improvement of Clang Static Analyzer checker API. You can find
some previous conversation on this topic here:
<a class="moz-txt-link-freetext"
href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.
In my investigation, I tried to solve a particular problem of
building a checker without generating new nodes. <br>
<br>
--------- Introduction and design goals --------- <br>
<br>
In brief, I tried to use matchers-like API to make CSA checkers
look like this: <br>
<br>
StatementMatcher NotChdir = <br>
callExpr(unless(callee(functionDecl(hasName("::chdir"))))); <br>
Finder.addMatcher( <br>
hasSequence( <br>
postStmt(hasStatement( <br>
callExpr(callee(functionDecl(hasName("::chroot")))))),
<br>
unless(stmtPoint(hasStatement(callExpr( <br>
callee(functionDecl(hasName("::chdir"))), <br>
hasArgument(0,
hasValue(stringRegion(refersString("/")))))))), <br>
explodedNode(anyOf(postStmt(hasStatement(NotChdir)), <br>
callEnter(hasCallExpr(NotChdir)))) <br>
.bind("bug_node")), <br>
&Callback); <br>
Finder.match(G); <br>
<br>
and I have managed to make some simple working examples. <br>
<br>
The entire diff can be found here:
<a class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>
The code itself: <a class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang/tree/a4">https://github.com/a-sid/clang/tree/a4</a>
<br>
<br>
There are several reasons why I have tried this approach. <br>
<br>
1. AST Matchers are already extensively used API for AST checking.
They are available both in Clang-Tidy and CSA. And I wanted to use
existing functionality as much as possible. So, I decided to
extend an existing API to make its usage seamless across different
kinds of checks: path-sensitive, AST-based and CFG-based. <br>
<br>
2. AST matchers effectively help clients to avoid a lot of
checking of dyn_cast results. This feature not only makes them
more convenient but also more safe because, in my experience,
forgetting a nullptr/None check is a pretty common mistake for
checker writers. <br>
<br>
3. Testing of AST matchers don't require writing C++ code - it can
be done interactively with clang-query tool. And I believe that we
need similar functionality for CSA as well. <br>
<br>
I didn't want to replace existing checker API. Instead, I tried to
make new possibilities usable independently or together with
existing. <br>
<br>
--------- Design and implementation --------- <br>
<br>
The implementation consisted of a number of steps. <br>
<br>
1. Allow matching nodes of path-sensitive graph like usual AST
nodes. To make this possible, three actions were needed: <br>
1.1. ASTTypeTraits and DynTypedNode were extended to support
path-sensitive nodes: ExplodedNode, ProgramState, SVal, SymExpr,
etc. The implementation with graph node support is moved into a
separate class (ASTGraphTypeTraits) in ento namespace to resolve
cyclic dependencies (they are still exist, unfortunately, but are
header-only, so we can build the PoC). <br>
1.2. Some additions to AST matchers were made to add support for
new kinds of nodes. <br>
1.3. To make MatchFinder able to query specific options not
supported by pure AST, it was augmented with "Contexts". A matcher
that needs to query the path-sensitive engine asks the Finder for
the required Context which provides specific helper functions. <br>
<br>
As the result of this step, we are able to write matchers like
expr(hasArgument(0, hasValue(stringRegion(refersString("/"))))). <br>
<br>
2. Create an engine for graph exploration and matching. <br>
Unlike normal AST matchers, hasSequence is a path-sensitive
matcher. It is launched against ExplodedGraph. These matchers are
handled by GraphMatchFinder object. While searching a graph, it
collects matches. Each match contains a pointer to the
corresponding matcher and State ID of this match. The way how
matches are translated from one state ID to another is determined
by matcher operators. <br>
<br>
For example, SequenceVariadicOperator, which is the base of
hasSequence() matcher, has "positive" and "negative" sub-matches.
Each positive matcher has its corresponding State ID. In order to
advance to the next State ID, a node being matched should match
all "negative" matchers before the next "positive" matchers and
the next "positive" matcher itself. Failure in matching "negative"
matcher discards the match. <br>
<br>
The role of GraphMatchFinder is similar to MatchFinder: it is
only responsible for graph exploration and keeping bound nodes and
matchers. <br>
<br>
3. Manage bound nodes. <br>
While matching a single graph node, BoundNodes from AST
MatchFinder are used. AST matchers for path-sensitive nodes
support bindings out-of-the-box. However, the situation with graph
matching is a bit different. For graph matching, we have another
system of bound nodes. Each graph node has a related map of bounds
aka GDMTy (yes, the name is not a coincidence). GDMTy is a mapping
from match ID to BoundNodesMap which, in part, is effectively a
map from std::string to DynTypedNodes. This look pretty much like
how GDM is organized in ExplodedGraph, just without one level of
indirection (it can be added, though). <br>
<br>
MatchFinder contexts should allow support of their own bindings.
This is how equalsBoundNode() matcher works for path-sensitive
nodes: it just queries all available contexts for the binding. <br>
<br>
Finally, I have managed to make two checkers work in this way:
ChrootChecker (which is present in the introduction) and
TestAfterDivZeroChecker. Both them can be found in
ChrootCheckerV2.cpp and TestAfterDivZeroCheckerV2.cpp
correspondingly. They act like normal checkers: produce warnings
and use visitors. The main difference is that they cannot generate
sinks and use checkEndAnalysis callback. The code of these
checkers can be found here: <br>
<br>
<a class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a>
<br>
<a class="moz-txt-link-freetext"
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a>
<br>
<br>
<br>
-------- Some features -------- <br>
<br>
1.The design of bindings has an interesting consequence: it
enables the dynamic introspection of GDM which was pretty hard
before. (Hello alpha renaming?) <br>
2. Nothing prevents matchers to be used with existing checker API
for simplifying conditional checking inside callbacks. The
matchers are not planned as the replacement for the current API,
but look like a nice complementary part. <br>
3. Implicit conversion of Matcher<ProgramPoint> to
Matcher<ExplodedNode> and Matcher<SymExpr ||
MemRegion> to Matcher<SVal> for writing shorter code. <br>
<br>
-------- Not implemented/not checked yet -------- <br>
I tried to keep the PoC as minimal as possible. As the result,
some features are not implemented yet, but I believe they can be
added. <br>
1. Usage of matchers inside checker callbacks <br>
This is not exactly unimplemented, but is still untested. <br>
2. Dynamic matchers (clang-query interface) <br>
In addition to work for supporting dynamic nodes (I don't know
how many efforts it can take), we need to enable matching against
AST nodes, not graph. But it doesn't look as a problem, we can
just make matchers polymorphic. <br>
3. Binding to successfully matched paths is not implemented yet -
only bindings to separate nodes. I wonder if we need path bindings
at all. <br>
4. Some corner cases are still FIXMEs: single-node sequences, for
example. <br>
5. GDM is not based on immutable data structures like it is done
in CSA - it is just an STL map. Its performance can be poor (full
copy on every new node), but I don't think that changing it to use
immutable structures is hard. <br>
6. Matching on-the-fly <br>
GraphMatchFinder should support on-the-fly matching during
ExplodedGraph building. For this support, we should just call its
advance() method each time a new node is created. However, this
possibility wasn't checked yet. <br>
7. Matching CFG and CallGraph isn't implemented because it was
considered to be far out of simple PoC. <br>
8. Only sequential matching is available now, and I didn't try to
implement other operators yet. So, implementing a checker similar
to PthreadLock can be tricky or even impossible for now. <br>
<br>
-------- Known and potential issues -------- <br>
From matchers' side: <br>
1. Some tests don't pass because they rely on the checker ability
to generate sink nodes. Our matchers cannot do it by design so
tests don't pass. <br>
2. Representing checker bindings as a map can be less effective
than storing data in structures. I didn't measure the overhead, so
I cannot give any numbers. <br>
3. Matchers are called on every node independently of its type.
This is not what CheckerManager does. I wonder if this detail can
affect performance as well. <br>
4. Problems with cyclic dependencies. clangStaticAnalyzer has a
dependency on clangASTMatchers, therefore, clangASTMatchers cannot
depend on clangStaticAnalyzer. However, if we want ASTMatchers to
support static analyzer data structures, there should be a
backward dependency. Now this dependency is header-only. <br>
5. Code duplication. This is mostly fine for a sandbox but needs
to be reduced. <br>
<br>
From analyzer's side: <br>
1. Many events are not reflected in the final ExplodedGraph. For
example, checkers can receive PointerEscape event, but the event
itself will not be recorded in the ExplodedGraph - only changes
made by checkers will. That's also true for Stmt nodes: I noticed
same issues with PostCondition. This makes matching a bit harder.
Should we add some information into ExplodedGraph? <br>
2. (Minor) Some static analyzer data structures don't support
traditional LLVM casting methods (dyn_cast, isa) because they lack
classof() method. I have fixed this internally and will put a
patch on review soon. <br>
<br>
-------- Conclusion -------- <br>
Finally, there is a graph matching engine supporting basic
functionality and two checkers using it. I apologize for not
starting the discussion earlier, but I just wasn't sure that the
approach will work. Is anybody interested to have this stuff in
upstream (maybe, changed or improved)? If yes, I'll be happy to
contribute this work patch-by-patch and continue its improvement.
If no - well, I had enough fun playing with it. <br>
<br>
_______________________________________________ <br>
cfe-dev mailing list <br>
<a class="moz-txt-link-abbreviated"
href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>
<br>
<a class="moz-txt-link-freetext"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>