<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">18.05.2018 04:47, Artem Dergachev
пишет:<br>
</div>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
On 5/17/18 5:28 AM, Gábor Horváth wrote:<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>Hi Alexey,</div>
<div><br>
</div>
<div>In general, I like the idea of having a more declarative
way to define new</div>
<div>checks.<br>
</div>
<div><br>
</div>
<div>
<div class="gmail_quote">
<div dir="ltr">On Thu, 17 May 2018 at 01:37, Alexey
Sidorin <<a href="mailto:alexey.v.sidorin@ya.ru"
moz-do-not-send="true">alexey.v.sidorin@ya.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hello
everyone,<br>
<br>
I'd like to share some results of my investigation
targeted on <br>
improvement of Clang Static Analyzer checker API. You
can find some <br>
previous conversation on this topic here: <br>
<a
href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.
<br>
In my investigation, I tried to solve a particular
problem of building a <br>
checker without generating new nodes.<br>
</blockquote>
<div><br>
</div>
<div>Why do you focus on such checks that does not have
traits, does not generate new nodes. <br>
</div>
<div>Do you find this to be the majority of the checks you
need to implement? <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Ok, so as far as i understand, these new checkers don't support
traits because they don't *need* them. Instead, they plan to use
"backreferences" in matchers, i.e. double file descriptor close
can be described as:<br>
<br>
hasSequence(<br>
call(hasName("fclose"), hasArg(0, sym().bind("fd"))),<br>
call(hasName("fclose"), hasArg(0, sym(equalsBoundNode("fd"))))<br>
);<br>
<br>
Right?<br>
</blockquote>
Yes, exactly.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
So we can't eliminate the path on which the pointer is null after
we dereference it, like we normally do, we can't introduce state
splits, we can't produce fatal error nodes, but we can still use
finite state machines.<br>
</blockquote>
As I told before, I don't like state splitting and generating sinks
in checker. However, it is not _exactly_ true that we cannot do it
with matchers - because there are not only matchers, but are also
_callbacks_. If we perform a match on-the-fly (I didn't implement
this, unfortunately, but it doesn't look hard), we can do almost
anything in the callback after a match was found.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
The real question is, how do we implement RetainCountChecker that
needs to count the potentially infinite number of retains and
releases before emitting the warning, and is therefore not a
finite state machine?^^<br>
</blockquote>
I'm not sure that I understood you correctly, but if we want to use
something different from sequential model, we can implement our
custom path matchers - like it is done for AST matchers. It took
some time to implement a working (but still poorly designed)
example:<br>
<a class="moz-txt-link-freetext" href="https://github.com/a-sid/clang/commit/7b94a5e648eed32b9fdc7cb7797b9450b0f3d1ed#diff-0aec754c14839c59eb296a889a7940dbR180">https://github.com/a-sid/clang/commit/7b94a5e648eed32b9fdc7cb7797b9450b0f3d1ed#diff-0aec754c14839c59eb296a889a7940dbR180</a>.
<br>
<br>
The idea was to generalize some most common search patterns into
matchers. Counting can be implemented pretty easily because counter
states are still integers. However, if our state cannot be
represented as an integer, there can be some troubles.<br>
What I have also realized is that some way(s) of using different
matchers together is needed. Reference counting is pretty useless on
its own, it needs some actions to match if we are searching for a
bug.<br>
<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
--------- Introduction and design goals ---------<br>
<br>
In brief, I tried to use matchers-like API to make CSA
checkers look <br>
like this:<br>
<br>
StatementMatcher NotChdir =<br>
callExpr(unless(callee(functionDecl(hasName("::chdir")))));<br>
Finder.addMatcher(<br>
hasSequence(<br>
postStmt(hasStatement(<br>
callExpr(callee(functionDecl(hasName("::chroot")))))),<br>
unless(stmtPoint(hasStatement(callExpr(<br>
callee(functionDecl(hasName("::chdir"))),<br>
hasArgument(0,
hasValue(stringRegion(refersString("/")))))))),<br>
explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>
callEnter(hasCallExpr(NotChdir))))<br>
.bind("bug_node")),<br>
&Callback);<br>
Finder.match(G);<br>
</blockquote>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Aha, so am i understanding correctly that unless() here is not
your normal unless()? The normal unless() clause would have meant<br>
<br>
"somewhere between chroot() and the offending call there's a
node that's not chdir()"<br>
<br>
but what we're trying to say is<br>
<br>
"nowhere between chroot() and the offending call there's a
node that's chdir()"<br>
<br>
?<br>
</blockquote>
Yes. I should probably create a normal matcher and don't reuse
unless() for this to make it less confusive.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div>I think, a common (design) pitfall of writing checks
is to try to match against</div>
<div>the AST when a symbolic execution callback should be
used instead.</div>
<div>I am a bit afraid having an API like this would make
this pitfall more common.</div>
<div>Maybe a separation between the path sensitive, flow
sensitive and</div>
<div>AST based checks is good for the checker writers and
new users and</div>
<div>I am not sure that the same abstraction fits all of
these cases.</div>
<div><br>
</div>
<div>In case of path sensitive checks I wonder if a state
machine like abstraction would</div>
<div>fit the use cases better (and also cover checks that
are using traits)</div>
<div>where the guarding conditions of the state
transitions can include AST matcher like checks. <br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
and I have managed to make some simple working examples.<br>
<br>
The entire diff can be found here: <br>
<a
href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>
The code itself: <a
href="https://github.com/a-sid/clang/tree/a4"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://github.com/a-sid/clang/tree/a4</a><br>
<br>
There are several reasons why I have tried this
approach.<br>
<br>
1. AST Matchers are already extensively used API for AST
checking. They <br>
are available both in Clang-Tidy and CSA. And I wanted
to use existing <br>
functionality as much as possible. So, I decided to
extend an existing <br>
API to make its usage seamless across different kinds of
checks: <br>
path-sensitive, AST-based and CFG-based.<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
When we say CFG-based checks, we usually mean "all paths" checks
that find bugs indicated by a certain invariant holding on all
paths through a certain program point(s). Such as
TestAfterDivideZero or DeadStore.<br>
<br>
The problem with CFG-based "all paths" checks that i don't really
know how to solve and that's probably not going to be solved by
giving them a good API is error reporting. Imagine a conversation
with a user:<br>
<br>
A n a l y z e r : Hey this assignment is not used.<br>
<br>
U s e r : But hey, it's used here *points to the code*.<br>
<br>
A n a l y z e r : But this code is dead.<br>
<br>
U s e r : Why so? Suppose 'x' is equal to true, we jump
straight into this code.<br>
<br>
A n a l y z e r : That's because the initializer for 'x' is in
fact always evaluated to false.<br>
<br>
U s e r : WHY t___t<br>
<br>
Etc. The first piece of information is produced by a DeadStore
checker. The second piece of information is produced by some sort
of DeadCode checker (currently alpha). The third piece of
information is produced by some sort of SameResLogicalExpr checker
(currently in the list of potential checkers). All three checkers
are all-paths checkers. And unless all three checker's diagnostics
are presented to the user at the same time, the user will be
unable to understand the bug report.<br>
<br>
I believe that we should really solve this problem somehow (or
explain to me why it isn't a problem) before we try to introduce a
massive increase in the number of CFG-based checkers we have
(currently 1).<br>
</blockquote>
Looks like I have used wrong words to describe what I wanted to do
with CFG. Initially, I wanted to support CFG search as well but
quicky realized nobody just needs it; however, some rudiments are
still remaining in the code. Instead, I meant exposing CFG analyses
in a user-friendly way, like it is done in TestAfterDivZeroV2. I
think it makes much more sense because CFG analyses like
reachability or postdomination can be used for FP suppression in the
checkers and in the analyzer engine.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div>I am not sure if this is a good idea. I think the
checker author supposed to</div>
<div>be aware if the check is AST based, flow sensitive,
or path sensitive. <br>
</div>
<div>And this should also be easy to tell by reading the
code. I am also not</div>
<div>sure whether there is an abstraction that fits all.</div>
<div><br>
</div>
<div>I think the reason why this idea works well for
checks that only inspect the</div>
<div>exploded graph, because in this case we are still
doing pattern matching on</div>
<div>graphs in a similar manner as doing pattern matching
on the AST.</div>
<div><br>
</div>
<div>But does this generalize later on to stateful checks?
<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
2. AST matchers effectively help clients to avoid a lot
of checking of <br>
dyn_cast results. This feature not only makes them more
convenient but <br>
also more safe because, in my experience, forgetting a
nullptr/None <br>
check is a pretty common mistake for checker writers.<br>
</blockquote>
<div><br>
</div>
<div>
<div>I think if something cannot be null we should
return references, otherwise</div>
<div>the client need to check for the pointer beeing
null (unless there are some</div>
<div>dynamic invariant that would make the check
redundant). If an API does</div>
<div>not match this philosophy, maybe we should fix the
API first. </div>
<div><br>
</div>
<div>Regardless of fixing the API, I agree that it would
be great to have higher</div>
<div>level APIs for checker authors.</div>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
3. Testing of AST matchers don't require writing C++
code - it can be <br>
done interactively with clang-query tool. And I believe
that we need <br>
similar functionality for CSA as well.<br>
</blockquote>
<div><br>
</div>
Having a query tool for exploded graphs could be very
helpful, I agree.</div>
<div class="gmail_quote">These graphs tend to be very large
and sometimes it is not trivial to</div>
<div class="gmail_quote">find the relevant parts during
debugging.<br>
</div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
I didn't want to replace existing checker API. Instead,
I tried to make <br>
new possibilities usable independently or together with
existing.<br>
<br>
--------- Design and implementation ---------<br>
<br>
The implementation consisted of a number of steps.<br>
<br>
1. Allow matching nodes of path-sensitive graph like
usual AST nodes. To <br>
make this possible, three actions were needed:<br>
1.1. ASTTypeTraits and DynTypedNode were extended to
support <br>
path-sensitive nodes: ExplodedNode, ProgramState, SVal,
SymExpr, etc. <br>
</blockquote>
<div><br>
</div>
<div>I wonder, if in general it is a good idea to pattern
match in checks on SymExpr.</div>
<div>A more general approach here would be to use the
constraint solver for queries</div>
<div>in a similar manner how ProgramState::assume works.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> The
implementation with graph node support is moved into a
separate <br>
class (ASTGraphTypeTraits) in ento namespace to resolve
cyclic <br>
dependencies (they are still exist, unfortunately, but
are header-only, <br>
so we can build the PoC).<br>
1.2. Some additions to AST matchers were made to add
support for new <br>
kinds of nodes.<br>
1.3. To make MatchFinder able to query specific
options not supported <br>
by pure AST, it was augmented with "Contexts". A matcher
that needs to <br>
query the path-sensitive engine asks the Finder for the
required Context <br>
which provides specific helper functions.<br>
<br>
As the result of this step, we are able to write
matchers like <br>
expr(hasArgument(0,
hasValue(stringRegion(refersString("/"))))).<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
Yeah, we'll also eventually need operators over symbols, eg.:<br>
<br>
arraySubscriptExpr(hasSubscript(expr(hasSVal(sym(isLessThan(<br>
sym(equalsBoundNode("array_size")))).bind("array_subscript")))))<br>
</blockquote>
There is nothing impossible here - such expressions should be easy
to write with augmented AST matchers used in this PoC.<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
Still, SVal matchers are something i always wanted. I even made
SVal visitors. The latter turned out to be pretty useless though
:/<br>
<br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div>I think this is the part of the proposal that I like
the most, it would be</div>
<div>a very concise way to write down guarding conditions.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
2. Create an engine for graph exploration and matching.<br>
Unlike normal AST matchers, hasSequence is a
path-sensitive matcher. <br>
It is launched against ExplodedGraph. These matchers are
handled by <br>
GraphMatchFinder object. While searching a graph, it
collects matches. <br>
Each match contains a pointer to the corresponding
matcher and State ID <br>
of this match. The way how matches are translated from
one state ID to <br>
another is determined by matcher operators.<br>
</blockquote>
<div><br>
</div>
<div>Is this matching done after the exploded graph
already built? If so,</div>
<div>these checks will be unable to generate sink nodes. I
think having sink</div>
<div>nodes sometimes desirable even if the check itself
does not have a trait.</div>
</div>
</div>
</div>
</blockquote>
<br>
I'm really curious about how the performance scales with the
number of checkers. Is it going to be significantly more expensive
to run 100 DSL-based checkers on a fully built graph than to run
100 normal checkers as the graph is being built?<br>
</blockquote>
No benchmarks => no answer, sorry. My wild guess won't work here
:(<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
This boils down to: by looking at the sub-matcher, will you be
able to filter out checkers that don't need to act upon the node
as quickly as you do that when calling checker callbacks?<br>
</blockquote>
<br>
I think it's a hard goal to achieve.<br>
1. To separate matcher callbacks, we need for matchers to expose
meta-information. Some single matchers also can participate in
multiple callbacks (anyOf()).<br>
2. To make matchers callable with a matcher which is a part of it,
we need to re-design the path matchers. They should have a mapping
between sub-matchers and actions required if a sub-matcher matches.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
And this immediately leads to: how difficult would it be to simply
auto-generate an old-style checker (with state traits and
callbacks) from the new-style matcher? Maybe that's something we
should focus on instead of trying to match a fully constructed
graph? Maybe it'd also be more powerful, i.e. allow certain
imperative side effects whenever partial match is encountered?<br>
</blockquote>
We can use matchers inside checker callbacks if we need extended
functionality. The auto-generation for matchers with side effects is
non-trivial and needs supporting partial match actions. However,
since we have BindableMatchers, we can also have ActingMatchers
(unsupported dynamically) that have `.call()` method.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
For example, SequenceVariadicOperator, which is the
base of <br>
hasSequence() matcher, has "positive" and "negative"
sub-matches. Each <br>
positive matcher has its corresponding State ID. In
order to advance to <br>
the next State ID, a node being matched should match all
"negative" <br>
matchers before the next "positive" matchers and the
next "positive" <br>
matcher itself. Failure in matching "negative" matcher
discards the match.<br>
<br>
The role of GraphMatchFinder is similar to
MatchFinder: it is only <br>
responsible for graph exploration and keeping bound
nodes and matchers.<br>
<br>
3. Manage bound nodes.<br>
While matching a single graph node, BoundNodes from
AST MatchFinder <br>
are used. AST matchers for path-sensitive nodes support
bindings <br>
out-of-the-box. However, the situation with graph
matching is a bit <br>
different. For graph matching, we have another system of
bound nodes. <br>
Each graph node has a related map of bounds aka GDMTy
(yes, the name is <br>
not a coincidence). GDMTy is a mapping from match ID to
BoundNodesMap <br>
which, in part, is effectively a map from std::string to
DynTypedNodes. <br>
This look pretty much like how GDM is organized in
ExplodedGraph, just <br>
without one level of indirection (it can be added,
though).<br>
<br>
MatchFinder contexts should allow support of their
own bindings. This <br>
is how equalsBoundNode() matcher works for
path-sensitive nodes: it just <br>
queries all available contexts for the binding.<br>
<br>
Finally, I have managed to make two checkers work in
this way: <br>
ChrootChecker (which is present in the introduction) and
<br>
TestAfterDivZeroChecker. Both them can be found in
ChrootCheckerV2.cpp <br>
and TestAfterDivZeroCheckerV2.cpp correspondingly. They
act like normal <br>
checkers: produce warnings and use visitors.</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
I wonder if our false positive suppression visitors (or maybe even
regular checker bugvisitors) can be rewritten with these matchers.
trackNullOrUndefValue() and its system of visitors doesn't look
like a matcher at all to me, unfortunately - it can potentially
track an infinite amount of events. But every individual visitor
within that system probably is.<br>
</blockquote>
Hm, I didn't even think about it. Do you think it can give us some
advantages for BugVisitors?<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">The
main difference is that <br>
they cannot generate sinks and use checkEndAnalysis
callback. The code <br>
of these checkers can be found here:<br>
<br>
<a
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a><br>
<a
href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a><br>
<br>
<br>
-------- Some features --------<br>
<br>
1.The design of bindings has an interesting consequence:
it enables the <br>
dynamic introspection of GDM which was pretty hard
before. (Hello alpha <br>
renaming?)<br>
2. Nothing prevents matchers to be used with existing
checker API for <br>
simplifying conditional checking inside callbacks. The
matchers are not <br>
planned as the replacement for the current API, but look
like a nice <br>
complementary part.<br>
3. Implicit conversion of Matcher<ProgramPoint> to
Matcher<ExplodedNode> <br>
and Matcher<SymExpr || MemRegion> to
Matcher<SVal> for writing shorter code.<br>
<br>
-------- Not implemented/not checked yet --------<br>
I tried to keep the PoC as minimal as possible. As the
result, some <br>
features are not implemented yet, but I believe they can
be added.<br>
1. Usage of matchers inside checker callbacks<br>
This is not exactly unimplemented, but is still
untested.<br>
2. Dynamic matchers (clang-query interface)<br>
In addition to work for supporting dynamic nodes (I
don't know how <br>
many efforts it can take), we need to enable matching
against AST nodes, <br>
not graph. But it doesn't look as a problem, we can just
make matchers <br>
polymorphic.<br>
3. Binding to successfully matched paths is not
implemented yet - only <br>
bindings to separate nodes. I wonder if we need path
bindings at all.<br>
4. Some corner cases are still FIXMEs: single-node
sequences, for example.<br>
5. GDM is not based on immutable data structures like it
is done in CSA <br>
- it is just an STL map. Its performance can be poor
(full copy on every <br>
new node), but I don't think that changing it to use
immutable <br>
structures is hard.<br>
6. Matching on-the-fly<br>
GraphMatchFinder should support on-the-fly matching
during <br>
ExplodedGraph building. For this support, we should just
call its <br>
advance() method each time a new node is created.
However, this <br>
possibility wasn't checked yet.<br>
7. Matching CFG and CallGraph isn't implemented because
it was <br>
considered to be far out of simple PoC.<br>
8. Only sequential matching is available now, and I
didn't try to <br>
implement other operators yet. So, implementing a
checker similar to <br>
PthreadLock can be tricky or even impossible for now.<br>
<br>
-------- Known and potential issues --------<br>
From matchers' side:<br>
1. Some tests don't pass because they rely on the
checker ability to <br>
generate sink nodes. Our matchers cannot do it by design
so tests don't <br>
pass.<br>
2. Representing checker bindings as a map can be less
effective than <br>
storing data in structures. I didn't measure the
overhead, so I cannot <br>
give any numbers.<br>
3. Matchers are called on every node independently of
its type. This is <br>
not what CheckerManager does. I wonder if this detail
can affect <br>
performance as well.<br>
4. Problems with cyclic dependencies.
clangStaticAnalyzer has a <br>
dependency on clangASTMatchers, therefore,
clangASTMatchers cannot <br>
depend on clangStaticAnalyzer. However, if we want
ASTMatchers to <br>
support static analyzer data structures, there should be
a backward <br>
dependency. Now this dependency is header-only.<br>
5. Code duplication. This is mostly fine for a sandbox
but needs to be <br>
reduced.<br>
<br>
From analyzer's side:<br>
1. Many events are not reflected in the final
ExplodedGraph. For <br>
example, checkers can receive PointerEscape event, but
the event itself <br>
will not be recorded in the ExplodedGraph - only changes
made by <br>
checkers will.</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
Weeeell maaaaaybe. This would essentially mean passing a
CheckerContext into checkPointerEscape and checkRegionChanges.
With a predecessor node having a new sort of ProgramPoint such as
"PointerEscapePoint". And we can put the list of escaped stuff
directly into the program point instead of passing it separately.<br>
<br>
Which means allowing state splits in these callbacks. Which of
course can be disallowed later (to keep the code that calls the
callbacks simple), and graph matchers won't need them any time
soon anyway.<br>
<br>
But it'd still be a lot of refactoring because currently these
callbacks are called from code that doesn't even know anything
about nodes, eg. from within State->invalidateRegions().<br>
</blockquote>
Internally, I created a dummy node in CheckerManager methods and
then launched checker callbacks with it. But this doesn't look like
a right way.<br>
<br>
<blockquote type="cite"
cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>
<blockquote type="cite"
cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">
<div dir="ltr">
<div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">That's
also true for Stmt nodes: I noticed same issues <br>
with PostCondition. This makes matching a bit harder.
Should we add some <br>
information into ExplodedGraph?<br>
2. (Minor) Some static analyzer data structures don't
support <br>
traditional LLVM casting methods (dyn_cast, isa) because
they lack <br>
classof() method. I have fixed this internally and will
put a patch on <br>
review soon.<br>
<br>
-------- Conclusion --------<br>
Finally, there is a graph matching engine supporting
basic functionality <br>
and two checkers using it. I apologize for not starting
the discussion <br>
earlier, but I just wasn't sure that the approach will
work. Is anybody <br>
interested to have this stuff in upstream (maybe,
changed or improved)? <br>
If yes, I'll be happy to contribute this work
patch-by-patch and <br>
continue its improvement. If no - well, I had enough fun
playing with it.<br>
</blockquote>
<div><br>
</div>
<div>Thanks for sharing this results. Regardless of being
upstreamed as is or</div>
<div>in a separate form or not at all, this is an
interesting experiment for the</div>
<div>community. <br>
</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>