[cfe-dev] How to write a matcher?

papin_g guillaume.papin at epitech.eu
Tue Mar 11 04:35:06 PDT 2014


I have found the AST Matcher Reference to be useful while developing matchers: http://clang.llvm.org/docs/LibASTMatchersReference.html


Instead of writing your own debug matcher you may want to try clang-query.

I don't think it's documented yet but it's quite straightforward to get started. You will have an interactive shell session to build your matcher and see the results in live, I tried it the other day and I suggest to use it with libedit support so it will complete for you the possible matchers as you type.


https://llvm.org/viewvc/llvm-project/clang-tools-extra/trunk/clang-query/​


To generate the AST dump you may want to use clang-check instead of clang++ with all the arguments. clang-check allows you to filter the AST and remove the need of specifying things like -fsyntax-only.


I have found the ast-dump useful since it provides the names of the types that you can look for in the AST Matcher Reference.


________________________________
From: Stephen Kelly <steveire at gmail.com>
Sent: 11 March 2014 10:48
To: cfe-dev at cs.uiuc.edu
Subject: [cfe-dev] How to write a matcher?


Hi,

I want to write a tool to remove c_str() calls in lines like

 std::cout << s.c_str();

There isn't really enough information about what matchers mean, and how to
use them.

There is no good information on how to write/debug them, and how they need
to be arranged (nested, in an allof, wrapped with to(), on(), callee() etc).

How does one *actually* go about doing it?

Here's my attempt:

My test code is:

 int main(int, char **)
 {
   std::string s("something");
   std::cout << s << std::endl;
   std::cout << s.c_str() << std::endl;
   return 0;
 }

I used

 clang++ -Xclang -ast-dump -fsyntax-only -fno-color-diagnostics ../main.cpp

to generate

 `-FunctionDecl 0x275cae0 <../main.cpp:9:1, line:16:1> main 'int (int, char
**)'
   |-ParmVarDecl 0x275c9a0 <line:9:10> 'int'
   |-ParmVarDecl 0x275ca10 <col:15, col:21> 'char **'
   `-CompoundStmt 0x275f150 <line:10:1, line:16:1>
     |-DeclStmt 0x275d160 <line:11:3, col:23>
     | `-VarDecl 0x275cc30 <col:3, col:22> s 'std::string':'class
std::basic_string<char>'
     |   `-ExprWithCleanups 0x275d148 <col:15, col:22> 'std::string':'class
std::basic_string<char>'
     |     `-CXXConstructExpr 0x275d108 <col:15, col:22>
'std::string':'class std::basic_string<char>' 'void (const char *, const
class std::allocator<char> &)'
     |       |-ImplicitCastExpr 0x275cd20 <col:17> 'const char *'
<ArrayToPointerDecay>
     |       | `-StringLiteral 0x275cc88 <col:17> 'const char [4]' lvalue
"wtf"
     |       `-CXXDefaultArgExpr 0x275d0e0 <<invalid sloc>> 'const class
std::allocator<char>':'const class std::allocator<char>' lvalue
     |-CXXOperatorCallExpr 0x275e160 <line:12:3, col:26>
'__ostream_type':'class std::basic_ostream<char>' lvalue
     | |-ImplicitCastExpr 0x275e148 <col:18> '__ostream_type &(*)
(__ostream_type &(*)(__ostream_type &))' <FunctionToPointerDecay>
     | | `-DeclRefExpr 0x275e0c0 <col:18> '__ostream_type &(__ostream_type
&(*)(__ostream_type &))' lvalue CXXMethod 0x26d0410 'operator<<'
'__ostream_type &(__ostream_type &(*)(__ostream_type &))'
     | |-CXXOperatorCallExpr 0x275d5c0 <col:3, col:16> 'basic_ostream<char,
struct std::char_traits<char> >':'class std::basic_ostream<char>' lvalue
     | | |-ImplicitCastExpr 0x275d5a8 <col:13> 'basic_ostream<char, struct
std::char_traits<char> > &(*)(basic_ostream<char, struct
std::char_traits<char> > &, const basic_string<char, struct
std::char_traits<char>, class std::allocator<char> > &)'
<FunctionToPointerDecay>
     | | | `-DeclRefExpr 0x275d528 <col:13> 'basic_ostream<char, struct
std::char_traits<char> > &(basic_ostream<char, struct std::char_traits<char>
> &, const basic_string<char, struct std::char_traits<char>, class
std::allocator<char> > &)' lvalue Function 0x2533250 'operator<<'
'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char,
struct std::char_traits<char> > &, const basic_string<char, struct
std::char_traits<char>, class std::allocator<char> > &)'
     | | |-DeclRefExpr 0x275d198 <col:3, col:8> 'ostream':'class
std::basic_ostream<char>' lvalue Var 0x275c3a0 'cout' 'ostream':'class
std::basic_ostream<char>'
     | | `-ImplicitCastExpr 0x275d510 <col:16> 'const basic_string<char,
struct std::char_traits<char>, class std::allocator<char> >':'const class
std::basic_string<char>' lvalue <NoOp>
     | |   `-DeclRefExpr 0x275d1d0 <col:16> 'std::string':'class
std::basic_string<char>' lvalue Var 0x275cc30 's' 'std::string':'class
std::basic_string<char>'
     | `-ImplicitCastExpr 0x275e0a8 <col:21, col:26> 'basic_ostream<char,
struct std::char_traits<char> > &(*)(basic_ostream<char, struct
std::char_traits<char> > &)' <FunctionToPointerDecay>
     |   `-DeclRefExpr 0x275e068 <col:21, col:26> 'basic_ostream<char,
struct std::char_traits<char> > &(basic_ostream<char, struct
std::char_traits<char> > &)' lvalue Function 0x26d4480 'endl'
'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char,
struct std::char_traits<char> > &)' (FunctionTemplate 0x26b9180 'endl')
     |-CXXOperatorCallExpr 0x275f0c8 <line:13:3, col:34>
'__ostream_type':'class std::basic_ostream<char>' lvalue
     | |-ImplicitCastExpr 0x275f0b0 <col:26> '__ostream_type &(*)
(__ostream_type &(*)(__ostream_type &))' <FunctionToPointerDecay>
     | | `-DeclRefExpr 0x275f088 <col:26> '__ostream_type &(__ostream_type
&(*)(__ostream_type &))' lvalue CXXMethod 0x26d0410 'operator<<'
'__ostream_type &(__ostream_type &(*)(__ostream_type &))'
     | |-CXXOperatorCallExpr 0x275e610 <col:3, col:24> 'basic_ostream<char,
struct std::char_traits<char> >':'class std::basic_ostream<char>' lvalue
     | | |-ImplicitCastExpr 0x275e5f8 <col:13> 'basic_ostream<char, struct
std::char_traits<char> > &(*)(basic_ostream<char, struct
std::char_traits<char> > &, const char *)' <FunctionToPointerDecay>
     | | | `-DeclRefExpr 0x275e578 <col:13> 'basic_ostream<char, struct
std::char_traits<char> > &(basic_ostream<char, struct std::char_traits<char>
> &, const char *)' lvalue Function 0x26d9830 'operator<<'
'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char,
struct std::char_traits<char> > &, const char *)'
     | | |-DeclRefExpr 0x275e1c8 <col:3, col:8> 'ostream':'class
std::basic_ostream<char>' lvalue Var 0x275c3a0 'cout' 'ostream':'class
std::basic_ostream<char>'
     | | `-CXXMemberCallExpr 0x275e258 <col:16, col:24> 'const char *'
     | |   `-MemberExpr 0x275e228 <col:16, col:18> '<bound member function
type>' .c_str 0x252c400
     | |     `-ImplicitCastExpr 0x275e280 <col:16> 'const class
std::basic_string<char>' lvalue <NoOp>
     | |       `-DeclRefExpr 0x275e200 <col:16> 'std::string':'class
std::basic_string<char>' lvalue Var 0x275cc30 's' 'std::string':'class
std::basic_string<char>'
     | `-ImplicitCastExpr 0x275f070 <col:29, col:34> 'basic_ostream<char,
struct std::char_traits<char> > &(*)(basic_ostream<char, struct
std::char_traits<char> > &)' <FunctionToPointerDecay>
     |   `-DeclRefExpr 0x275f030 <col:29, col:34> 'basic_ostream<char,
struct std::char_traits<char> > &(basic_ostream<char, struct
std::char_traits<char> > &)' lvalue Function 0x26d4480 'endl'
'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char,
struct std::char_traits<char> > &)' (FunctionTemplate 0x26b9180 'endl')
     `-ReturnStmt 0x275f130 <line:14:3, col:10>
       `-IntegerLiteral 0x275f110 <col:10> 'int' 0


It seems useful to writing a matcher.

I modified the removeCStrCalls tool with a debugging matcher, so that I can
start broad, and incrementally narrow what I want to match, while printing
out what I have actually matched:

 class Debugging : public ast_matchers::MatchFinder::MatchCallback {
 public:
   Debugging(tooling::Replacements *Replace)
     : Replace(Replace) {}

   virtual void run(const ast_matchers::MatchFinder::MatchResult &Result) {

     const Expr *Arg =
         Result.Nodes.getStmtAs<Expr>("match");
     std::string match = getText(*Result.SourceManager, *Arg);
     std::cout << "MATCH " << match << std::endl;

   }

 private:
   tooling::Replacements *Replace;
 };

Is this the right/sane/only possible approach to the act of writing a
matcher?

>From the -ast-dump, it looks like I need to first match a
CXXOperatorCallExpr, so I write this matcher:

  Finder.addMatcher(
      id("match", operatorCallExpr()),
      &Callback);

which gives me lots of output, mostly from the iostream header. I need to
get narrower.

In the -ast-dump output there is a CXXMemberCallExpr nested in the
CXXOperatorCallExpr. I make a logical jump at matching that:

  Finder.addMatcher(
      id("match", operatorCallExpr(memberCallExpr())),
      &Callback);

However, that produces no output, so my logical jump must be incorrect.

What is the correct logical jump here in order to narrow my match?

How can I determine what logical jumps I can make, by looking at the ast-
dump output?

How does one go about *actually* writing matcher code?

What is the understanding one must have, the logic one must have and the
information one must have to hand (is ast-dump output useful or not)?

To be clear (in case it is not already clear), I'm not looking for a
finished solution for the match/replacement I'm trying to make. I'm asking
what to spend time doing between opening my editor and running a tool that
does what I need. How do I figure out what code I need to write?

Note: I have read the docs. The section at

 http://clang.llvm.org/docs/LibASTMatchersTutorial.html#step-2-using-ast-matchers

shows some incremental steps, but does not explain how they are chosen.

How does one know to go from

 forStmt()

to

 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

for example? That is the kind of information I am looking for.

Thanks,

Steve.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140311/a5f9070d/attachment.html>


More information about the cfe-dev mailing list