[clang] 2b49484 - Add a clang-transformer tutorial

Yitzhak Mandelbaum via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 17 05:41:23 PST 2021


Author: Yitzhak Mandelbaum
Date: 2021-11-17T13:40:46Z
New Revision: 2b4948448f03104b4b957860dd8c019d0b9df2f0

URL: https://github.com/llvm/llvm-project/commit/2b4948448f03104b4b957860dd8c019d0b9df2f0
DIFF: https://github.com/llvm/llvm-project/commit/2b4948448f03104b4b957860dd8c019d0b9df2f0.diff

LOG: Add a clang-transformer tutorial

Differential Revision: https://reviews.llvm.org/D114011

Added: 
    clang/docs/ClangTransformerTutorial.rst

Modified: 
    clang/docs/index.rst

Removed: 
    


################################################################################
diff --git a/clang/docs/ClangTransformerTutorial.rst b/clang/docs/ClangTransformerTutorial.rst
new file mode 100644
index 000000000000..33931ad201a5
--- /dev/null
+++ b/clang/docs/ClangTransformerTutorial.rst
@@ -0,0 +1,400 @@
+==========================
+Clang Transformer Tutorial
+==========================
+
+A tutorial on how to write a source-to-source translation tool using Clang Transformer.
+
+.. contents::
+   :local:
+
+What is Clang Transformer?
+--------------------------
+
+Clang Transformer is a framework for writing C++ diagnostics and program
+transformations. It is built on the clang toolchain and the LibTooling library,
+but aims to hide much of the complexity of clang's native, low-level libraries.
+
+The core abstraction of Transformer is the *rewrite rule*, which specifies how
+to change a given program pattern into a new form. Here are some examples of
+tasks you can achieve with Transformer:
+
+*   warn against using the name ``MkX`` for a declared function,
+*   change ``MkX`` to ``MakeX``, where ``MkX`` is the name of a declared function,
+*   change ``s.size()`` to ``Size(s)``, where ``s`` is a ``string``,
+*   collapse ``e.child().m()`` to ``e.m()``, for any expression ``e`` and method named
+    ``m``.
+
+All of the examples have a common form: they identify a pattern that is the
+target of the transformation, they specify an *edit* to the code identified by
+the pattern, and their pattern and edit refer to common variables, like ``s``,
+``e``, and ``m``, that range over code fragments. Our second and third examples also
+specify constraints on the pattern that aren't apparent from the syntax alone,
+like "``s`` is a ``string``." Even the first example ("warn ...") shares this form,
+even though it doesn't change any of the code -- its "edit" is simply a no-op.
+
+Transformer helps users succinctly specify rules of this sort and easily execute
+them locally over a collection of files, apply them to selected portions of
+a codebase, or even bundle them as a clang-tidy check for ongoing application.
+
+Who is Clang Transformer for?
+-----------------------------
+
+Clang Transformer is for developers who want to write clang-tidy checks or write
+tools to modify a large number of C++ files in (roughly) the same way. What
+qualifies as "large" really depends on the nature of the change and your
+patience for repetitive editing. In our experience, automated solutions become
+worthwhile somewhere between 100 and 500 files.
+
+Getting Started
+---------------
+
+Patterns in Transformer are expressed with :doc:`clang's AST matchers <LibASTMatchers>`. 
+Matchers are a language of combinators for describing portions of a clang
+Abstract Syntax Tree (AST). Since clang's AST includes complete type information
+(within the limits of a single `Translation Unit (TU)`_),
+these patterns can even encode rich constraints on the type properties of AST
+nodes.
+
+.. _`Translation Unit (TU)`: https://en.wikipedia.org/wiki/Translation_unit_\(programming\)
+
+We assume a familiarity with the clang AST and the corresponding AST matchers
+for the purpose of this tutorial. Users who are unfamiliar with either are
+encouraged to start with the recommended references in `Related Reading`_.
+
+Example: style-checking names
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Assume you have a style-guide rule which forbids functions from being named
+"MkX" and you want to write a check that catches any violations of this rule. We
+can express this as a Transformer rewrite rule:
+
+.. code-block:: c++
+		
+   makeRule(functionDecl(hasName("MkX")).bind("fun"),
+            noopEdit(node("fun")),
+            cat("The name ``MkX`` is not allowed for functions; please rename"));
+
+``makeRule`` is our go-to function for generating rewrite rules. It takes three
+arguments: the pattern, the edit, and (optionally) an explanatory note. In our
+example, the pattern (``functionDecl(...)``) identifies the declaration of the
+function ``MkX``. Since we're just diagnosing the problem, but not suggesting a
+fix, our edit is a no-op. But, it contains an *anchor* for the diagnostic
+message: ``node("fun")`` says to associate the message with the source range of
+the AST node bound to "fun"; in this case, the ill-named function declaration.
+Finally, we use ``cat`` to build a message that explains the change. Regarding the
+name ``cat`` -- we'll discuss it in more detail below, but suffice it to say that
+it can also take multiple arguments and concatenate their results.
+
+Note that the result of ``makeRule`` is a value of type
+``clang::transformer::RewriteRule``, but most users don't need to care about the
+details of this type.
+
+Example: renaming a function
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Now, let's extend this example to a *transformation*; specifically, the second
+example above:
+
+.. code-block:: c++
+		
+   makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
+	    changeTo(cat("MakeX")),
+	    cat("MkX has been renamed MakeX"));
+
+In this example, the pattern (``declRefExpr(...)``) identifies any *reference* to
+the function ``MkX``, rather than the declaration itself, as in our previous
+example. Our edit (``changeTo(...)``) says to *change* the code matched by the
+pattern *to* the text "MakeX". Finally, we use ``cat`` again to build a message
+that explains the change.
+
+Here are some example changes that this rule would make:
+
++--------------------------+----------------------------+
+| Original                 | Result                     |
++==========================+============================+
+| ``X x = MkX(3);``        | ``X x = MakeX(3);``        |
++--------------------------+----------------------------+
+| ``CallFactory(MkX, 3);`` | ``CallFactory(MakeX, 3);`` |
++--------------------------+----------------------------+
+| ``auto f = MkX;``        | ``auto f = MakeX;``        |
++--------------------------+----------------------------+
+
+Example: method to function
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Next, let's write a rule to replace a method call with a (free) function call,
+applied to the original method call's target object. Specifically, "change
+``s.size()`` to ``Size(s)``, where ``s`` is a ``string``." We start with a simpler
+change that ignores the type of ``s``. That is, it will modify *any* method call
+where the method is named "size":
+
+.. code-block:: c++
+		
+   llvm::StringRef s = "str";
+   makeRule(
+     cxxMemberCallExpr(
+       on(expr().bind(s)),
+       callee(cxxMethodDecl(hasName("size")))),
+     changeTo(cat("Size(", node(s), ")")),
+     cat("Method ``size`` is deprecated in favor of free function ``Size``"));
+
+We express the pattern with the given AST matcher, which binds the method call's
+target to ``s`` [#f1]_. For the edit, we again use ``changeTo``, but this
+time we construct the term from multiple parts, which we compose with ``cat``. The
+second part of our term is ``node(s)``, which selects the source code
+corresponding to the AST node ``s`` that was bound when a match was found in the
+AST for our rule's pattern. ``node(s)`` constructs a ``RangeSelector``, which, when
+used in ``cat``, indicates that the selected source should be inserted in the
+output at that point.
+
+Now, we probably don't want to rewrite *all* invocations of "size" methods, just
+those on ``std::string``\ s. We can achieve this change simply by refining our
+matcher. The rest of the rule remains unchanged:
+
+.. code-block:: c++
+		
+   llvm::StringRef s = "str";
+   makeRule(
+     cxxMemberCallExpr(
+       on(expr(hasType(namedDecl(hasName("std::string"))))
+	 .bind(s)),
+       callee(cxxMethodDecl(hasName("size")))),
+     changeTo(cat("Size(", node(s), ")")),
+     cat("Method ``size`` is deprecated in favor of free function ``Size``"));
+
+Example: rewriting method calls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this example, we delete an "intermediary" method call in a string of
+invocations. This scenario can arise, for example, if you want to collapse a
+substructure into its parent.
+
+.. code-block:: c++
+		
+   llvm::StringRef e = "expr", m = "member";
+   auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
+				       callee(cxxMethodDecl(hasName("child"))));
+   makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
+            changeTo(cat(node(e), ".", member(m), "()")),
+            cat("``child`` accessor is being removed; call ",
+                member(m), " directly on parent"));
+
+This rule isn't quite what we want: it will rewrite ``my_object.child().foo()`` to
+``my_object.foo()``, but it will also rewrite ``my_ptr->child().foo()`` to
+``my_ptr.foo()``, which is not what we intend. We could fix this by restricting
+the pattern with ``not(isArrow())`` in the definition of ``child_call``. Yet, we
+*want* to rewrite calls through pointers.
+
+To capture this idiom, we provide the ``access`` combinator to intelligently
+construct a field/method access. In our example, the member access is expressed
+as:
+
+.. code-block:: c++
+		
+   access(e, cat(member(m)))
+
+The first argument specifies the object being accessed and the second, a
+description of the field/method name. In this case, we specify that the method
+name should be copied from the source -- specifically, the source range of ``m``'s
+member. To construct the method call, we would use this expression in ``cat``:
+
+.. code-block:: c++
+		
+   cat(access(e, cat(member(m))), "()")
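+
+Putting the pieces together, the complete rule might look like the following
+sketch. It assumes the definitions of ``e``, ``m`` and ``child_call`` from the
+first version of this example and changes only the stencil passed to
+``changeTo``:
+
+.. code-block:: c++
+
+   // Sketch: same pattern as before. `access` emits `->` when the expression
+   // bound to `e` is a pointer and `.` otherwise.
+   makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
+            changeTo(cat(access(e, cat(member(m))), "()")),
+            cat("``child`` accessor is being removed; call ",
+                member(m), " directly on parent"));
+
+With this version, ``my_ptr->child().foo()`` should become ``my_ptr->foo()``,
+while ``my_object.child().foo()`` still becomes ``my_object.foo()``.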
+
+Reference: ranges, stencils, edits, rules
+-----------------------------------------
+
+The above examples demonstrate just the basics of rewrite rules. Every element
+we touched on has more available constructors: range selectors, stencils, edits
+and rules. In this section, we'll briefly review each in turn, with references
+to the source headers for up-to-date information. First, though, we clarify what
+rewrite rules are actually rewriting.
+
+Rewriting ASTs to... Text?
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The astute reader may have noticed that we've been somewhat vague in our
+explanation of what the rewrite rules are actually rewriting. We've referred to
+"code", but code can be represented both as raw source text and as an abstract
+syntax tree. So, which one is it?
+
+Ideally, we'd be rewriting the input AST to a new AST, but clang's AST is not
+terribly amenable to this kind of transformation. So, we compromise: we express
+our patterns and the names that they bind in terms of the AST, but our changes
+in terms of source code text. We've designed Transformer's language to bridge
+the gap between the two representations, in an attempt to minimize the user's
+need to reason about source code locations and other, low-level syntactic
+details.
+
+Range Selectors
+^^^^^^^^^^^^^^^
+
+Transformer provides a small API for describing source ranges: the
+``RangeSelector`` combinators. These ranges are most commonly used to specify the
+source code affected by an edit and to extract source code in constructing new
+text.
+
+Roughly, there are two kinds of range combinators: ones that select a source
+range based on the AST, and others that combine existing ranges into new ranges.
+For example, ``node`` selects the range of source spanned by a particular AST
+node, as we've seen, while ``after`` selects the (empty) range located immediately
+after its argument range. So, ``after(node("id"))`` is the empty range immediately
+following the AST node bound to ``id``.
+
+For the full collection of ``RangeSelector``\ s, see the header,
+`clang/Tooling/Transformer/RangeSelector.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RangeSelector.h>`_
+
+Stencils
+^^^^^^^^
+
+Transformer offers a large and growing collection of combinators for
+constructing output. Above, we demonstrated ``cat``, the core function for
+constructing stencils. It takes a series of arguments, of three possible kinds:
+
+#.  Raw text, to be copied directly to the output.
+#.  Selector: specified with a ``RangeSelector``, indicates a range of source text
+    to copy to the output.
+#.  Builder: an operation that constructs a code snippet from its arguments. For
+    example, the ``access`` function we saw above.
+
+Data of these different types are all represented (generically) by a ``Stencil``.
+``cat`` takes text and ``RangeSelector``\ s directly as arguments, rather than
+requiring that they be constructed with a builder; other builders are
+constructed explicitly.
+
+In general, ``Stencil``\ s produce text from a match result. So, they are not
+limited to generating source code, but can also be used to generate diagnostic
+messages that reference (named) elements of the matched code, like we saw in the
+example of rewriting method calls.
+
+Further details of the ``Stencil`` type are documented in the header file
+`clang/Tooling/Transformer/Stencil.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/Stencil.h>`_.
+
+Edits
+^^^^^
+
+Transformer supports additional forms of edits. First, in a ``changeTo``, we can
+specify the particular portion of code to be replaced, using the same
+``RangeSelector`` we saw earlier. For example, we could change the function name
+in a function declaration with:
+
+.. code-block:: c++
+		
+   makeRule(functionDecl(hasName("bad")).bind(f),
+	    changeTo(name(f), cat("good")),
+	    cat("bad is now good"));
+
+We also provide simpler editing primitives for insertion and deletion:
+``insertBefore``, ``insertAfter`` and ``remove``. These can all be found in the header
+file
+`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.
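+
+As a rough sketch of these primitives (the function name ``Legacy`` and the id
+``"call"`` are hypothetical, chosen only for illustration), an insertion and a
+deletion might be written as:
+
+.. code-block:: c++
+
+   // Insert a marker comment before each call to the hypothetical `Legacy`.
+   makeRule(callExpr(callee(functionDecl(hasName("Legacy")))).bind("call"),
+            insertBefore(node("call"), cat("/* deprecated */ ")),
+            cat("calls to Legacy are deprecated"));
+
+   // Delete calls to `Legacy` that stand alone as statements in a block.
+   // `statement` selects the node together with its trailing semicolon.
+   makeRule(callExpr(callee(functionDecl(hasName("Legacy"))),
+                     hasParent(compoundStmt()))
+                .bind("call"),
+            remove(statement("call")),
+            cat("calls to Legacy are removed"));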
+
+We are not limited to one edit per match found. Some situations require making
+multiple edits for each match. For example, suppose we wanted to swap two
+arguments of a function call.
+
+For this, we provide an overload of ``makeRule`` that takes a list of edits,
+rather than just a single one. Our example might look like:
+
+.. code-block:: c++
+		
+   makeRule(callExpr(...),
+	   {changeTo(node(arg0), cat(node(arg2))),
+	    changeTo(node(arg2), cat(node(arg0)))},
+	   cat("swap the first and third arguments of the call"));
+
+``EditGenerator``\ s (Advanced)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The particular edits we've seen so far are all instances of the ``ASTEdit`` class,
+or a list of such. But, not all edits can be expressed as ``ASTEdit``\ s. So, we
+also support a very general signature for edit generators:
+
+.. code-block:: c++
+		
+   using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;
+
+That is, an ``EditGenerator`` is a function that maps a ``MatchResult`` to a set
+of edits, or fails. This signature supports a very general form of computation
+over match results. Transformer provides a number of functions for working with
+``EditGenerator``\ s, most notably
+`flatten <https://github.com/llvm/llvm-project/blob/1fabe6e51917bcd7a1242294069c682fe6dffa45/clang/include/clang/Tooling/Transformer/RewriteRule.h#L165-L167>`_,
+which combines a list of ``EditGenerator``\ s into one, like list flattening.
+For the full list, see the header file
+`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.
+
+Rules
+^^^^^
+
+We can also compose multiple *rules*, rather than just edits within a rule,
+using ``applyFirst``: it composes a list of rules as an ordered choice, where
+Transformer applies the first rule whose pattern matches, ignoring others in the
+list that follow. If the matchers are independent then order doesn't matter. In
+that case, ``applyFirst`` is simply joining the set of rules into one.
+
+The benefit of ``applyFirst`` is that, for some problems, it allows the user to
+more concisely formulate later rules in the list, since their patterns need not
+explicitly exclude the earlier patterns of the list. For example, consider a set
+of rules that rewrite compound statements, where one rule handles the case of an
+empty compound statement and the other handles non-empty compound statements.
+With ``applyFirst``, these rules can be expressed compactly as:
+
+.. code-block:: c++
+		
+   applyFirst({
+     makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
+     makeRule(compoundStmt().bind("non-empty"),...)
+   })
+
+The second rule does not need to explicitly specify that the compound statement
+is non-empty -- it follows from the rule's position in ``applyFirst``. For more
+complicated examples, this can lead to substantially more readable code.
+
+Sometimes, a modification to the code might require the inclusion of a
+particular header file. To this end, users can modify rules to specify include
+directives with ``addInclude``.
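+
+A minimal sketch, assuming the ``addInclude`` overload in ``RewriteRule.h`` that
+modifies an existing rule in place (check the header for the exact signature in
+your LLVM version) and a hypothetical header path ``path/to/size.h``:
+
+.. code-block:: c++
+
+   // Sketch: build the method-to-function rule from earlier, then request
+   // that the header declaring `Size` be included in each changed file.
+   RewriteRule Rule = makeRule(
+       cxxMemberCallExpr(on(expr().bind("str")),
+                         callee(cxxMethodDecl(hasName("size")))),
+       changeTo(cat("Size(", node("str"), ")")),
+       cat("Method size is deprecated in favor of free function Size"));
+   addInclude(Rule, "path/to/size.h");  // hypothetical header path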
+
+For additional documentation on these functions, see the header file
+`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.
+
+Using a RewriteRule as a clang-tidy check
+-----------------------------------------
+
+Transformer supports executing a rewrite rule as a
+`clang-tidy <https://clang.llvm.org/extra/clang-tidy/>`_ check, with the class
+``clang::tidy::utils::TransformerClangTidyCheck``. It is designed to require
+minimal code in the definition. For example, given a rule
+``MyCheckAsRewriteRule``, one can define a tidy check as follows:
+
+.. code-block:: c++
+
+   class MyCheck : public TransformerClangTidyCheck {
+    public:
+     MyCheck(StringRef Name, ClangTidyContext *Context)
+	 : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
+   };
+
+``TransformerClangTidyCheck`` implements the virtual ``registerMatchers`` and
+``check`` methods based on your rule specification, so you don't need to implement
+them yourself. If the rule needs to be configured based on the language options
+and/or the clang-tidy configuration, it can be expressed as a function taking
+these as parameters and (optionally) returning a ``RewriteRule``. This would be
+useful, for example, for our method-renaming rule, which is parameterized by the
+original name and the target. For details, see
+`clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h>`_
+
+Related Reading
+---------------
+
+A good place to start understanding the clang AST and its matchers is with the
+introductions on clang's site:
+
+*   :doc:`Introduction to the Clang AST <IntroductionToTheClangAST>`
+*   :doc:`Matching the Clang AST <LibASTMatchers>`
+*   `AST Matcher Reference <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_
+
+.. rubric:: Footnotes
+
+.. [#f1] Technically, it binds it to the string "str", to which our
+    variable ``s`` is bound. But, the choice of that id string is
+    irrelevant, so we elide the difference.

diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index bf598b1eda03..b7caa6589c3f 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -64,6 +64,7 @@ Using Clang as a Library
    RAVFrontendAction
    LibASTMatchersTutorial
    LibASTMatchers
+   ClangTransformerTutorial
    LibASTImporter
    HowToSetupToolingForLLVM
    JSONCompilationDatabase


        

