[all-commits] [llvm/llvm-project] 53e92e: Reland: [clang][test] add testing for the AST matc...

Fri Nov 15 01:51:37 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 53e92e48d0c03a2475e8517dd4c28968d84fc217
      https://github.com/llvm/llvm-project/commit/53e92e48d0c03a2475e8517dd4c28968d84fc217
  Author: Julian Schmidt <git.julian.schmidt at gmail.com>
  Date:   2024-11-15 (Fri, 15 Nov 2024)

  Changed paths:
    M clang/docs/LibASTMatchersReference.html
    M clang/docs/ReleaseNotes.rst
    M clang/docs/doxygen.cfg.in
    M clang/docs/tools/dump_ast_matchers.py
    M clang/include/clang/ASTMatchers/ASTMatchers.h
    M clang/unittests/ASTMatchers/ASTMatchersTest.h
    M clang/unittests/ASTMatchers/CMakeLists.txt
    A clang/utils/generate_ast_matcher_doc_tests.py

  Log Message:
  -----------
  Reland: [clang][test] add testing for the AST matcher reference (#112168)

## Problem Statement
Previously, the examples in the AST matcher reference, which gets
generated by the Doxygen comments in `ASTMatchers.h`, were untested and
best effort.
Some of the matchers had no or wrong examples of how to use the matcher.

## Solution
This patch introduces a simple DSL around Doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The way that the documentation is tested, is by using Doxygen's alias
feature to declare custom aliases. These aliases forward to
`<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple
words). Using the Doxygen aliases is the obvious choice, because there
are (now) four consumers:
 - people reading the header/using signature help
 - the Doxygen generated documentation
 - the generated HTML AST matcher reference
 - (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a Doxygen comment)
and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

```text
Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6
```

The tests are generated during building, and the script will only print
something if it found an issue with the specified tests (e.g., missing
tests).

## Description

DSL for generating the tests from documentation.

TLDR:
```
  \header{a.h}
  \endheader     <- zero or more header

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a  = 42} <- only applies to the previous matcher (not to the
                         previous case)
```

The above block can be repeated inside a Doxygen command for multiple
code examples for a single matcher.
The test generation script will only look for these annotations and
ignore anything else like `\c` or the sentences where these annotations
are embedded into: `The matcher \matcher{expr()} matches the number
\match{42}.`.

### Language Grammar
  [] denotes an optional, and <> denotes user-input

```
  compile_args j:= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count || sub
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
```

### Language Standard Versions

The 'std' tag and '\compile_args' support specifying a specific language
version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the
matcher. Predicates for the 'std' compiler flag are used with
disjunction between languages (e.g. 'c || c++') and conjunction for all
predicates specific to each language (e.g. 'c++11-or-later &&
c++23-or-earlier').

Examples:
 - `c`                                    all available versions of C
 - `c++11`                                only C++11
 - `c++11-or-later`                       C++11 or later
 - `c++11-or-earlier`                     C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and
                                          23 (inclusive)
 - `c++11-23,c`                             same as above

### Tags

#### `type`:
**Match types** are used to select where the string that is used to
check if a node matches comes from.
Available: `code`, `name`, `typestr`, `typeofstr`. The default is
`code`.

- `code`: Forwards to `tooling::fixit::getText(...)` and should be the
preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of
`getNameAsString`. Useful when the matched AST node is not easy to spell
out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type
derived from `Type` (otherwise, if it is derived from `Decl`, recurses
with `Node->getTypeForDecl()`)

**Matcher types** are used to mark matchers as sub-matcher with 'sub' or
as deactivated using 'none'. Testing sub-matcher is not implemented.

#### `count`:
Specifying a 'count=n' on a match will result in a test that requires
that the specified match will be matched n times. Default is 1.

#### `std`:
A match allows specifying if it matches only in specific language
versions. This may be needed when the AST differs between language
versions.

#### `sub`:
The `sub` tag on a `\match` will indicate that the match is for a node
of a bound sub-matcher.
E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that
binds to `inner`, which is the value for the `sub` tag of the expected
match for the sub-matcher `\match{sub=inner$...}`. Currently,
sub-matchers are not tested in any way.

### What if ...?

#### ... I want to add a matcher?

Add a Doxygen comment to the matcher with a code example, corresponding
matchers and matches, that shows what the matcher is supposed to do.
Specify the compile arguments/supported languages if required, and run
`ninja check-clang-unit` to test the documentation.

#### ... the example I wrote is wrong?

The test-failure output of the generated test file will provide
information about
 - where the generated test file is located
 - which line in `ASTMatcher.h` the example is from
 - which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the
different `type`s
- the language version and if the test ran with a windows `-target` flag
(also in failure summary)

#### ... I don't adhere to the required order of the syntax?

The script will diagnose any found issues, such as `matcher is missing
an example` with a `file:line:` prefix,
which should provide enough information about the issue.

#### ... the script diagnoses a false-positive issue with a Doxygen
comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code
and documented it with Doxygen, then the script will consider that as a
matcher documentation. As a result, the script will print that it
detected a mismatch between the actual and the expected number of
failures. If the diagnostic truly is a false-positive, change the
`expected_failure_statistics` at the top of the
`generate_ast_matcher_doc_tests.py` file.

Fixes #57607
Fixes #63748

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications