[PATCH] D116518: [ast-matchers] Add hasSubstatementSequence matcher

Yitzhak Mandelbaum via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 12 10:20:53 PST 2022


ymandel added a comment.

Thanks for looping me in. I'll try to take a detailed look later today. In the meantime, I'll note that we have something similar internally which I never got around to upstreaming. However, we chose to support arbitrarily many matchers, with this interface:

  const clang::ast_matchers::internal::VariadicFunction<
      clang::ast_matchers::internal::Matcher<clang::CompoundStmt>,
      internal::SequenceMatcher<clang::Stmt>,
      internal::hasSubstatementSequenceFunc>
      hasSubstatementSequence = {};

The `SequenceMatcher` API is:

  // The following definitions all support the `hasSubstatementSequence`
  // matcher. This matcher supports describing the series of statements in a
  // compound statement, in a style inspired by regular expressions. Unlike
  // regular expressions, however, these operators are deterministic. Choices are
  // tried in order. For optional-style operators (`maybeOne`, `zeroOrMore` and
  // `oneOrMore`) the positive choice is considered first.
  template <typename T>
  internal::SequenceMatcher<T> exactlyOne(
      clang::ast_matchers::internal::Matcher<T> Matcher) {
    return {std::move(Matcher)};
  }
  
  template <typename T>
  internal::SequenceMatcher<T> maybeOne(
      clang::ast_matchers::internal::Matcher<T> Matcher) {
    return {internal::SequenceElementKind::ZeroOrOne, std::move(Matcher)};
  }
  
  template <typename T>
  internal::SequenceMatcher<T> zeroOrMore(
      clang::ast_matchers::internal::Matcher<T> Matcher) {
    return {internal::SequenceElementKind::ZeroOrMore, std::move(Matcher)};
  }
  
  template <typename T>
  internal::SequenceMatcher<T> oneOrMore(
      clang::ast_matchers::internal::Matcher<T> Matcher) {
    return {internal::SequenceElementKind::OneOrMore, std::move(Matcher)};
  }
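To make the deterministic semantics concrete, here is a minimal standalone sketch. It is not the internal implementation: `Stmt`, `Pred`, `Element`, `isStmt` and `matchesSequence` are hypothetical stand-ins, with strings playing the role of statements and plain predicates playing the role of matchers. It also assumes the pattern must account for every statement in the sequence, which the description above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a "statement" is a string, a "matcher" a predicate.
using Stmt = std::string;
using Pred = std::function<bool(const Stmt &)>;

enum class Kind { ExactlyOne, ZeroOrOne, ZeroOrMore, OneOrMore };

struct Element {
  Kind K;
  Pred Matches;
};

// Convenience predicate: matches a statement equal to Text.
Pred isStmt(std::string Text) {
  return [Text](const Stmt &S) { return S == Text; };
}

// Deterministic matching: each element takes the positive choice first,
// consuming as many statements as its kind allows, and never gives any back.
bool matchesSequence(const std::vector<Element> &Pattern,
                     const std::vector<Stmt> &Stmts) {
  std::size_t Pos = 0;
  for (const Element &E : Pattern) {
    assert(E.Matches && "every element needs a predicate");
    const bool Repeats = E.K == Kind::ZeroOrMore || E.K == Kind::OneOrMore;
    const bool NeedsOne = E.K == Kind::ExactlyOne || E.K == Kind::OneOrMore;
    std::size_t Count = 0;
    while (Pos < Stmts.size() && (Repeats || Count == 0) &&
           E.Matches(Stmts[Pos])) {
      ++Pos;
      ++Count;
    }
    if (NeedsOne && Count == 0)
      return false;
  }
  // Assumption: the pattern must cover the whole sequence.
  return Pos == Stmts.size();
}
```

Under this sketch, the pattern "zeroOrMore(a), exactlyOne(a)" never matches: the first element consumes every `a` and leaves nothing for the second. That is the price of determinism.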

I also implemented a non-deterministic version using backtracking, but its potential cost scared me off (only potential, because we could use memoization, as in packrat parsing, to avoid the exponential blowup).
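For illustration only, a backtracking matcher with packrat-style memoization might look like the sketch below. It is not the version I wrote: all names (`Stmt`, `Pred`, `Element`, `isStmt`, `matchFrom`, `matchesWithBacktracking`) are hypothetical, strings stand in for statements, and the pattern is assumed to cover the whole sequence. Each (pattern index, statement index) pair is computed at most once, so the work is bounded by the product of the two lengths.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a "statement" is a string, a "matcher" a predicate.
using Stmt = std::string;
using Pred = std::function<bool(const Stmt &)>;

enum class Kind { ExactlyOne, ZeroOrOne, ZeroOrMore, OneOrMore };

struct Element {
  Kind K;
  Pred Matches;
};

Pred isStmt(std::string Text) {
  return [Text](const Stmt &S) { return S == Text; };
}

// Memo[I][Pos] is -1 (unknown), 0 (no match) or 1 (match) for the question
// "can Pattern[I..] match Stmts[Pos..] exactly?".
using MemoTable = std::vector<std::vector<int>>;

bool matchFrom(const std::vector<Element> &Pattern,
               const std::vector<Stmt> &Stmts, std::size_t I, std::size_t Pos,
               MemoTable &Memo) {
  if (Memo[I][Pos] != -1)
    return Memo[I][Pos] == 1;

  bool Result = false;
  if (I == Pattern.size()) {
    Result = Pos == Stmts.size();
  } else {
    const Element &E = Pattern[I];
    const bool Here = Pos < Stmts.size() && E.Matches(Stmts[Pos]);
    // Try the positive choice first, then fall back to the alternative.
    switch (E.K) {
    case Kind::ExactlyOne:
      Result = Here && matchFrom(Pattern, Stmts, I + 1, Pos + 1, Memo);
      break;
    case Kind::ZeroOrOne:
      Result = (Here && matchFrom(Pattern, Stmts, I + 1, Pos + 1, Memo)) ||
               matchFrom(Pattern, Stmts, I + 1, Pos, Memo);
      break;
    case Kind::ZeroOrMore:
      Result = (Here && matchFrom(Pattern, Stmts, I, Pos + 1, Memo)) ||
               matchFrom(Pattern, Stmts, I + 1, Pos, Memo);
      break;
    case Kind::OneOrMore:
      Result = Here && (matchFrom(Pattern, Stmts, I, Pos + 1, Memo) ||
                        matchFrom(Pattern, Stmts, I + 1, Pos + 1, Memo));
      break;
    }
  }
  Memo[I][Pos] = Result ? 1 : 0;
  return Result;
}

bool matchesWithBacktracking(const std::vector<Element> &Pattern,
                             const std::vector<Stmt> &Stmts) {
  MemoTable Memo(Pattern.size() + 1, std::vector<int>(Stmts.size() + 1, -1));
  return matchFrom(Pattern, Stmts, 0, 0, Memo);
}
```

With backtracking, a pattern such as "zeroOrMore(a), exactlyOne(a)" can match two `a` statements, because the first element is allowed to give one back.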

That said, my experience indicates that once you're thinking in terms of sequences, you're probably going to find that you want to match over the CFG, rather than the AST directly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116518/new/

https://reviews.llvm.org/D116518


