[PATCH] D90023: [Syntax] Add iterators over children of syntax trees.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 26 07:54:22 PDT 2020


sammccall added inline comments.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:157-184
+  /// Iterator over children (common base for const/non-const).
+  /// Not invalidated by tree mutations (holds a stable node pointer).
+  template <typename DerivedT, typename NodeT>
+  class child_iterator_base
+      : public llvm::iterator_facade_base<DerivedT, std::forward_iterator_tag,
+                                          NodeT> {
+  protected:
----------------
eduucaldas wrote:
> I think we should only have an iterator through `Node`s, not a general one like this.
> 
> In general `Tree` has *heterogeneous* children:
> For instance, `1+2` yields the syntax tree:
> ```
> BinaryOperatorExpression
> |-  IntegerLiteralExpression
> |-  Leaf
> `- IntegerLiteralExpression
> ```
> 
> Very rarely will syntax trees have all their children of the same kind, so I don't think it makes sense to try an provide an iterator template. That makes sense for lists , because generally they *have* the same kind. But in that case we provide special iterators for lists only.
> 
This does only iterate over nodes, note this class is private.
This is a common pattern when implementing iterators: you need `iterator` and `const_iterator`, they have almost identical implementations but operate on different types (`const T` vs `T`). Thus a private template.

Sometimes it's used directly (`using iterator = iterator_base<Node>`) but that means the template needs to be visible and also leads to worse compiler diagnostics when the canonical type is printed. Inheritance lets us prevent direct use of the base class and give the iterator classes distinct names.

---

That said, in future, I think we should (later) have a facility to iterate over children of a certain type. This would explicitly be a filtering operation, ignoring nodes not of that type.
Providing these typed operations for concrete subclasses of List makes sense, but in clangd outside of specific refactorings we've found generic facilities to often be more useful. When we designed syntax trees, providing a simple and generic data structure that could be used directly was an explicit goal.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:202-219
+  /// child_iterator is not invalidated by mutations.
+  struct child_iterator : child_iterator_base<child_iterator, Node> {
+    using Base::Base;
+  };
+  struct const_child_iterator
+      : child_iterator_base<const_child_iterator, const Node> {
+    using Base::Base;
----------------
eduucaldas wrote:
> TL;DR:
> `child_iterator` -> `ChildIterator`
> `const_child_iterator` -> `ConstChildIterator`
> `children` -> `getChildren`
> 
> I see you followed the naming convention of the ClangAST, which makes loads of sense. 
> 
> In syntax trees we follow the naming conventions of the [conding standard](https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly) -- just because we want consistency and we had to pick a guideline to follow -- according to that, functions should be verb like and names should be camel case.
> 
> If like us you chose `const_child_iterator` and `children` kind of "just because", could you change it to follow `ConstChildIterator` and `getChildren` respectively. 
> In syntax trees we follow the naming conventions of the conding standard

I see that's been used in some of the newer classes, and that you changed this file to follow that style in 4c14ee61b73746b314d83e7c52e03d6527b78105. However it's not the original style we used here and elsewhere in libSyntax.
It's a bit frustrating to hear the argument about consistency, because we were quite careful and deliberate about this style, which was IMO fairly consistent prior to that change.

I'm willing to change these to match the local style in this file. However `*_iterator`, `*_begin/end/range()` is more common than FooIterator etc in LLVM overall (I think for consistency with `iterator` etc, which is endorsed by the style guide). So I don't think this change is particularly an improvement, even in terms of consistency.


================
Comment at: clang/lib/Tooling/Syntax/Tree.cpp:22-23
   if (auto *T = dyn_cast<syntax::Tree>(N)) {
-    for (const auto *C = T->getFirstChild(); C; C = C->getNextSibling())
-      traverse(C, Visit);
+    for (const syntax::Node &C : T->children())
+      traverse(&C, Visit);
   }
----------------
eduucaldas wrote:
> Hmm... 
> That looks funny, if the user uses a range-based for loop and forgets the `&`, then we would be copying the Node???
> 
> Also in many places we had to get the address to the node. That is not really consistent with how the syntax trees API's were designed, they generally take pointers instead of references. (that said we could obviously reconsider those design choices)
> That looks funny, if the user uses a range-based for loop and forgets the &, then we would be copying the Node???

It doesn't look funny to me - foreach usually iterates over values rather than pointers. This raises a good point - it looks like Node is copyable and shouldn't be. (The default copy operation violates the tree invariants, e.g. the copied node will not be a child of its parent). I'll send a patch...

(That said this is always the case with copyable objects in range-based for loops, and indeed using references - that doesn't usually mean we avoid using them)

> Also in many places we had to get the address to the node. That is not really consistent with how the syntax trees API's were designed, they generally take pointers instead of references

I'm not convinced that taking the address is itself a burden, any more than using `->` instead of `.` is a burden.
Sometimes the use of pointer vs reference intends to convey something about address stability, but here this goes with the type as all nodes are allocated on the arena. (IMO this is a part of the AST design that works well).
The other thing it conveys is nullability, and this seems valuable enough that IMO APIs that take *non-nullable* nodes by pointer should be reconsidered.

For the iterators specifically, the main alternative here I guess is having Tree::children() act as a container of *pointers to* children, instead of the children themselves. This *is* highly unconventional and surprising. Consider the ubiquitous `for (const auto&N : T->children())`... now the author/reader expects N to be a const reference to a child, instead it's a const reference to a **non-const** pointer to a child....


================
Comment at: clang/unittests/Tooling/Syntax/TreeTest.cpp:129
 
+TEST_F(TreeTest, Iterators) {
+  buildTree("", allTestClangConfigs().front());
----------------
eduucaldas wrote:
> Suggestion:
> I would split this into `Iterators` and `ConstIterators`.
Hmm, I'm not a big fan of duplicating all the setup bits. And where would we test that iterators can be converted to an equivalent const_iterator? (This doubles as our implicit test of the multi-pass guarantee, too...)


================
Comment at: clang/unittests/Tooling/Syntax/TreeTest.cpp:159-169
+  for (unsigned I = 0; I < 3; ++I) {
+    EXPECT_EQ(It, CIt);
+    EXPECT_TRUE(It);
+    EXPECT_TRUE(CIt);
+    EXPECT_EQ(It.asPointer(), Children[I]);
+    EXPECT_EQ(CIt.asPointer(), Children[I]);
+    EXPECT_EQ(&*It, Children[I]);
----------------
eduucaldas wrote:
> Is there a reason why you didn't use a range-based for loop over Children?
I want to be really explicit about the expectation that this yields three elements (as opposed to say zero).

The ElementsAre tests above should disallow that going wrong, but calling the methods once directly with the dumbest possible code seems worthwhile to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90023/new/

https://reviews.llvm.org/D90023



More information about the cfe-commits mailing list