[PATCH] D90161: [SyntaxTree] Provide iterators for Lists

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 26 10:02:22 PDT 2020


sammccall added a comment.

This seems like more implementation and API complexity than is justified by the problems it's solving.
I do think it's possible to drastically simplify it (I've left some suggestions here) but if not, I think we should move it to its own header.



================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:231
+  /// `ElementAndDelimiter` acts as one.
+  template <typename ElementType> class ElementAndDelimiterIterator {
+  public:
----------------
sammccall wrote:
> this name is bulky and hard to read, leads to more unwieldy names later (e.g. `getElementAndDelimiterBeforeBegin` which despite its length is unclear that it's even an iterator).
> 
> Would there be actual confusion in calling this `ElementIterator`, and providing the associated delimiters in addition to the elements?
overall, this design seems to be focused on the size being a single pointer, at the cost of complexity of most of the operations.

Why is this important to optimize for, compared to doing something like:
```
private:
  Node *Elt, *Delim, *Next;
  void advance() {
    // read one element
    // set Elt and/or Delim
    // always advances Next by at least one
  }
public:
  Iterator(Node *Start) : Elt(nullptr), Delim(nullptr), Next(Start) {}
  operator++() { advance(); }
  Leaf *getDelimiter() { return Delim; }
  Node *getElement() { return Elt; }
```

(Performance isn't the biggest concern here, but I'd usually expect two extra pointers on the stack to be preferable to all the branches in the current impl)


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:231
+  /// `ElementAndDelimiter` acts as one.
+  template <typename ElementType> class ElementAndDelimiterIterator {
+  public:
----------------
sammccall wrote:
> sammccall wrote:
> > this name is bulky and hard to read, leads to more unwieldy names later (e.g. `getElementAndDelimiterBeforeBegin` which despite its length is unclear that it's even an iterator).
> > 
> > Would there be actual confusion in calling this `ElementIterator`, and providing the associated delimiters in addition to the elements?
> overall, this design seems to be focused on the size being a single pointer, at the cost of complexity of most of the operations.
> 
> Why is this important to optimize for, compared to doing something like:
> ```
> private:
>   Node *Elt, *Delim, *Next;
>   void advance() {
>     // read one element
>     // set Elt and/or Delim
>     // always advances Next by at least one
>   }
> public:
>   Iterator(Node *Start) : Elt(nullptr), Delim(nullptr), Next(Start) {}
>   operator++() { advance(); }
>   Leaf *getDelimiter() { return Delim; }
>   Node *getElement() { return Elt; }
> ```
> 
> (Performance isn't the biggest concern here, but I'd usually expect two extra pointers on the stack to be preferable to all the branches in the current impl)
how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:235
+      switch (getKind()) {
+      case IteratorKind::BeforeBegin: {
+        if (auto *Begin = getParent()->getFirstChild())
----------------
Are the before-begin iterators needed immediately?

My understanding is that before-begin iterators are used with nodes due to the single-linked-list implementation, but that's going away. Can we avoid adding more of these by sequencing those changes differently?


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:242
+      }
+      case IteratorKind::NotSentinel: {
+        auto *DelimiterOrElement = getDelimiterOrElement();
----------------
NotSentinel -> Element?

Logically, the iterator is pointing at an element in the list. (If the element happens to be missing, fine...)


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:272
+
+    // An `operator*` would return an `ElementAndDelimiter`, but it would return
+    // it as a value instead of the expected reference, since this
----------------
If we're not willing for this to be an InputIterator (I'm still not clear *why* not) or a stashing iterator (ditto), then maybe we might as well define `operator*` to refer to `Element` so people only have to use explicit iterators if they care about delimiters?


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:285
+      if (ElementOrDelimiter && isElement(ElementOrDelimiter))
+        return cast<ElementType>(ElementOrDelimiter);
+      return nullptr;
----------------
this is a crash if the list contains elements that have a type that doesn't match the container's expected type.

If we really need the strongly-typed version of this for constrained lists, we should define behavior here - probably return nullptr, just like an accessor on Tree for a strongly-typed single child whose type doesn't match what's expected.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:364
+  public:
+    List *getParent() {
+      switch (getKind()) {
----------------
what's the use case for this? What does it return on a default-constructed iterator?


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:443
+  getElementsAsNodesAndDelimitersBeforeBegin();
+  ElementAndDelimiterIterator<Node> getElementsAsNodesAndDelimitersBegin();
+  ElementAndDelimiterIterator<Node> getElementsAsNodesAndDelimitersEnd();
----------------
these names are way way too long unless this is is going to be very rarely used (in which case it shouldn't be members at all).
Can we have `iterator_range<...> getDelimitedElements()` or something? (FWIW I'd prefer just `getElements()` and have the vector version be `collectElements()` or so)


================
Comment at: clang/lib/Tooling/Syntax/Tree.cpp:352
+  default:
+    llvm_unreachable("List children can only be elements or delimiters.");
   }
----------------
Let's avoid deliberately crashing on invalid code.

I'd suggest either treating over non-element-non-delimiters as if they don't exist, or treating them as elements by default.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90161/new/

https://reviews.llvm.org/D90161



More information about the cfe-commits mailing list