[PATCH] D90161: [SyntaxTree] Provide iterators for Lists

Tue Oct 27 08:57:10 PDT 2020

eduucaldas marked 8 inline comments as done.
eduucaldas added a comment.

I left some points unanswered, I'll answer them tomorrow :)

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:231
+  /// `ElementAndDelimiter` acts as one.
+  template <typename ElementType> class ElementAndDelimiterIterator {
+  public:
----------------
sammccall wrote:
> sammccall wrote:
> > sammccall wrote:
> > > this name is bulky and hard to read, leads to more unwieldy names later (e.g. `getElementAndDelimiterBeforeBegin` which despite its length is unclear that it's even an iterator).
> > > 
> > > Would there be actual confusion in calling this `ElementIterator`, and providing the associated delimiters in addition to the elements?
> > overall, this design seems to be focused on the size being a single pointer, at the cost of complexity of most of the operations.
> > 
> > Why is this important to optimize for, compared to doing something like:
> > ```
> > private:
> >   Node *Elt, *Delim, *Next;
> >   void advance() {
> >     // read one element
> >     // set Elt and/or Delim
> >     // always advances Next by at least one
> >   }
> > public:
> >   Iterator(Node *Start) : Elt(nullptr), Delim(nullptr), Next(Start) {}
> >   operator++() { advance(); }
> >   Leaf *getDelimiter() { return Delim; }
> >   Node *getElement() { return Elt; }
> > ```
> > 
> > (Performance isn't the biggest concern here, but I'd usually expect two extra pointers on the stack to be preferable to all the branches in the current impl)
> how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?
> how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?
It avoids duplication of code on derived classes - `CallArguments`, `ParameterDeclarationList`.... I suggest that we try to simplify other things and then rediscuss this point.  

1. BeforeBegin.
2. Trying to fit everything into a pointer. 
3. Templates and strong typing.

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:231
+  /// `ElementAndDelimiter` acts as one.
+  template <typename ElementType> class ElementAndDelimiterIterator {
+  public:
----------------
eduucaldas wrote:
> sammccall wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > this name is bulky and hard to read, leads to more unwieldy names later (e.g. `getElementAndDelimiterBeforeBegin` which despite its length is unclear that it's even an iterator).
> > > > 
> > > > Would there be actual confusion in calling this `ElementIterator`, and providing the associated delimiters in addition to the elements?
> > > overall, this design seems to be focused on the size being a single pointer, at the cost of complexity of most of the operations.
> > > 
> > > Why is this important to optimize for, compared to doing something like:
> > > ```
> > > private:
> > >   Node *Elt, *Delim, *Next;
> > >   void advance() {
> > >     // read one element
> > >     // set Elt and/or Delim
> > >     // always advances Next by at least one
> > >   }
> > > public:
> > >   Iterator(Node *Start) : Elt(nullptr), Delim(nullptr), Next(Start) {}
> > >   operator++() { advance(); }
> > >   Leaf *getDelimiter() { return Delim; }
> > >   Node *getElement() { return Elt; }
> > > ```
> > > 
> > > (Performance isn't the biggest concern here, but I'd usually expect two extra pointers on the stack to be preferable to all the branches in the current impl)
> > how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?
> > how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?
> It avoids duplication of code on derived classes - `CallArguments`, `ParameterDeclarationList`.... I suggest that we try to simplify other things and then rediscuss this point.  
> 
> 1. BeforeBegin.
> 2. Trying to fit everything into a pointer. 
> 3. Templates and strong typing.
> Would there be actual confusion in calling this ElementIterator, and providing the associated delimiters in addition to the elements?

This is a very interesting proposition! And it brought some interesting ideas. We'll further develop on them and answer to this inline tomorrow :) 

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:235
+      switch (getKind()) {
+      case IteratorKind::BeforeBegin: {
+        if (auto *Begin = getParent()->getFirstChild())
----------------
sammccall wrote:
> Are the before-begin iterators needed immediately?
> 
> My understanding is that before-begin iterators are used with nodes due to the single-linked-list implementation, but that's going away. Can we avoid adding more of these by sequencing those changes differently?
https://reviews.llvm.org/D90240

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:242
+      }
+      case IteratorKind::NotSentinel: {
+        auto *DelimiterOrElement = getDelimiterOrElement();
----------------
sammccall wrote:
> NotSentinel -> Element?
> 
> Logically, the iterator is pointing at an element in the list. (If the element happens to be missing, fine...)
Agreed

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:285
+      if (ElementOrDelimiter && isElement(ElementOrDelimiter))
+        return cast<ElementType>(ElementOrDelimiter);
+      return nullptr;
----------------
sammccall wrote:
> this is a crash if the list contains elements that have a type that doesn't match the container's expected type.
> 
> If we really need the strongly-typed version of this for constrained lists, we should define behavior here - probably return nullptr, just like an accessor on Tree for a strongly-typed single child whose type doesn't match what's expected.
We want children of lists with role `ListElement` to have the expected C++ type.

Here is how an accessor on Tree is implemented.
```
syntax::Expression *syntax::BinaryOperatorExpression::getLhs() {   
  return cast_or_null<syntax::Expression>(                                   
  findChild(syntax::NodeRole::LeftHandSide)); 
}                     
```

Notice that if there is a Node with `LeftHandSide` it *must* be of the type `Expression`.

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:364
+  public:
+    List *getParent() {
+      switch (getKind()) {
----------------
sammccall wrote:
> what's the use case for this? What does it return on a default-constructed iterator?
This is useful for the mutations API. 
To remove a range of list elements, you may need to know the termination kind of this list, and thus you need to access the parent list. 
Of course you could do it like `getElement()->getParent()` but that wouldn't work for sentinels and neither for the case of a missing last element on a separated list.
```
"a, b,"    <=> [("a" , ","), ("b" , "," ), (null, null)]
```

I confess I didn't think on default-constructed iterators. I don't see any use for those, however they *are* required for `LegacyForwardIterator` and onwards.

================
Comment at: clang/lib/Tooling/Syntax/Tree.cpp:352
+  default:
+    llvm_unreachable("List children can only be elements or delimiters.");
   }
----------------
sammccall wrote:
> Let's avoid deliberately crashing on invalid code.
> 
> I'd suggest either treating over non-element-non-delimiters as if they don't exist, or treating them as elements by default.
We thought about invalid code.
This `llvm_unreachable` states an invariant on List, that any children can only have one of the two roles.

This would imply that a tree builder - which for invalid code would be the pseudo parser -  would assign `ListDelimiter` for everything that matches the expected delimiter, and try to parse other things into the expected `ElementType`.

Let's examine an example. In:
`f(1, +, 2)`
We have the `List` `CallArguments` for which `ElementType` = `Expression`.
We have some options:
- `+` is a Leaf with role `Unknown`.(which breaks the current invariant)
- `+` is part of an `Expression`, more specifically a `BinaryOperatorExpression` with `lhs` and `rhs` missing.

At the same time we could think of another example:
`f(1, int, 2)`
What would we do in this case?

In conclusion, I think this invariant is a bit constraining. We can consider relaxing it to allow `Unknown` roles. I think in general it would be good for us to have a common understanding of the plans for the pseudo parser.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90161/new/

https://reviews.llvm.org/D90161