[libcxx-commits] [PATCH] D140584: [Clang] Refactor "Designators" into a unified implementation [NFC]

Richard Smith - zygoloid via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Feb 1 17:52:16 PST 2023


rsmith added inline comments.


================
Comment at: clang/include/clang/AST/Designator.h:88
+  /// An array designator, e.g., "[42] = 0" and "[42 ... 50] = 1".
+  template <typename Ty> struct ArrayDesignatorInfo {
+    /// Location of the first and last index expression within the designated
----------------
void wrote:
> rsmith wrote:
> > void wrote:
> > > void wrote:
> > > > rsmith wrote:
> > > > > Can we move the templating out from here to the whole `Designator` and `Designation` classes? It shouldn't be possible to mix the two kinds in the same `Designation`.
> > > > Grr...My previous comment was eaten.
> > > > 
> > > > I'll give it a shot.
> > > > 
> > > > However, I'm a bit surprised at how designators are handled by Clang. I expected that a `Designation` would be an `Expr` with the `Designator`s being L-values (e.g. `MemberExpr`s / `ArraySubscriptExpr`s), but instead the `Designation` exists just long enough to be turned into an explicit initialization list. Is there a reason to do it that way instead of using expressions?
> > > So it looks like moving the template outside of the class won't work. The ability to switch between `Expr` and `unsigned` while retaining the same overall type is hardwired into things like the `ASTImporter`.
> > > 
> > > This is kind of a massive mess. Maybe we shouldn't even allow them to use both `Expr` and `unsigned` but instead require them to use one or the other? Maybe we could require `unsigned` with the understanding that the `Expr` can be converted into a constant?
> > I'm not understanding something. Currently the `ASTImporter` only deals with `DesignatedInitExpr::Designator`s , which only ever store integer indexes.
> > 
> > Basically, today, we have two different classes:
> > - A class that's specific to `DesignatedInitExpr`, and tracks array index expressions by storing the index of the expression within the `DesignatedInitExpr`'s list of children; this is also what `ASTImporter` can import, because it's the one that's used in the AST's representation.
> > - A class that's specific to `Sema`'s processing that tracks array index expressions as `Expr*` instead.
> > 
> > You want to refactor them to share code, which makes sense, because they are basically the same other than how they refer to expressions. (Not quite: `DesignatedInitExpr` can apparently refer to a field either as an `IdentifierInfo*` or as a `FieldDecl*`, whereas the `Sema` version always uses the `IdentifierInfo*` representation.)
> > 
> > Each current user of one of these two classes uses only one of the two, which means they're either exclusively using integers to refer to expressions or exclusively using `Expr*`. So it seems to me that you should be able to update each user to use either `Designator<unsigned>` or `Designator<Expr*>`, depending on which class they used before.
> > 
> > What am I missing?
> I'm still allowing them to use a `Designator<unsigned>` / `Designator<Expr*>` as they see fit, only it's hidden from them via the `Create` methods. I personally find the use of two different versions (one using `unsigned` and one using `Expr*`) completely baffling. Why can't they all use `Expr*`? Also the `ASTImporter` only outputs the start of an array init range, which is at the very least counter-intuitive. That's one of the issues I'd like to tackle with follow-up patches, hopefully getting rid of the need for this template all together. This does mean that in the interim a non-array range designator will have extra `End` & `EllisisLoc` fields that aren't used, but that shouldn't be too horrible, given that they'd be there anyway because of the union.
The reason that's jumping out at me for having separate integer / `Expr*` implementations here is space-efficiency -- we get to make array range designators (and hence designators overall) be only 16 bytes rather than the 32 bytes they occupy in this patch (assuming 64-bit pointers) by storing indexes instead of pointers.

If your eventual plan is to remove the children list from `DesignatedInitExpr`, and store the pointers only in the designators, that seems to cost 8 bytes per designator in the two common cases:

- For a field designator: 32 bytes (with 16 bytes of padding) versus 16 bytes + 8 bytes for the child pointer today
- For an array designator: 32 bytes (with 16 bytes of padding) versus 16 bytes + 8 bytes for the child pointer today
- For an array range designator: 32 bytes (4 bytes of padding) versus 16 bytes + 16 bytes for two child pointers today

... plus it'll presumably be painful to make the `Stmt` child iterator be able to handle this.

If you don't remove the separate children list from `DesignatedInitExpr`, then it seems like this approach will cost 16 bytes per designator in all cases, and we'll need to be careful in AST serialization / deserialization that we don't accidentally duplicate the `Expr`s that now have two pointers pointing to them instead of one, and likewise anywhere else that assumes each `Expr` is only reachable by one path through the AST (eg, `TreeTransform`, the recursive AST visitor).

I think some more visibility into the eventual plan would help.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140584/new/

https://reviews.llvm.org/D140584



More information about the libcxx-commits mailing list