[PATCH] D129531: [clang][C++20] P0960R3 and P1975R0: Allow initializing aggregates from a parenthesized list of values

Alan Zhao via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Dec 5 11:58:02 PST 2022


ayzhao added inline comments.


================
Comment at: clang/lib/CodeGen/CGExprAgg.cpp:581
+  Expr *filler = nullptr;
+  if (auto *ILE = dyn_cast<InitListExpr>(ExprToVisit))
+    filler = ILE->getArrayFiller();
----------------
ayzhao wrote:
> ilya-biryukov wrote:
> > ilya-biryukov wrote:
> > > ayzhao wrote:
> > > > ilya-biryukov wrote:
> > > > > - Should we have a filler for `CXXParenInitListExpr` too?
> > > > >   It seems like an important optimization and could have large effect on compile times if we don't 
> > > > > 
> > > > > - Same question for semantic and syntactic-form (similar to `InitListExpr`): should we have it here?
> > > > >   I am not sure if it's semantically required (probably not?), but that's definitely something that `clang-tidy` and other source tools will rely on.
> > > > > 
> > > > > We should probably share the code there, I suggest moving it to a shared base class and using it where appropriate instead of the derived classes.
> > > > > Should we have a filler for CXXParenInitListExpr too? It seems like an important optimization and could have large effect on compile times if we don't
> > > > 
> > > > This feels like premature optimization - presumably, wouldn't this only be an issue with extraordinarily large (say, O(1000)) arrays?
> > > > 
> > > > > Same question for semantic and syntactic-form (similar to InitListExpr): should we have it here? I am not sure if it's semantically required (probably not?), but that's definitely something that clang-tidy and other source tools will rely on
> > > > 
> > > > IIRC this doesn't apply to paren list aggregate expressions, as the syntatic form would be the enclosing `ParenListExpr`.
> > > > This feels like premature optimization - presumably, wouldn't this only be an issue with extraordinarily large (say, O(1000)) arrays?
> > > Yes, this should only happen with large arrays. Normally I would agree, but it's surprising that changing `{}` to `()` in the compiler would lead to performance degradation.
> > > In that sense, this premature optimization is already implemented, we are rather degrading performance for a different syntax to do the same thing.
> > > 
> > > Although we could also land without it, but in that case could you open a GH issue and add a FIXME to track the implementation of this particular optimization?
> > > This should increase the chances of users finding the root cause of the issue if they happen to hit it.
> > > 
> > > > IIRC this doesn't apply to paren list aggregate expressions, as the syntatic form would be the enclosing ParenListExpr.
> > > Do we even have the enclosing `ParenListExpr`? If I dump the AST with `clang -fsyntax-only -Xclang=-ast-dump ...`  for the following code:
> > > ```
> > > struct pair { int a; int b = 2; };
> > > int main() {
> > >   pair(1); pair p(1);
> > >   pair b{1}; pair{1};
> > > }
> > > ```
> > > 
> > > I get 
> > > ```
> > > `-FunctionDecl 0x557d79717e98 <line:2:1, line:5:1> line:2:5 main 'int ()'
> > >   `-CompoundStmt 0x557d797369d0 <col:12, line:5:1>
> > >     |-CXXFunctionalCastExpr 0x557d79718528 <line:3:3, col:9> 'pair':'pair' functional cast to pair <NoOp>
> > >     | `-CXXParenListInitExpr 0x557d79718500 <col:3> 'pair':'pair'
> > >     |   |-IntegerLiteral 0x557d79718010 <col:8> 'int' 1
> > >     |   `-IntegerLiteral 0x557d79717e18 <line:1:30> 'int' 2
> > >     |-DeclStmt 0x557d79718650 <line:3:12, col:21>
> > >     | `-VarDecl 0x557d79718568 <col:12, col:17> col:17 p 'pair':'pair' parenlistinit
> > >     |   `-CXXParenListInitExpr 0x557d79718610 <col:17> 'pair':'pair'
> > >     |     |-IntegerLiteral 0x557d797185d0 <col:19> 'int' 1
> > >     |     `-IntegerLiteral 0x557d79717e18 <line:1:30> 'int' 2
> > >     |-DeclStmt 0x557d797187d8 <line:4:3, col:12>
> > >     | `-VarDecl 0x557d79718680 <col:3, col:11> col:8 b 'pair':'pair' listinit
> > >     |   `-InitListExpr 0x557d79718750 <col:9, col:11> 'pair':'pair'
> > >     |     |-IntegerLiteral 0x557d797186e8 <col:10> 'int' 1
> > >     |     `-CXXDefaultInitExpr 0x557d797187a0 <col:11> 'int'
> > >     `-CXXFunctionalCastExpr 0x557d797369a8 <col:14, col:20> 'pair':'pair' functional cast to pair <NoOp>
> > >       `-InitListExpr 0x557d79718868 <col:18, col:20> 'pair':'pair'
> > >         |-IntegerLiteral 0x557d79718800 <col:19> 'int' 1
> > >         `-CXXDefaultInitExpr 0x557d797188b8 <col:20> 'int'
> > > ```
> > > It feels like the `ParentListExpr` is replaced during semantic analysis and there is no way to get it back. I also tried running `clang-query` and trying to `match parenListExpr()` and go 0 results. Is it just missing in the AST dump and I run `clang-query` incorrectly or do we actually not have the syntactic form of this expression after all?
> > Another important thing to address from the dump: notice how braced initialization creates `CXXDefaultInitExpr` and `CXXParenListInitExpr` copies the default argument expression directly. It's really important to use the former form, here's the example that currently breaks:
> > 
> > 
> > ```
> > #include <iostream>
> > 
> > struct func_init { int some_int; const char* fn = __builtin_FUNCTION(); };
> > 
> > int main() {
> >   func_init a(10);
> >   std::cout << "From paren init:" << a.fn << std::endl;
> > 
> >   func_init b{10};
> >   std::cout << "From braced init: " << b.fn << std::endl;
> > }
> > ```
> > 
> > The program above is expected to report `main` for both cases, but instead we get:
> > ```
> > From paren init:
> > From braced init: main
> > ```
> The following have now been implemented:
> 
> * `CXXDefaultInitExpr` and `ImplicitValueInitExpr`, which includes adding a test for `__builtin_FUNCTION()`
> * Array fillers
> * Semantic forms vs syntactic forms
Instead of implementing separate syntactic and semantic forms, I think it might be simpler and cleaner to store an `unsigned NumUserSpecifiedExprs`.

The reason for implementing it this way is that the purpose of having separate syntactic and semantic forms is to determine what the form in the code was originally as written and what the form is after semantic analysis. For `InitListExpr`, we need separate objects because there can be a significant difference in the two forms (e.g. designated initializers, etc.). However, `CXXParenListInitExpr` is much simpler - the first k expressions are user-specified, while the remaining are default/compiler-generated. Therefore, we can eliminate the complexity required for maintaining two forms (say, in `TreeTransform`) by just storing k. 

N.B. this also resolves a build failure with an earlier diff where the libc++ bots were failing.

There is also precedence for this kind of structure - function calls with default arguments use similar logic instead of creating two `Expr` objects.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129531/new/

https://reviews.llvm.org/D129531



More information about the cfe-commits mailing list