[PATCH] D64146: [ConstExprPreter][WIP] Initial patch for the constexpr interpreter

Nandor Licker via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 31 14:11:22 PDT 2019


nand marked 40 inline comments as done.
nand added a comment.

I have applied most of the changes you suggested to my HEAD which had significantly more functionality, including a replacement of Opcodes.in with TableGen.
Quite a few of your concerns are answered by the features I have added between submitting the patch and now. The interpreter now stands at ~6k LOC.

Should I update the diff with an interpreter trimmed down to the functionality of the current diff (~3k LOC) or post the full interpreter now? What would be easier to review?



================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:88
+      Params.insert({ParamDecl, ParamOffset});
+      ParamOffset += align(Size);
+      Args.push_back(*T);
----------------
rsmith wrote:
> What happens if this overflows (due to many large local variables)?
Overflow happens here if a function takes more than a few million arguments. I think clang would die before it gets here.


================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:488-503
+Error Compiler::discard(const Expr *DE) {
+  switch (DE->getStmtClass()) {
+  case Stmt::BinaryOperatorClass:
+    return discardBinaryOperator(static_cast<const BinaryOperator *>(DE));
+  default:
+    if (auto E = Visit(DE))
+      return E;
----------------
rsmith wrote:
> I'm uncomfortable about this: it looks like you're heading towards duplicating a large portion of the bytecode generation depending on whether the result is used. Can we do better somehow? (Eg, in CodeGen, at least for aggregates, we use the same code but track whether the destination is used or not.)
I'd like to do this in a future diff, once we add support for more constructs which discard their results.


================
Comment at: clang/lib/AST/ExprVM/InterpFrame.h:89-90
+  char *Args = nullptr;
+  /// Fixed, initial storage for known local variables.
+  std::unique_ptr<char[]> Locals;
+};
----------------
rsmith wrote:
> Separate allocation for each frame seems wasteful. Have you considered using a slab-allocated stack or similar? (Eg, allocate a large block up front and allocate the locals from that, and allocate more blocks if the first one is exhausted; you can cache the blocks on the `vm::Context` for reuse with no extra round-trips to the heap.)
Yes, I considered allocating this on the stack - I just don't want to add that complexity to the interpreter yet. I like to avoid changing this until it becomes a noticeable performance problem.


================
Comment at: clang/lib/AST/ExprVM/Pointer.h:30
+
+/// Describes a memory block - generated once by the compiler.
+struct Descriptor {
----------------
rsmith wrote:
> How is this going to work for subobjects? We can't generate the full space of all possible descriptors, as that would be unmanageable for large arrays of complicated types.
Subobjects will have pointers to descriptors embedded into their data - a pointer inside a block can follow that chain of descriptors up to the root.


================
Comment at: clang/lib/AST/ExprVM/Pointer.h:36-39
+  /// Size of the object.
+  unsigned Size;
+  /// Size of an element.
+  unsigned ElemSize;
----------------
rsmith wrote:
> Please use a distinct type for representing size-in-the-interpreter, to avoid confusion and bugs when dealing with code that must manage both compile-time sizes and runtime sizes. And please use `CharUnits` for representing sizes-at-runtime.
I've added an alias to be used for the sizes of object as determined by the interpreter - CharUnits will be used when interfacing with APValue.


================
Comment at: clang/lib/AST/ExprVM/Pointer.h:40-49
+  /// Type of the elements.
+  PrimType ElemType;
+  /// Flag indicating if the descriptor is mutable.
+  bool IsMutable = false;
+  /// Flag indicating if the object is an array.
+  bool IsArray = false;
+  /// Flag indicating if the object is a global.
----------------
rsmith wrote:
> Consider packing these 5 members into 4 bytes. 
I'd like to avoid packing stuff for now, makes it harder to change things later and it's not a performance problem yet.


================
Comment at: clang/lib/AST/ExprVM/Program.h:105
+  /// Program bytecode.
+  std::vector<char> Code;
+  /// Mapping from bytecode offsets to source locations.
----------------
rsmith wrote:
> If `CodePtr` is a plain pointer, using a single flat vector here seems dangerous; you'll have invalidation issues if you ever trigger creation of a `Function` while evaluating, which seems like something that will happen in practice, eg:
> 
> ```
> constexpr int f();
> constexpr int g() { return f(); }
> constexpr int f() { return 0; } // or imported from a module at this point
> constexpr int k = g(); // evaluation of this call triggers generation of code for f
> ```
This wouldn't have been a problem, but the compilation of default constructors caused issues. Now the compiler can recursively invoke itself.


================
Comment at: clang/lib/AST/ExprVM/Program.h:106-107
+  std::vector<char> Code;
+  /// Mapping from bytecode offsets to source locations.
+  std::map<unsigned, SourceInfo> Source;
+};
----------------
rsmith wrote:
> Using a (presumably very large) `std::map` for this would presumably be very time- and space-inefficient. Do you expect this to be dense? (Would a flat array of `SourceInfo` work better?)
Replaced this with a dense array + binary search.


================
Comment at: clang/lib/AST/ExprVM/Type.h:51-53
+/// An integer which is as wide as the native pointer size to encode pointers.
+constexpr PrimType ET_NATIVE_PTR =
+    sizeof(size_t) == sizeof(PrimConv<ET_UINT_64>::T) ? ET_UINT_64 : ET_UINT_32;
----------------
rsmith wrote:
> What do you expect to use this for? If you need to encode a pointer, would it make sense to have a distinct primitive type for that?
Removed this type.


================
Comment at: clang/lib/AST/ExprVM/Type.h:64-75
+#define TYPE_SWITCH(Expr, B)                                                   \
+  switch (Expr) {                                                              \
+  case ET_SINT_8:  {using T = PrimConv<ET_SINT_8>::T;  do{B;}while(0); break; }\
+  case ET_UINT_8:  {using T = PrimConv<ET_UINT_8>::T;  do{B;}while(0); break; }\
+  case ET_SINT_16: {using T = PrimConv<ET_SINT_16>::T; do{B;}while(0); break; }\
+  case ET_UINT_16: {using T = PrimConv<ET_UINT_16>::T; do{B;}while(0); break; }\
+  case ET_SINT_32: {using T = PrimConv<ET_SINT_32>::T; do{B;}while(0); break; }\
----------------
rsmith wrote:
> Injecting `T` into this seems too implicit to me. Consider instead accepting the name of a function-like macro and invoking it on the type. Eg:
> 
> ```
> #define RETURN_SIZEOF(T) return sizeof(T);
> TYPE_SWITCH(Type, RETURN_SIZEOF);
> ```
> 
> Also please wrap the expansion in `do { ... } while (0)` to require the user of the macro to provide a semicolon afterwards.
I'd rather have a documented implicit than use nested macros...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64146/new/

https://reviews.llvm.org/D64146





More information about the cfe-commits mailing list