[PATCH] D64146: [ConstExprPreter][WIP] Initial patch for the constexpr interpreter
Richard Smith - zygoloid via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Jul 26 17:03:53 PDT 2019
rsmith added a comment.
How do you intend to represent pointers cast to integer types? Allocating 64 bits of state for a 64-bit integer is insufficient to model that case.
================
Comment at: clang/include/clang/AST/ASTContext.h:583-584
+ /// Returns the constexpr VM context.
+ vm::Context &getConstexprCtx();
+
----------------
You have too many names for the same thing (`vm`, `ExprVM`, ConstExprPreter, and `ConstexprCtx` here -- and `ExprVM` is a bad choice of name since a lot of the purpose is to evaluate statements rather than expressions). Please pick a single name and apply it consistently.
Suggestion: use namespace `clang::interp`, in `{include/clang,lib}/AST/Interp`, and generally refer to this implementation as "the Clang interpreter". This function should be called `getInterpContext` or similar. (If you prefer, use VM intead of Interp, but I'd be concerned about confusion with LLVM.)
================
Comment at: clang/include/clang/Basic/DiagnosticASTKinds.td:231-234
+def err_constexpr_compilation_failed : Error<
+ "constexpr cannot be compiled by the experimental interpreter">;
+def err_constexpr_interp_failed : Error<
+ "constexpr cannot be evaluated by the experimental interpreter">;
----------------
`constexpr` is an adjective; this sentence doesn't parse. Do you mean "this constexpr function" or something like that?
================
Comment at: clang/include/clang/Driver/Options.td:838-839
+ HelpText<"Enable the experimental constexpr interpreter">, Flags<[CC1Option]>;
+def fexperimental_constexpr_force_interpreter : Flag<["-"], "fexperimental-constexpr-force-interpreter">, Group<f_Group>,
+ HelpText<"Force the use of the experimental constexpr interpreter, failing on missing features">, Flags<[CC1Option]>;
def fconstexpr_backtrace_limit_EQ : Joined<["-"], "fconstexpr-backtrace-limit=">,
----------------
If the name we're presenting to users is "the experimental constexpr interpreter", then "force" shouldn't appear part way through that name. `-fforce-experimental-constexpr-interpreter` maybe?
================
Comment at: clang/lib/AST/CMakeLists.txt:85
LINK_LIBS
+ clangExpr
clangBasic
----------------
What is the `clangExpr` library? Should this say `clangExprVM` (modulo renaming)?
================
Comment at: clang/lib/AST/ExprConstant.cpp:691-696
+ /// Force the use of the constexpr interpreter, bailing out with an error
+ /// if a feature is unsupported.
+ bool ForceInterp;
+
+ /// Enable the constexpr interpreter.
+ bool EnableInterp;
----------------
These comments don't make sense: `ExprConstant.cpp` is a constexpr interpreter too. Maybe "the new interpreter"?
================
Comment at: clang/lib/AST/ExprVM/CMakeLists.txt:4
+
+add_clang_library(clangExpr
+ Compiler.cpp
----------------
This is not a reasonable name for this library.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:79-82
+ for (auto *ParamDecl : F->parameters()) {
+ if (auto T = Ctx.classify(ParamDecl->getType())) {
+ auto Size = primSize(*T);
+ auto Desc = llvm::make_unique<Descriptor>(ParamDecl, Size, Size, *T,
----------------
You are overusing `auto` here. The types of at least `T` and `Size` are non-obvious, and should be written explicitly.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:88
+ Params.insert({ParamDecl, ParamOffset});
+ ParamOffset += align(Size);
+ Args.push_back(*T);
----------------
What happens if this overflows (due to many large local variables)?
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:99
+ Start = P.getOffset();
+ if (auto E = compile(F->getBody()))
+ return std::move(E);
----------------
Don't use `auto` here. The code would be much clearer if you explicitly write `llvm::Error`.
(I'm not going to call out further overuse of `auto`; generally [the rule we want to follow](http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable) is "Use auto if and only if it makes the code more readable or easier to maintain.", which many of the uses of `auto` here do not.)
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:144
+ } else {
+ return discard(static_cast<const Expr *>(S));
+ }
----------------
Use `E` here rather than casting `S` again.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:168-169
+ if (!VD->hasLocalStorage()) {
+ // TODO: implement non-local variables.
+ return bail(DS);
+ }
----------------
Non-local variables should not require any code generation, since they can't have dynamic initialization or destruction. (You can just return `Error::success()`.)
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:321-339
+ if (auto T = Ctx.classify(CE->getType())) {
+ if (auto *DE = dyn_cast<DeclRefExpr>(SubExpr)) {
+ bool IsReference = DE->getDecl()->getType()->isReferenceType();
+ if (!IsReference) {
+ if (auto *PD = dyn_cast<ParmVarDecl>(DE->getDecl())) {
+ return withParam(PD, [this, T, DE](unsigned Idx) -> Error {
+ emitGET_PARAM(*T, Idx, DE);
----------------
I assume this is temporary, until you have a proper representation of lvalues? Please add a FIXME.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:348
+ }
+ case CK_IntegralToBoolean: {
+ if (auto T = Ctx.classify(SubExpr->getType())) {
----------------
Newlines between cases would aid readability here.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:374-405
+ case ET_SINT_8: {
+ emitOp<ET_SINT_8>(EO_CONST_SINT_8, E->getValue().getSExtValue(), E);
+ return Error::success();
+ }
+ case ET_UINT_8: {
+ emitOp<ET_UINT_8>(EO_CONST_UINT_8, E->getValue().getZExtValue(), E);
+ return Error::success();
----------------
Do we really need different representations for all of these? This'll be a pain if we ever (eg) add back int128 literals, or support a target with a different integer size.
If we do want these, can you replace this with an xmacro or similar to generate them all rather than duplicating the case 8 times (presumably 8 times for each integer operation)?
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:488-503
+Error Compiler::discard(const Expr *DE) {
+ switch (DE->getStmtClass()) {
+ case Stmt::BinaryOperatorClass:
+ return discardBinaryOperator(static_cast<const BinaryOperator *>(DE));
+ default:
+ if (auto E = Visit(DE))
+ return E;
----------------
I'm uncomfortable about this: it looks like you're heading towards duplicating a large portion of the bytecode generation depending on whether the result is used. Can we do better somehow? (Eg, in CodeGen, at least for aggregates, we use the same code but track whether the destination is used or not.)
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:564
+Error Compiler::withParam(const ParmVarDecl *P,
+ std::function<Error(unsigned)> &&f) {
+ auto It = Params.find(P);
----------------
Use `llvm::function_ref`.
================
Comment at: clang/lib/AST/ExprVM/Compiler.cpp:628-629
+ P.emitBytes(reinterpret_cast<const char *>(&Op), sizeof(Op));
+ (void)std::initializer_list<int>{
+ (P.emitBytes(reinterpret_cast<const char *>(&Args), sizeof(Args)), 0)...};
+}
----------------
This trick isn't all that well known; please add a comment that you're using an initializer list to enforce left-to-right evaluation order.
================
Comment at: clang/lib/AST/ExprVM/Compiler.h:56
+/// Compilation context for a constexpr function.
+class Compiler : public ConstStmtVisitor<Compiler, llvm::Error> {
+private:
----------------
This is not a great choice of name for a class within one corner of a compiler. Something like `ByteCodeGenerator` would more clearly convey the purpose of this class.
================
Comment at: clang/lib/AST/ExprVM/Context.h:36-41
+ /// Interpreter successfully computed a value.
+ SUCCESS,
+ /// Interpreter encountered an error and quit.
+ FAIL,
+ /// Interpreter encountered an unimplemented feature, AST fallback.
+ BAIL,
----------------
Rename these to follow our coding conventions: enumerators should be in `UpperCamelCase`.
================
Comment at: clang/lib/AST/ExprVM/Frame.h:29-35
+ virtual void describe(llvm::raw_ostream &OS) = 0;
+
+ virtual Frame *getCaller() const = 0;
+
+ virtual SourceLocation getLoc() const = 0;
+
+ virtual const FunctionDecl *getCallee() const = 0;
----------------
Add doc comments, please; it's not obvious what the contracts of at least `describe` and `getLoc` are.
================
Comment at: clang/lib/AST/ExprVM/Interp.cpp:60
+
+namespace {
+
----------------
We generally don't like big anonymous namespace blocks and prefer marking each function `static` individually, as it improves local readability.
================
Comment at: clang/lib/AST/ExprVM/Interp.cpp:132-142
+template <typename T> bool AddFn(T a, T b, T *r) {
+ return __builtin_add_overflow(a, b, r);
+}
+
+template <typename T> bool SubFn(T a, T b, T *r) {
+ return __builtin_sub_overflow(a, b, r);
+}
----------------
We support host compilers (eg MSVC) where these builtins don't exist.
================
Comment at: clang/lib/AST/ExprVM/Interp.cpp:245
+
+template <typename T> bool MINUS(InterpState &S, CodePtr OpPC) {
+ const auto &Arg = S.Stk.pop<T>();
----------------
Please follow LLVM coding conventions and do not use `SHOUTY_CAPS` for function names.
================
Comment at: clang/lib/AST/ExprVM/InterpFrame.h:89-90
+ char *Args = nullptr;
+ /// Fixed, initial storage for known local variables.
+ std::unique_ptr<char[]> Locals;
+};
----------------
Separate allocation for each frame seems wasteful. Have you considered using a slab-allocated stack or similar? (Eg, allocate a large block up front and allocate the locals from that, and allocate more blocks if the first one is exhausted; you can cache the blocks on the `vm::Context` for reuse with no extra round-trips to the heap.)
================
Comment at: clang/lib/AST/ExprVM/Opcodes.inc:1
+//===--- Opcodes.inc - Opcode defitions for the constexpr VM ----*- C++ -*-===//
+//
----------------
This file is hardcoding too much of its `#include`rs' implementation details. Please find a way to improve that. Generally we prefer
```
enum Opcode : uint32_t {
#define OPCODE(Name) EO_ ## Name,
#include "Opcodes.inc"
};
```
with this in `Opcodes.inc`:
```
OPCODE(Jmp)
OPCODE(JT)
OPCODE(JN)
// ...
#undef OPCODE
```
over the style you're using here, as it makes the point of use much more readable, and allows reuse of the same set of macros rather than having a distinct one for each use.
================
Comment at: clang/lib/AST/ExprVM/Opcodes.inc:1
+//===--- Opcodes.inc - Opcode defitions for the constexpr VM ----*- C++ -*-===//
+//
----------------
rsmith wrote:
> This file is hardcoding too much of its `#include`rs' implementation details. Please find a way to improve that. Generally we prefer
>
> ```
> enum Opcode : uint32_t {
> #define OPCODE(Name) EO_ ## Name,
> #include "Opcodes.inc"
> };
> ```
>
> with this in `Opcodes.inc`:
>
> ```
> OPCODE(Jmp)
> OPCODE(JT)
> OPCODE(JN)
> // ...
> #undef OPCODE
> ```
>
> over the style you're using here, as it makes the point of use much more readable, and allows reuse of the same set of macros rather than having a distinct one for each use.
This file also needs a *lot* more documentation strings than it currently has. We need to document what all the opcodes are, what effects they have on the interpreter state, and so on.
================
Comment at: clang/lib/AST/ExprVM/Opcodes.inc:344-392
+OP_OFF(JMP)
+OP_OFF(JT)
+OP_OFF(JF)
+
+OP_GROUP_RET_ALL(RET)
+OP_RET_VOID(RET)
+OP_RET_VOID(NORET)
----------------
Opcode names should be formatted following our normal coding conventions: use `UpperCamelCase` here rather than `ALL_CAPS`.
================
Comment at: clang/lib/AST/ExprVM/Pointer.h:30
+
+/// Describes a memory block - generated once by the compiler.
+struct Descriptor {
----------------
How is this going to work for subobjects? We can't generate the full space of all possible descriptors, as that would be unmanageable for large arrays of complicated types.
================
Comment at: clang/lib/AST/ExprVM/Pointer.h:36-39
+ /// Size of the object.
+ unsigned Size;
+ /// Size of an element.
+ unsigned ElemSize;
----------------
Please use a distinct type for representing size-in-the-interpreter, to avoid confusion and bugs when dealing with code that must manage both compile-time sizes and runtime sizes. And please use `CharUnits` for representing sizes-at-runtime.
================
Comment at: clang/lib/AST/ExprVM/Pointer.h:40-49
+ /// Type of the elements.
+ PrimType ElemType;
+ /// Flag indicating if the descriptor is mutable.
+ bool IsMutable = false;
+ /// Flag indicating if the object is an array.
+ bool IsArray = false;
+ /// Flag indicating if the object is a global.
----------------
Consider packing these 5 members into 4 bytes.
================
Comment at: clang/lib/AST/ExprVM/Program.h:37
+
+using CodePtr = const char *;
+
----------------
Consider adding a type-safe wrapper for this.
================
Comment at: clang/lib/AST/ExprVM/Program.h:105
+ /// Program bytecode.
+ std::vector<char> Code;
+ /// Mapping from bytecode offsets to source locations.
----------------
If `CodePtr` is a plain pointer, using a single flat vector here seems dangerous; you'll have invalidation issues if you ever trigger creation of a `Function` while evaluating, which seems like something that will happen in practice, eg:
```
constexpr int f();
constexpr int g() { return f(); }
constexpr int f() { return 0; } // or imported from a module at this point
constexpr int k = g(); // evaluation of this call triggers generation of code for f
```
================
Comment at: clang/lib/AST/ExprVM/Program.h:106-107
+ std::vector<char> Code;
+ /// Mapping from bytecode offsets to source locations.
+ std::map<unsigned, SourceInfo> Source;
+};
----------------
Using a (presumably very large) `std::map` for this would presumably be very time- and space-inefficient. Do you expect this to be dense? (Would a flat array of `SourceInfo` work better?)
================
Comment at: clang/lib/AST/ExprVM/Type.h:24-34
+enum PrimType : unsigned {
+ ET_SINT_8,
+ ET_UINT_8,
+ ET_SINT_16,
+ ET_UINT_16,
+ ET_SINT_32,
+ ET_UINT_32,
----------------
As previously, please follow coding guidelines wrt naming here.
================
Comment at: clang/lib/AST/ExprVM/Type.h:51-53
+/// An integer which is as wide as the native pointer size to encode pointers.
+constexpr PrimType ET_NATIVE_PTR =
+ sizeof(size_t) == sizeof(PrimConv<ET_UINT_64>::T) ? ET_UINT_64 : ET_UINT_32;
----------------
What do you expect to use this for? If you need to encode a pointer, would it make sense to have a distinct primitive type for that?
================
Comment at: clang/lib/AST/ExprVM/Type.h:64-75
+#define TYPE_SWITCH(Expr, B) \
+ switch (Expr) { \
+ case ET_SINT_8: {using T = PrimConv<ET_SINT_8>::T; do{B;}while(0); break; }\
+ case ET_UINT_8: {using T = PrimConv<ET_UINT_8>::T; do{B;}while(0); break; }\
+ case ET_SINT_16: {using T = PrimConv<ET_SINT_16>::T; do{B;}while(0); break; }\
+ case ET_UINT_16: {using T = PrimConv<ET_UINT_16>::T; do{B;}while(0); break; }\
+ case ET_SINT_32: {using T = PrimConv<ET_SINT_32>::T; do{B;}while(0); break; }\
----------------
Injecting `T` into this seems too implicit to me. Consider instead accepting the name of a function-like macro and invoking it on the type. Eg:
```
#define RETURN_SIZEOF(T) return sizeof(T);
TYPE_SWITCH(Type, RETURN_SIZEOF);
```
Also please wrap the expansion in `do { ... } while (0)` to require the user of the macro to provide a semicolon afterwards.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64146/new/
https://reviews.llvm.org/D64146
More information about the cfe-commits
mailing list