[cfe-dev] [RFC] ConstExprPreter — clang constexpr interpreter

Wed Jul 3 11:54:56 PDT 2019

Hi,

I'd just like to let you know that I am looking forward to trying out
the "ConstExprPreter".

I've had issues with the speed of constexpr evaluation in the past:
http://clang-developers.42468.n3.nabble.com/constexpr-JIT-td4058592.html
So your mail comes as a godsend :)

I cannot promise anything, but I'll try to come back with feedback "soon".

Regards,
Tim Rakowski

On Wed, Jul 3, 2019 at 7:49 PM Nandor Licker via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> TL;DR: Fast interpreter for C++ constexpr to replace the existing tree evaluator
>
> I am a PhD student at the University of Cambridge currently interning at Apple,
> working with JF Bastien and Michael Spencer on improving constexpr performance.
> Constexpr evaluation is presently slow and difficult since it relies on a
> monolithic AST walker. We propose replacing the tree evaluator with a bytecode
> interpreter to improve compilation times. The tree evaluator also poses
> significant maintenance and scalability problems, which we intend to ameliorate
> using the interpreter. While being generally faster, the interpreter does
> present two critical issues: a slightly increased memory footprint and added
> complexity to the compiler. This tradeoff is justified, as efficient constexpr
> evaluation could prove to be valuable as the language evolves.
>
> We would like to integrate this interpreter into clang. This RFC details the
> benefits of an interpreter and describes an initial implementation, along with
> a roadmap for replacing almost all of the tree evaluator with the interpreter.
> Even without optimizations, the performance of the interpreter matches that of
> the tree evaluator, thus the short-term focus is on feature completeness, not
> evaluation speed, as reflected by known inefficiencies in the current
> implementation. We would highly appreciate comments related to integration into
> clang and our roadmap towards replacing the evaluator, as well as feedback on
> the initial patch. This project serves mostly as a prototype in order to
> determine what kind of bytecode and compiler is required to efficiently evaluate
> code and emit useful diagnostic messages, paving the way for a future fast,
> potentially JIT-ed, interpreter.
>
> What?
>
> The ConstExprPreter is an interpreter for C++ constexpr embedded into the clang
> frontend: instead of evaluating constant expressions by walking the AST, the
> constexpr interpreter compiles C++ to safe bytecode and executes the bytecode
> in accordance with the constexpr semantics, emitting all appropriate
> diagnostics. It aims to replace the existing AST walker, which is less efficient
> and does not scale well in complexity as the constexpr subset of the C++
> language is expected to increase in the future.
>
> Why?
>
> The present constexpr evaluator is a 12.000 LOC monolith defined in
> clang/lib/AST/ExprConstant.cpp and poses a performance and maintenance problem.
> The tree interpreter limits the complexity of constexpr evaluated in a module by
> bounding recursion depth (-fconstexpr-depth=) and bounding the number of
> execution steps (-fconstexpr-steps). This severely limits the complexity of
> expression which can be evaluated in a given time budget. Furthermore, because
> of complexity, the implementation of certain potential future features on top
> of the evaluator, such as exception handling, pose serious difficulties. An
> efficient constexpr interpreter is expected to be faster and easily extensible,
> ameliorating some of the limitations of the tree evaluator, especially regarding
> performance. By improving the evaluation speed of constexpr, we expect to enable
> C++ users to replace instances of automatically generated code with complex
> constexpr, simplifying and improving the reliability of builds.
>
> Proposed Roadmap
>
> * Commit the initial patch which embeds a simple bytecode compiler and
>   interpreter into clang, alongside the existing constexpr evaluator. This
>   interpreter only supports a subset of constexpr and is disabled by default.
>
> * Add features to the interpreter, reaching a point where it supports all the
>   features of the existing evaluator.
>
> * Turn the interpreter on by default.
>
> * Move the entry point of the interpreter from function calls to arbitrary
>   toplevel expressions. In order to avoid performance regressions, a small
>   subset of features will be evaluated  without entering the bytecode compiler
>   and the VM: for example, frequent integer constant definitions, such as
>   constexpr int kMask = 1 << sizeof(T). This strategy requires keeping parts of
>   the existing Int evaluator, but allows the removal of the LValue, Complex,
>   etc. evaluators, significantly reducing the complexity of ExprConstant.cpp.
>
> * Remove most of the toplevel evaluator, minus the parts required to interpret
>   simple expressions. Roles will be reversed: if the evaluator encounters an
>   unsupported feature, it falls back to the interpreter.
>
> * Remove the flags enabling/forcing the interpreter.
>
> Initial Implementation
>
> The initial contribution is available in D64146. This only contains the
> minimal interpreter required to compile a basic example. Further patches have
> been developed, implementing pointers, which will be submitted for review
> afterwards.
>
> The implementation of the interpreter resides in lib/AST/ExprVM, in the
> clang::vm namespace. The sole entry point is from ExprConstant.cpp, in the
> HandleFunctionCall method. When this method is invoked in order to evaluate a
> function application, the vm::Context attached to each module is retrieved from
> ASTContext and an attempt is made to compile and interpret the function with
> the given parameters. If this fails, the tree interpreter is used as a fallback,
> unless the command line flags explicitly ask for a diagnostic pointing to
> unimplemented features. If the interpreter succeeds, the result is converted to
> a format compatible with the tree evaluator.
>
> The bytecode compiler simply walks the AST of each function generating a
> vm::Function, linking them together into a vm::Program. Certain peephole
> optimizations are performed in the compiler in order to optimize
> local/parameter accesses and remove redundant instructions, avoiding the use of
> pointers whenever possible. The compiler heavily relies on type information:
> the Classify method in vm::Context is used to approximate a type from the AST
> with a base type supported by the interpreter. The types and the necessary
> utilities are defined in Type.cpp. Presently, only 32-bit integers are
> supported, however 8, 16 and 64 bit integers will be added and a fallback onto
> APSInt is planned for 128-bits and beyond.
>
> Internally, the compiler relies on a type classification–PrimType–to decide what
> opcodes to emit for a particular operation. The Context::Classify method relies
> on target information to decide what internal types to map C/C++ types to,
> returning llvm::Optional<PrimType>. In the future, the classification is
> expected to be complete, removing a large number of the nullopt-checks from the
> compiler.
>
> The opcodes supported by the VM are defined in Opcodes.inc, a header used to
> generate most of the interpreter, as well as the disassembler. The VM is stack
> based, since such bytecode is fast to generate and the high upfront cost of
> register allocation is avoided. In order to accurately emit diagnostics, the VM
> needs to cooperate with the tree interpreter—this is achieved by isolating
> diagnostics into the vm::State class, inherited by both EvalInfo and
> InterpState. Stack frames now use vm::Frame as their base class, inherited by
> InterpFrame and CallStackFrame. This allows the stack frame to be traced through
> both the VM and the tree walker effectively. The present focus is on
> correctness — a path is kept open to optimize the interpreter and improve
> instruction dispatch and memory layout, however this is not the current
> priority. The interpreter could also be specialized into two variants: one that
> only detects problems, falling back to a slower version which tracks
> significantly more metadata for informative diagnostics.
>
> The interpreter needs to detect pointer accesses that would result in Undefined
> Behavior: dereferences and arithmetic on pointers which point to invalid memory
> locations. To achieve this, each allocation site has a unique descriptor,
> containing the metadata required to emit a diagnostic message. Each allocation
> creates a block tracking the descriptor, along with all live pointers to that
> block. Whenever a block goes out of scope, all the pointers are invalidated:
> instead of pointing to the block, they now point to the descriptor. If such a
> pointer is used, a diagnostic is generated and execution stops. This scheme
> adds an overhead to pointer arithmetic, as the intrusive linked list of
> pointers to a block needs to be maintained. If pointers correctly track the
> lifetime of stack objects, no additional cost is paid at deallocation sites as
> there are no pointers to invalidate. In the future, we might investigate
> different schemes, such as the one in asan (shadow memory + epoch counters), or
> garbage collection to keep invalid blocks alive as long as there are pointers
> to them.
>
> Usage
>
> The proposed patch adds two new flags to clang:
>
> -fexperimental-constexpr-interpreter:
>
>  Enables the interpreter, but falls back to the tree evaluator when encountering
>  language features which are not supported.
>
> -fexperimental-constexpr-force-interpreter:
>
>  Forces the use of the interpreter, generating an error diagnostic whenever an
>  unsupported feature is encountered.
>
> The behaviour of the compiler and the format of emitted diagnostics remains
> unchanged, unless a bug in clang’s diagnostics is identified. In such a case,
> we emit what we consider to be the correct diagnostic.
>
> Performance
>
> Since the present evaluator is not optimised for performance, a fair comparison
> is difficult to obtain, but we expect the interpreter to outperform the tree
> evaluator on most repetitive structures. Benchmarks on 1.000.000 iterations of
> the sum_pow function in test/ExprVM/loop.cpp show a ~10× improvement in the
> running time of the evaluator. Further tests are required in order to compare
> the performance of the interpreter to the tree evaluator when evaluating
> smaller, non-repetitive expressions. Once a reasonable subset of constexpr is
> implemented in the VM, performance benchmarks on open-source constexpr code
> will help us compare the cost of compilation and interpretation to that of
> direct evaluation. Since the interpreter is significantly faster, the currently
> tight limits on the number of statements and the number of steps can be relaxed.
>
> Memory Usage
>
> The constexpr interpreter requires a significant amount of metadata in order to
> detect cases of undefined behavior and correctly report diagnostics. The
> existing evaluator imposes a massive overhead, since all integers are
> represented as APInt and all pointers keep a significant amount of metadata.
> This overhead is lowered in the bytecode interpreter by better exploiting type
> information: integer constants are fixed width and require no additional
> information, while the size of pointer metadata is significantly reduced - 3x
> increase in pointer size and a 16-byte overhead per allocation (the interpreter
> tracks actual pointers for fast dereferencing, while the evaluator maintains an
> lvalue and a path into that lvalue for structure fields, incurring a massive
> overhead). Since the present implementation focuses on maintainability and
> feature completeness, this can be further reduced in the future.
>
> The compiled bytecode, quite dense due to the type-specialized opcodes, is
> maintained in memory as long as the AST is live, which currently happens to be
> live throughout all compilation phases. This adds to the total memory used by
> the compiler. In the future, mitigations might be required. Given that the
> ground truth—the AST—is present in memory, compiled functions could be
> discarded and recompiled on demand, reducing peak memory usage.
>
> Complexity
>
> The interpreter duplicates the functionality of the existing evaluator,
> presently adding significant complexity to the frontend. Unlike the monolithic
> ExprConstant.cpp implementation, the constexpr interpreter is significantly more
> modular, spreading the complexity across the compiler and the interpreter. It
> will be possible to test the compiler separately from the interpreter, allowing
> for easier maintenance. We expect the frontend to become simpler and more
> maintainable after the AST walker is removed.
>
> The full implementation of the interpreter is expected to involve significant
> engineering effort. While development is in progress, an implementation of
> reduction rules is required in both the tree walker and interpreter, adding
> redundancy.
>
> Extensibility
>
> The interpreter should evolve alongside the language in the future, allowing for
> new features included in future standards to be supported. We refrain from
> performing any optimizations that would hinder the implementation of additional
> features, such as constexpr support for alloca and exception handling.
>
> Thanks for reading!
>
> Nandor Licker
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev