[cfe-dev] [RFC] ConstExprPreter — clang constexpr interpreter

Wed Jul 3 12:14:59 PDT 2019

Hi Nandor,

This sounds like a really cool project, and something that we would
significantly benefit from.

This is an area I've put some thought into in the past. A lot of your
decisions below seem good to me; I think there are a few other things you
should consider:

 * The current APValue representation is extremely bloated. Each instance
is 72 bytes, and every subobject of an object is stored as a distinct
APValue, so for instance a single char[128] variable will often occupy 9288
bytes of storage and incurs 128 distinct memory allocations.
 * The current representation of an lvalue as an explicit subobject path is
likewise very expensive in terms of memory and distinct allocations.

I have some pretty concrete ideas for how to solve these problems that I
could write up if you're interested in tackling them.

(Your project name makes me cringe a little; maybe referring to it as
ExprVM to match the subdirectory name would be better?)

On Wed, 3 Jul 2019 at 10:49, Nandor Licker via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> TL;DR: Fast interpreter for C++ constexpr to replace the existing tree
> evaluator
>
> I am a PhD student at the University of Cambridge currently interning at
> Apple,
> working with JF Bastien and Michael Spencer on improving constexpr
> performance.
> Constexpr evaluation is presently slow and difficult since it relies on a
> monolithic AST walker. We propose replacing the tree evaluator with a
> bytecode
> interpreter to improve compilation times. The tree evaluator also poses
> significant maintenance and scalability problems, which we intend to
> ameliorate
> using the interpreter. While being generally faster, the interpreter does
> present two critical issues: a slightly increased memory footprint and
> added
> complexity to the compiler. This tradeoff is justified, as efficient
> constexpr
> evaluation could prove to be valuable as the language evolves.
>
> We would like to integrate this interpreter into clang. This RFC details
> the
> benefits of an interpreter and describes an initial implementation, along
> with
> a roadmap for replacing almost all of the tree evaluator with the
> interpreter.
> Even without optimizations, the performance of the interpreter matches
> that of
> the tree evaluator, thus the short-term focus is on feature completeness,
> not
> evaluation speed, as reflected by known inefficiencies in the current
> implementation. We would highly appreciate comments related to integration
> into
> clang and our roadmap towards replacing the evaluator, as well as feedback
> on
> the initial patch. This project serves mostly as a prototype in order to
> determine what kind of bytecode and compiler is required to efficiently
> evaluate
> code and emit useful diagnostic messages, paving the way for a future
> fast,
> potentially JIT-ed, interpreter.
>
> What?
>
> The ConstExprPreter is an interpreter for C++ constexpr embedded into the
> clang
> frontend: instead of evaluating constant expressions by walking the AST,
> the
> constexpr interpreter compiles C++ to safe bytecode and executes the
> bytecode
> in accordance with the constexpr semantics, emitting all appropriate
> diagnostics. It aims to replace the existing AST walker, which is less
> efficient
> and does not scale well in complexity as the constexpr subset of the C++
> language is expected to increase in the future.
>
> Why?
>
> The present constexpr evaluator is a 12.000 LOC monolith defined in
> clang/lib/AST/ExprConstant.cpp and poses a performance and maintenance
> problem.
> The tree interpreter limits the complexity of constexpr evaluated in a
> module by
> bounding recursion depth (-fconstexpr-depth=) and bounding the number of
> execution steps (-fconstexpr-steps).

Minor point: while the depth limit does exist primarily to work around
problems caused by the recursive implementation, the steps limit is
provided to catch infinite loops. (I assume your point is that we could
raise this limit commensurately with any performance improvements in the
evaluator; that seems reasonable.)

This severely limits the complexity of
> expression which can be evaluated in a given time budget. Furthermore,
> because
> of complexity, the implementation of certain potential future features on
> top
> of the evaluator, such as exception handling, pose serious difficulties.
> An
> efficient constexpr interpreter is expected to be faster and easily
> extensible,
> ameliorating some of the limitations of the tree evaluator, especially
> regarding
> performance. By improving the evaluation speed of constexpr, we expect to
> enable
> C++ users to replace instances of automatically generated code with
> complex
> constexpr, simplifying and improving the reliability of builds.
>
> Proposed Roadmap
>
> * Commit the initial patch which embeds a simple bytecode compiler and
>   interpreter into clang, alongside the existing constexpr evaluator. This
>   interpreter only supports a subset of constexpr and is disabled by
> default.
>

I notice that this is effectively building another system to convert the
AST into a control flow graph. Have you considered using the existing CFG
layer for (the majority of) this?

> * Add features to the interpreter, reaching a point where it supports all
> the
>   features of the existing evaluator.
>
> * Turn the interpreter on by default.
>
> * Move the entry point of the interpreter from function calls to arbitrary
>   toplevel expressions. In order to avoid performance regressions, a small
>   subset of features will be evaluated  without entering the bytecode
> compiler
>   and the VM: for example, frequent integer constant definitions, such as
>   constexpr int kMask = 1 << sizeof(T). This strategy requires keeping
> parts of
>   the existing Int evaluator, but allows the removal of the LValue,
> Complex,
>   etc. evaluators, significantly reducing the complexity of
> ExprConstant.cpp.
>
> * Remove most of the toplevel evaluator, minus the parts required to
> interpret
>   simple expressions. Roles will be reversed: if the evaluator encounters
> an
>   unsupported feature, it falls back to the interpreter.
>

I like this approach a lot.

> * Remove the flags enabling/forcing the interpreter.
>
> Initial Implementation
>
> The initial contribution is available in D64146. This only contains the
> minimal interpreter required to compile a basic example. Further patches
> have
> been developed, implementing pointers, which will be submitted for review
> afterwards.
>
> The implementation of the interpreter resides in lib/AST/ExprVM, in the
> clang::vm namespace. The sole entry point is from ExprConstant.cpp, in the
> HandleFunctionCall method. When this method is invoked in order to
> evaluate a
> function application, the vm::Context attached to each module is retrieved
> from
> ASTContext and an attempt is made to compile and interpret the function
> with
> the given parameters. If this fails, the tree interpreter is used as a
> fallback,
> unless the command line flags explicitly ask for a diagnostic pointing to
> unimplemented features. If the interpreter succeeds, the result is
> converted to
> a format compatible with the tree evaluator.
>
> The bytecode compiler simply walks the AST of each function generating a
> vm::Function, linking them together into a vm::Program. Certain peephole
> optimizations are performed in the compiler in order to optimize
> local/parameter accesses and remove redundant instructions, avoiding the
> use of
> pointers whenever possible. The compiler heavily relies on type
> information:
> the Classify method in vm::Context is used to approximate a type from the
> AST
> with a base type supported by the interpreter. The types and the necessary
> utilities are defined in Type.cpp. Presently, only 32-bit integers are
> supported, however 8, 16 and 64 bit integers will be added and a fallback
> onto
> APSInt is planned for 128-bits and beyond.
>
> Internally, the compiler relies on a type classification–PrimType–to
> decide what
> opcodes to emit for a particular operation. The Context::Classify method
> relies
> on target information to decide what internal types to map C/C++ types to,
> returning llvm::Optional<PrimType>. In the future, the classification is
> expected to be complete, removing a large number of the nullopt-checks
> from the
> compiler.
>
> The opcodes supported by the VM are defined in Opcodes.inc, a header used
> to
> generate most of the interpreter, as well as the disassembler. The VM is
> stack
> based, since such bytecode is fast to generate and the high upfront cost
> of
> register allocation is avoided. In order to accurately emit diagnostics,
> the VM
> needs to cooperate with the tree interpreter—this is achieved by isolating
> diagnostics into the vm::State class, inherited by both EvalInfo and
> InterpState. Stack frames now use vm::Frame as their base class, inherited
> by
> InterpFrame and CallStackFrame. This allows the stack frame to be traced
> through
> both the VM and the tree walker effectively. The present focus is on
> correctness — a path is kept open to optimize the interpreter and improve
> instruction dispatch and memory layout, however this is not the current
> priority. The interpreter could also be specialized into two variants: one
> that
> only detects problems, falling back to a slower version which tracks
> significantly more metadata for informative diagnostics.
>
> The interpreter needs to detect pointer accesses that would result in
> Undefined
> Behavior: dereferences and arithmetic on pointers which point to invalid
> memory
> locations. To achieve this, each allocation site has a unique descriptor,
> containing the metadata required to emit a diagnostic message. Each
> allocation
> creates a block tracking the descriptor, along with all live pointers to
> that
> block. Whenever a block goes out of scope, all the pointers are
> invalidated:
> instead of pointing to the block, they now point to the descriptor. If
> such a
> pointer is used, a diagnostic is generated and execution stops. This
> scheme
> adds an overhead to pointer arithmetic, as the intrusive linked list of
> pointers to a block needs to be maintained. If pointers correctly track
> the
> lifetime of stack objects, no additional cost is paid at deallocation
> sites as
> there are no pointers to invalidate.

There are a bunch of other ways that pointers might be invalidated, due to
lifetime events rather than storage events. For example, changing the
active member of a union would prevent access through old pointers to the
old active member, and changing the active union member /back/ would make
such accesses valid again. Similarly, an explicit destructor call on an
object prevents that object from being used, but certain other operations
can bring it back to life and make the old pointers / references usable
again. How do you intend to handle such cases?

> In the future, we might investigate
> different schemes, such as the one in asan (shadow memory + epoch
> counters), or
> garbage collection to keep invalid blocks alive as long as there are
> pointers
> to them.
>
> Usage
>
> The proposed patch adds two new flags to clang:
>
> -fexperimental-constexpr-interpreter:
>
>  Enables the interpreter, but falls back to the tree evaluator when
> encountering
>  language features which are not supported.
>
> -fexperimental-constexpr-force-interpreter:
>
>  Forces the use of the interpreter, generating an error diagnostic
> whenever an
>  unsupported feature is encountered.
>
> The behaviour of the compiler and the format of emitted diagnostics
> remains
> unchanged, unless a bug in clang’s diagnostics is identified. In such a
> case,
> we emit what we consider to be the correct diagnostic.
>
> Performance
>
> Since the present evaluator is not optimised for performance, a fair
> comparison
> is difficult to obtain, but we expect the interpreter to outperform the
> tree
> evaluator on most repetitive structures. Benchmarks on 1.000.000
> iterations of
> the sum_pow function in test/ExprVM/loop.cpp show a ~10× improvement in
> the
> running time of the evaluator. Further tests are required in order to
> compare
> the performance of the interpreter to the tree evaluator when evaluating
> smaller, non-repetitive expressions. Once a reasonable subset of constexpr
> is
> implemented in the VM, performance benchmarks on open-source constexpr
> code
> will help us compare the cost of compilation and interpretation to that of
> direct evaluation. Since the interpreter is significantly faster, the
> currently
> tight limits on the number of statements and the number of steps can be
> relaxed.
>
> Memory Usage
>
> The constexpr interpreter requires a significant amount of metadata in
> order to
> detect cases of undefined behavior and correctly report diagnostics. The
> existing evaluator imposes a massive overhead, since all integers are
> represented as APInt and all pointers keep a significant amount of
> metadata.
> This overhead is lowered in the bytecode interpreter by better exploiting
> type
> information: integer constants are fixed width and require no additional
> information, while the size of pointer metadata is significantly reduced -
> 3x
> increase in pointer size and a 16-byte overhead per allocation (the
> interpreter
> tracks actual pointers for fast dereferencing, while the evaluator
> maintains an
> lvalue and a path into that lvalue for structure fields, incurring a
> massive
> overhead). Since the present implementation focuses on maintainability and
> feature completeness, this can be further reduced in the future.
>
> The compiled bytecode, quite dense due to the type-specialized opcodes, is
> maintained in memory as long as the AST is live, which currently happens
> to be
> live throughout all compilation phases. This adds to the total memory used
> by
> the compiler. In the future, mitigations might be required. Given that the
> ground truth—the AST—is present in memory, compiled functions could be
> discarded and recompiled on demand, reducing peak memory usage.
>
> Complexity
>
> The interpreter duplicates the functionality of the existing evaluator,
> presently adding significant complexity to the frontend. Unlike the
> monolithic
> ExprConstant.cpp implementation, the constexpr interpreter is
> significantly more
> modular, spreading the complexity across the compiler and the interpreter.
> It
> will be possible to test the compiler separately from the interpreter,
> allowing
> for easier maintenance. We expect the frontend to become simpler and more
> maintainable after the AST walker is removed.
>
> The full implementation of the interpreter is expected to involve
> significant
> engineering effort. While development is in progress, an implementation of
> reduction rules is required in both the tree walker and interpreter,
> adding
> redundancy.
>
> Extensibility
>
> The interpreter should evolve alongside the language in the future,
> allowing for
> new features included in future standards to be supported. We refrain from
> performing any optimizations that would hinder the implementation of
> additional
> features, such as constexpr support for alloca and exception handling.
>
> Thanks for reading!
>
> Nandor Licker
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190703/6cbcf9ae/attachment.html>