[cfe-dev] [RFC] ConstExprPreter — clang constexpr interpreter

Wed Jul 3 15:19:12 PDT 2019

Hello Richard,

>  * The current APValue representation is extremely bloated. Each instance is 72 bytes, and every subobject of an object is stored as a distinct APValue, so for instance a single char[128] variable will often occupy 9288 bytes of storage and incurs 128 distinct memory allocations.

Arrays and structures will be stored in a compact, contiguous form in memory, so we can save a lot of space here.

>  * The current representation of an lvalue as an explicit subobject path is likewise very expensive in terms of memory and distinct allocations.

LValues are simply pointers in the VM. Pointers into structures will be a combination of a block and a descriptor: the
block represents the allocated memory, while the descriptor characterises the current field and limits what is accessible
through the pointer. There should be a unique descriptor for each field, so this option is more compact.

> I have some pretty concrete ideas for how to solve these problems that I could write up if you're interested in tackling them.

Would be great - we do have some ideas on how to proceed, but we'd appreciate any other input. One thing to mention is that
we first want a working and simple implementation which leaves the door open for further optimisations. We would like to avoid
optimisations now if they come with significant complexity.

> (Your project name makes me cringe a little; maybe referring to it as ExprVM to match the subdirectory name would be better?)

I'll let JF Bastien argue this point.

> Minor point: while the depth limit does exist primarily to work around problems caused by the recursive implementation, the steps limit is provided to catch infinite loops. (I assume your point is that we could raise this limit commensurately with any performance improvements in the evaluator; that seems reasonable.)

The numbers are a ballpark measure either way - we could allow the VM to run 10x the number of steps if we expect it to be faster.
But this is something to be implemented in the future. Same goes for recursion depth, although we could experiment with tail call
elimination as well.

> I notice that this is effectively building another system to convert the AST into a control flow graph. Have you considered using the existing CFG layer for (the majority of) this?

The bytecode is just a linear sequence of instructions - we do not build a CFG or explicit basic blocks at any point.
I guess that would be required if we were to apply non-trivial optimisations, but that is not planned. We perform
some peephole optimisations and emit specialised opcodes, but we do not plan to go beyond that as we are not sure
the optimisations would pay for themselves. Given that diagnostics are required whenever UB is encountered,
the opportunities are severely limited.

> There are a bunch of other ways that pointers might be invalidated, due to lifetime events rather than storage events. For example, changing the active member of a union would prevent access through old pointers to the old active member, and changing the active union member /back/ would make such accesses valid again. Similarly, an explicit destructor call on an object prevents that object from being used, but certain other operations can bring it back to life and make the old pointers / references usable again. How do you intend to handle such cases?

The present implementation requires all allocated blocks to track all pointers which point to them. This
mechanism should allow unions and invalidation to be implemented. This is obviously not the fastest
option and we definitely need to think more about the invalidation aspect. Up until now, we mostly 
considered the lifetime problem and a GC-like solution to it, but invalidation further complicates that.

Thank you,
Nandor Licker
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190703/f9971be7/attachment.html>