[PATCH] D64146: [Clang Interpreter] Initial patch for the constexpr interpreter

Tue Feb 16 15:20:44 PST 2021

jrtc27 added a comment.
Herald added a subscriber: dang.

So I'm running into issues with this patch, specifically the line annotated below. I'm trying to compile our CHERI-LLVM fork as a native pure-capability CHERI[1] binary (which eventually will mean compiling the Morello-LLVM fork of our CHERI-LLVM to run as a native pure-capability CHERI binary on Arm's prototype Morello[2] board when it ships late this year/early next year), and I have LLVM (other than Orc JIT which is a whole beast and can't work without in-tree backend support) and LLD working (albeit only very lightly tested), and if I stub out that single function I am able to get Clang itself compiling and working (at least, well enough to compile a hello world C program all the way down to an executable), so this is the only technical blocker that I know which is rather frustrating.

CHERI capabilities can be used to implement C language pointers, providing strict fine-grained spatial safety and pointer provenance guarantees (i.e. it's not just that pointers have bounds, you also can't go and synthesise a pointer with the bounds you want if you don't already have one for that region), plus heap temporal safety on an experimental branch. This is done using tagged memory, where every capability-sized word in memory has a single tag bit indicating whether it has a valid capability or not. This means that, although you can memcpy capabilities around, they must remain aligned, otherwise their tags will become cleared and no longer be dereferenceable.

So, for us, that line _has_ to be able to be written as `auto Punned = *(uintptr_t *)Ptr;` (although it can be written in equivalent ways), as byte swapping uintptr_t is not something we can ever support (though with endianness::native you could at least make the templated version specialised to not call swapByteOrder, though the one taking a function argument can't be, but why bother going through all that indirection), nor is storing pointers to unaligned addresses if you want to be able to use them as pointers later.

I know nothing about this recently(ish)-added Clang constexpr interpreter, but it seems like this should be a solvable problem with some changes to the design to be less lax about alignment. Is there any documentation you could provide on how `CodePtr` is meant to work? "Pointer into the code segment." isn't exactly helpful, especially given it's actually storing pointers, which is not what you normally find in code segments when talking instead about compiled code. Do you have ideas for how this problem can be avoided?

[1] http://cheri-cpu.org/
[2] https://developer.arm.com/architectures/cpu-architecture/a-profile/morello

================
Comment at: cfe/trunk/lib/AST/Interp/Source.h:69
+    using namespace llvm::support;
+    auto Punned = endian::read<uintptr_t, endianness::native, 1>(Ptr);
+    return reinterpret_cast<T>(Punned);
----------------
^^^ this line ^^^

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64146/new/

https://reviews.llvm.org/D64146