[llvm-dev] Demystifying the byte type
David Chisnall via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 19 02:56:07 PDT 2021
On 15/10/2021 19:41, George Mitenkov via llvm-dev wrote:
> Hi all,
> In May 2021, together with Nuno Lopes and Juneyoung Lee, we proposed
> to add a byte type in LLVM to fix load type punning issues. Initial RFC
> touched some subtle aspects of LLVM IR and its semantics, and sparked a
> lot of questions, concerns, and discussions.
> We decided to write a post that would summarise the thread and the
> We hope that our post clarifies initial concerns raised on the
> mailing list. As always, any questions, suggestions and advice are
> welcome!
Thank you for the writeup. I think a big part of the problem in
understanding this comes from the name of the type. On
provenance-carrying architectures (such as CHERI systems, including
Arm's Morello), it is unsound to copy a pointer as bytes. Pointers
must be copied by provenance-carrying operations. The hardware splits
registers into ones that don't carry provenance (integer,
floating-point, vector) and ones that do but which can *also* be used to
copy non-pointer data (capabilities).
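To make the byte-copy problem concrete, here is a minimal sketch in C of
a toy tagged-memory model. It is not the real CHERI or Morello encoding:
the names toy_slot, toy_copy_bytes, toy_copy_cap and toy_copy_demo, and
the 16-byte payload size, are all made up for illustration. The only
property it tries to capture is that the validity tag lives outside the
data bits, so a byte-wise copy moves the bits but not the provenance.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model: one capability-sized memory slot. */
typedef struct {
    unsigned char data[16]; /* address and metadata bits */
    bool tag;               /* out-of-band validity bit */
} toy_slot;

/* Byte-wise copy: moves the data bits; the destination tag is cleared. */
static void toy_copy_bytes(toy_slot *dst, const toy_slot *src) {
    memcpy(dst->data, src->data, sizeof dst->data);
    dst->tag = false;
}

/* Capability-width copy: moves the data bits and the tag together.
 * It also works for non-pointer data, whose tag is simply false. */
static void toy_copy_cap(toy_slot *dst, const toy_slot *src) {
    memcpy(dst->data, src->data, sizeof dst->data);
    dst->tag = src->tag;
}

/* Copies one valid "pointer" both ways and checks what survives. */
static void toy_copy_demo(void) {
    toy_slot src = { {0xde, 0xad, 0xbe, 0xef}, true };
    toy_slot via_bytes, via_cap;

    toy_copy_bytes(&via_bytes, &src);
    toy_copy_cap(&via_cap, &src);

    /* Same bits in both copies, but only one is still a valid pointer. */
    assert(memcmp(via_bytes.data, src.data, sizeof src.data) == 0);
    assert(memcmp(via_cap.data, src.data, sizeof src.data) == 0);
    assert(!via_bytes.tag);
    assert(via_cap.tag);
}
```

Calling toy_copy_demo() exercises both paths: the two copies are
bit-identical, yet only the capability-width copy keeps its tag.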
On a CHERI system, ptrtoint does not confer provenance and inttoptr on
the result may yield either an invalid pointer or a pointer with larger
bounds, depending on the environment. This reflects the machine
semantics: converting a pointer to an integer is an operation that
simply extracts the address (on Morello, the address is exposed as a
subregister of the capability register). Converting in the opposite
direction inserts the address into the capability held in the default
data capability register. In the pure-capability ABI, that register
typically does not hold a valid capability, so the result is an invalid
pointer; in the hybrid ABI, it refers to the part of the address space
used for legacy code.
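The round trip described above can be sketched with another toy model
in C. Again, this is an illustration under stated assumptions, not the
real capability format: model_cap, model_ptrtoint, model_inttoptr and
model_roundtrip_demo are hypothetical names, and the bounds chosen for
the two default data capability values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy capability: an address plus bounds and a tag. */
typedef struct {
    uint64_t address;
    uint64_t base;
    uint64_t length;
    bool tag; /* true only for a valid pointer */
} model_cap;

/* ptrtoint: extract the address; bounds and tag are dropped. */
static uint64_t model_ptrtoint(model_cap c) {
    return c.address;
}

/* inttoptr: insert the address into the default data capability.
 * The result inherits the bounds and the tag of that capability. */
static model_cap model_inttoptr(model_cap ddc, uint64_t addr) {
    model_cap r = ddc;
    r.address = addr;
    return r;
}

/* Round-trips one pointer under two assumed environments. */
static void model_roundtrip_demo(void) {
    /* A valid pointer to a 64-byte object (values are assumptions). */
    model_cap obj = { 0x1000, 0x1000, 64, true };
    /* Pure-capability ABI: the default data capability is not valid. */
    model_cap purecap_ddc = { 0, 0, 0, false };
    /* Hybrid ABI: it covers the legacy part of the address space. */
    model_cap hybrid_ddc = { 0, 0, UINT64_C(1) << 32, true };

    uint64_t addr = model_ptrtoint(obj);
    model_cap p = model_inttoptr(purecap_ddc, addr);
    model_cap h = model_inttoptr(hybrid_ddc, addr);

    /* Pure-capability: same address, but an invalid pointer. */
    assert(p.address == obj.address && !p.tag);
    /* Hybrid: valid, but with larger bounds than the original. */
    assert(h.address == obj.address && h.tag);
    assert(h.length > obj.length);
}
```

In neither environment does the round trip give back the original
capability: the address survives, the original bounds do not.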
I think that all of this is fairly aligned with your byte type.