[llvm-dev] Demystifying the byte type

Tue Oct 19 02:56:07 PDT 2021

Hi George,

On 15/10/2021 19:41, George Mitenkov via llvm-dev wrote:
 >
 > Hi all,
 >
 > In May 2021, together with Nuno Lopes and Juneyoung Lee, we proposed
 > to add a byte type in LLVM to fix load type punning issues. Initial RFC
 > touched some subtle aspects of LLVM IR and its semantics, and sparked a
 > lot of questions, concerns, and discussions.
 >
 > We decided to write a post that would summarise the thread and the
 > complicated topic:
 >
 > https://gist.github.com/georgemitenkov/3def898b8845c2cc161bd216cbbdb81f
 >
 > We hope that our post clarifies initial concerns raised on the
 > mailing list. As always, any questions, suggestions and advice are welcome!

Thank you for the writeup.  I think a big part of the problem in 
understanding this comes from the name of the type.  On 
provenance-carrying architectures (such as CHERI systems, including 
Arm's Morello[1]), it is unsound to copy a pointer as bytes.  Pointers 
must be copied by provenance-carrying operations.  The hardware splits 
registers into ones that don't carry provenance (integer, 
floating-point, vector) and ones that do but which can *also* be used to 
copy non-pointer data (capabilities).
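
To make the copying point concrete, here is a minimal sketch of a tagged
memory.  It is a toy model, not CHERI's actual encoding: the names
Capability, TaggedMemory, copy_as_bytes and copy_as_cap are hypothetical,
and the only property it assumes is that each pointer-sized slot has an
out-of-band validity tag which a plain data copy does not carry along.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Toy pointer: an address plus provenance (bounds and a validity tag)."""
    address: int
    base: int
    length: int
    tag: bool = True


class TaggedMemory:
    """Toy memory made of capability-sized slots, each with an out-of-band tag."""

    def __init__(self, slots):
        self.data = [(0, 0, 0)] * slots
        self.tags = [False] * slots

    def store_cap(self, slot, cap):
        # A provenance-carrying store writes the data and the tag together.
        self.data[slot] = (cap.address, cap.base, cap.length)
        self.tags[slot] = cap.tag

    def load_cap(self, slot):
        address, base, length = self.data[slot]
        return Capability(address, base, length, self.tags[slot])

    def copy_as_bytes(self, dst, src):
        # A plain data copy moves the bits but clears the destination tag.
        self.data[dst] = self.data[src]
        self.tags[dst] = False

    def copy_as_cap(self, dst, src):
        # A provenance-carrying copy preserves the tag.
        self.store_cap(dst, self.load_cap(src))
```

In this model, a slot filled by copy_as_bytes holds the same address as the
source but loads back with tag=False, while a slot filled by copy_as_cap
loads back equal to the original capability.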

On a CHERI system, ptrtoint does not confer provenance and inttoptr on 
the result may yield either an invalid pointer or a pointer with larger 
bounds, depending on the environment.  This reflects the machine 
semantics: converting a pointer to an integer is an operation that 
simply extracts the address (on Morello, the address is exposed as a 
subregister of the capability register).  Converting in the opposite 
direction inserts the address into the capability held in the default 
data capability register (which, in the pure-capability ABI, is 
typically not a valid capability and so yields an invalid pointer, and 
in the hybrid ABI refers to the part of the address space used for 
legacy code).
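
The round trip described above can be sketched as follows.  This is a toy
model under stated assumptions, not the real instruction semantics: the
Capability class, the two DDC constants, and in particular the bounds
chosen for HYBRID_DDC are hypothetical and only illustrate the two
outcomes (invalid pointer versus pointer with larger bounds).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:
    """Toy pointer: an address plus provenance (bounds and a validity tag)."""
    address: int
    base: int
    length: int
    tag: bool


# Hypothetical default data capability (DDC) values for the two ABIs.
PURECAP_DDC = Capability(address=0, base=0, length=0, tag=False)
HYBRID_DDC = Capability(address=0, base=0, length=1 << 48, tag=True)


def ptrtoint(cap):
    # Only the address is extracted; bounds and tag are left behind.
    return cap.address


def inttoptr(address, ddc):
    # The address is inserted into the DDC, so the result takes its
    # bounds and validity from the DDC, not from the original pointer.
    return replace(ddc, address=address)


p = Capability(address=0x2000, base=0x2000, length=32, tag=True)
purecap_result = inttoptr(ptrtoint(p), PURECAP_DDC)  # same address, invalid
hybrid_result = inttoptr(ptrtoint(p), HYBRID_DDC)    # valid, wider bounds
```

Neither result equals p: in both cases the provenance of the original
pointer is lost in the conversion to an integer.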

I think that all of this is fairly aligned with your byte type.

David

[1] 
https://developer.arm.com/architectures/cpu-architecture/a-profile/morello