[LLVMdev] n-bit bytes for clang/llvm

Wed Mar 18 00:31:36 PDT 2015

On 17 Mar 2015, at 13:11, Tyro Software <softwaretyro at gmail.com> wrote:
> 
> As an alternative to fixing the "char == 8 bits" presumption would using non-uniform pointer types have been another possible approach, e.g. keep char as 8 bit but have char* encode both the word address and the byte location within it (i.e. one extra bit in this 16-bit case). Of course this is only a less intrusive (to LLVM) approach if LLVM readily supports such pointers, which may be close to asking "could 8086 small/large/huge pointers be implemented?" 
> 
> One obvious drawback to such an approach is that dereferencing char* becomes relatively expensive, though for the sort of code being predominantly run on a DSP that might be acceptable. 

We're using multiple address spaces to describe two pointer representations for CHERI: AS0 is a 64-bit pointer that's represented as an integer, AS200 is a capability (256-bit fat pointer with base, length, permissions, enforced in hardware).  We had to fix a few things where LLVM assumes that pointers are integers, but the different size pointers in different address spaces part works very well.  The biggest weakness is in TableGen / SelectionDAG, where you can't write patterns on iPTR that depend on a specific AS (actually, you can't really write patterns on iPTR at all, as LLVM tries to lower iPTR to some integer type first, even when this doesn't make any sense [e.g. on an architecture with separate address and integer registers]).

Having AS0 be a byte pointer, which the back end would lower to two words, and some target-specific AS be a word pointer would likely work quite well.

David