[llvm-dev] RFC: alloca -- specify address space for allocation

Tue Sep 8 08:05:37 PDT 2015

Thanks, having that context is very helpful.

I actually think our use case is somewhat similar in spirit, as one of the key points of our system is treating object identity as unforgeable/unforged (guaranteed by type rules in verified safe code, assumed but unverified in trusted unsafe code).

So the main question left for me is how much of a hindrance the pervasive addrspacecasts we may have will be for optimization.  Things like:
 - Can two addrspacecasts be reordered past each other?
 - Can an addrspacecast be reordered across memory dereferences?
 - Can two addrspacecasts with the same input value be CSE'd?
 - Is an address presumed to be escaped when it is addrspacecasted?
 - If we load from a pointer `%p` and from a pointer `%q` which is an addrspacecast of `%p`, will those loads be seen as redundant?
   - Can a store to `%p` feeding a load from `%q`, or vice-versa, be replaced with an SSA value?
 - Can an addrspacecast be hoisted to where it will be speculatively executed?

Popping up a level: when I have these sorts of questions about an opcode, and I don't see them spelled out in its entry in the LangRef, where should I look?  Is there some central place describing such properties, or would I just need to read/test the relevant optimizations?

Thanks
-Joseph

-----Original Message-----
From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
Sent: Monday, September 7, 2015 6:27 AM
To: Joseph Tremoulet <jotrem at microsoft.com>
Cc: Marcello Maggioni <mmaggioni at apple.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: alloca -- specify address space for allocation

On 2 Sep 2015, at 02:54, Joseph Tremoulet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Reading further, I see both that addrspacecast "can be a no-op cast or a complex value modification"[2] and that bitcast "may only be [used on pointers] with the same address space"[4].
> 
> So I'm getting the impression that it's ok to have a model with semantically meaningful aliasing between address spaces, but also that anywhere we want to reference a local's address with an addrspace(1) pointer (which is everywhere our source language takes its address), as things stand now we need either to use an addrspace cast which will be assumed to possibly have side-effects, or to round-trip through ptrtoint/inttoptr which I presume will obscure the aliasing information.  It certainly gives us a correct place to start from, but (unless I'm misunderstanding and the "complex value modification" type of addrspacecast isn't assumed to have side-effects) I wouldn't be surprised if we come back to this wanting a way to represent a cast across address spaces that's as transparent as a bitcast.

To give a bit of background on that:

The use case for introducing AS casts as distinct from bitcasts (and not going via inttoptr / ptrtoint) is architectures that have different pointer representations.  For example, some microcontrollers have a 16-bit PC and 32-bit address registers, allowing code pointers to be smaller than data pointers.  Some GPUs (used to?) use different-sized pointers for the various places in the memory hierarchy.  In our architecture, this is even more complicated, because we support two different pointer representations:

- 256-bit (or 128-bit, on newer revisions) memory capabilities, that both identify and grant access to a region of memory and have unforgeability guaranteed by the hardware.  In LLVM, we represent these as pointers with AS 200.

- 64-bit legacy-compatible pointers that are implicitly relative to a global capability (and so are only dereferenceable within a restricted range of the process’ virtual address space).  In LLVM, we represent these as pointers with AS 0.

For us, an AS cast between AS 0 and AS 200 will succeed if and only if the address is within the current range of the global capability.  Any address in AS 0 may alias any address in AS 200 (except in some trivial cases, it’s impossible to determine statically that they don’t), but one value is an integer interpreted as an address, whereas the other is a fat pointer with bounds and permissions enforced in hardware.
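As a rough illustration of the cast rule just described, here is a toy software model in C++. It is only a sketch under stated assumptions: the `Capability` struct, its three fields, and the function names `as0_to_as200` and `as200_to_as0` are all hypothetical, a failed cast is modelled as an empty `std::optional` (the text above does not say what a failing cast produces), and permissions are left out entirely.

```cpp
#include <cstdint>
#include <optional>

// Toy software model of the cast rule; every name here is hypothetical
// and none of this is the real hardware or LLVM representation.
struct Capability {
    std::uint64_t base;    // lowest address the capability grants access to
    std::uint64_t length;  // size in bytes of the accessible region
    std::uint64_t offset;  // current position relative to base
};

// AS 0 -> AS 200: the AS 0 value is treated as an offset from the global
// capability's base. The cast succeeds only if it lands inside the range.
// Assumption: failure is modelled as an empty optional, and a successful
// result keeps the global capability's bounds.
std::optional<Capability> as0_to_as200(const Capability& global,
                                       std::uint64_t as0_ptr) {
    if (as0_ptr >= global.length) {
        return std::nullopt;
    }
    return Capability{global.base, global.length, as0_ptr};
}

// AS 200 -> AS 0: succeeds only if the capability's address lies inside
// the global capability's range, yielding the offset from the global base.
std::optional<std::uint64_t> as200_to_as0(const Capability& global,
                                          const Capability& cap) {
    const std::uint64_t address = cap.base + cap.offset;
    if (address < global.base || address - global.base >= global.length) {
        return std::nullopt;
    }
    return address - global.base;
}
```

In this model the success of either cast depends on the run-time range of the global capability, which is one way to see why such a cast cannot in general be treated like a bitcast.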

David