[llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tue May 1 02:14:19 PDT 2018

On 30 Apr 2018, at 21:26, David Zarzycki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces? I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, *user-space* instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I’m not 100% sure whether you’re asking if processors support different address spaces or if targets use LLVM’s notion of an address space, so I’ll try to answer both.

To the first interpretation of your question:

Others have pointed out that GPUs have different memory regions (shared mutable, shared immutable, local, and so on).  Any processor with an MMU supports some notion of address spaces, the simplest of which involves multiple completely distinct address spaces.  This is somewhat complicated by shared memory.  In the C abstract machine, there is no difference between pointers to shared and unshared memory, which is unfortunate as the safety of storing pointers in such regions can vary.  In OpenCL, the host can map regions into which it is safe to store pointers that are valid on both the host and device, which a more sane language than C would regard as a separate address space.

In terms of out-of-tree architectures, a large number of embedded processors have different regions for, for example, stack, code ROM, data ROM, and heap.  Some have different overlapping shared regions.  The architecture that I’ve worked on for the last 6 years, CHERI, provides a flexible notion of address spaces.  It allows a segmentation-like model at coarse granularity for sandboxing legacy code (with 64-bit integers as pointers), or fine-grained memory safety by representing every pointer as a 128-bit hardware-enforced type that encodes bounds and permissions.

To the second interpretation of your question:

GPUs use different address spaces for their different memory types, as do out-of-tree embedded targets.  Azul and the (apparently now dead) CLR back end using LLVM used AS1 to indicate that a pointer was to GC’d memory.  We use AS200 to indicate a 128-bit fat pointer and AS0 to indicate a 64-bit pointer (which is implicitly relative to a default 128-bit pointer in a special register).

It’s worth noting that LLVM’s notion of an address space is a property of the pointer, whereas embedded C regards it as a property of the underlying memory.  This means that it is always syntactically valid to cast between address spaces in LLVM IR, though the result may be a non-dereferenceable pointer.  This is somewhat problematic for optimisers, because this information is not well expressed.  For us, for example, casting from AS0 to AS200 always results in a pointer that is valid if the original is valid, but casting from AS200 to AS0 may yield null.  We’ve had to do a lot of cleanup on optimisers to prevent them from generating broken code as a result.
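
As a rough illustration of that asymmetry, here is a toy model in C.  It is a sketch under stated assumptions, not CHERI's actual semantics: the names fat_ptr, default_region, as0_to_as200 and as200_to_as0 are all hypothetical, an AS0 value is modelled as a 64-bit offset from a default region with 0 as null, and an AS200 value is modelled as an address with bounds and a validity flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an AS200 pointer: address, bounds, validity. */
typedef struct {
    uint64_t base;    /* lowest address the pointer may access */
    uint64_t length;  /* size of the accessible region in bytes */
    uint64_t address; /* address the pointer currently refers to */
    bool     valid;   /* false models a non-dereferenceable pointer */
} fat_ptr;

/* Hypothetical default pointer that every AS0 value is relative to. */
static const fat_ptr default_region = { 0x10000, 0x1000, 0x10000, true };

/* AS0 -> AS200: a valid (non-null, in-range) AS0 value always yields a
 * valid fat pointer, because it inherits the default region's bounds. */
static fat_ptr as0_to_as200(uint64_t offset) {
    fat_ptr p = default_region;
    if (offset == 0 || offset >= default_region.length) {
        p.valid = false;          /* null or out-of-range input */
        p.address = 0;
        return p;
    }
    p.address = default_region.base + offset;
    return p;
}

/* AS200 -> AS0: a perfectly valid fat pointer may have no AS0
 * representation at all, in which case the result is null (0). */
static uint64_t as200_to_as0(fat_ptr p) {
    if (!p.valid)
        return 0;
    if (p.address < default_region.base ||
        p.address - default_region.base >= default_region.length)
        return 0;                 /* outside the default region */
    return p.address - default_region.base;
}

/* Small demonstration of the asymmetry between the two directions. */
void demo_casts(void) {
    /* Round trip through AS200 preserves a valid AS0 value. */
    assert(as0_to_as200(0x10).valid);
    assert(as200_to_as0(as0_to_as200(0x10)) == 0x10);

    /* A valid fat pointer elsewhere in memory becomes null in AS0. */
    fat_ptr elsewhere = { 0x90000, 0x100, 0x90000, true };
    assert(as200_to_as0(elsewhere) == 0);
}
```

An optimiser that assumes an address space cast preserves non-nullness in both directions would be wrong for the second case in this model, which is the kind of information that is not currently expressible.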

The current model of an AS conflates two notions: a different region of memory (potentially with different properties) and a different kind of pointer (potentially with different properties).  It would be nice to decouple these and provide a mechanism, similar to function attributes, that would allow properties of pointers to be expressed orthogonally to address spaces.  This would require moving some information (such as pointer size) into the attributes, but would probably be a cleaner approach in the long term.

This would probably be easier after the typeless pointer work is completed, so that pointers are all of type PTR but with attributes indicating their other properties.  The AMD GPU, for example, could benefit from having an attribute indicating that -1, rather than 0, is the ‘invalid pointer’ value for some kinds.  Other useful information would include aliasing scopes (which could be updated on inlining), values that are guaranteed not to be dereferenced, whether out-of-bounds values are representable, and so on.
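
To make the idea of pointer attributes a little more concrete, here is a minimal sketch in C.  Nothing here corresponds to an existing LLVM API: ptr_attrs, is_invalid and convert_ptr are hypothetical names, and the two attribute sets are made-up examples of a pointer kind whose invalid value is 0 and one whose invalid value is all ones (-1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pointer attributes, kept separate from the address
 * space number. */
typedef struct {
    unsigned size_bits;         /* pointer width */
    uint64_t invalid_value;     /* bit pattern meaning 'invalid pointer' */
    bool     oob_representable; /* can out-of-bounds values be represented? */
} ptr_attrs;

/* Made-up examples: one kind uses 0 as invalid, the other uses -1. */
static const ptr_attrs zero_is_invalid     = { 64, 0, true };
static const ptr_attrs all_ones_is_invalid = { 32, UINT32_MAX, true };

/* A null check has to consult the attribute instead of assuming 0. */
static bool is_invalid(uint64_t bits, const ptr_attrs *a) {
    return bits == a->invalid_value;
}

/* Converting between pointer kinds must map invalid to invalid.
 * Toy model: any other value keeps its bit pattern unchanged. */
static uint64_t convert_ptr(uint64_t bits, const ptr_attrs *from,
                            const ptr_attrs *to) {
    if (is_invalid(bits, from))
        return to->invalid_value;
    return bits;
}

/* Small demonstration of why 'compare against 0' is not enough. */
void demo_invalid_values(void) {
    assert(is_invalid(0, &zero_is_invalid));
    assert(!is_invalid(0, &all_ones_is_invalid)); /* 0 is a real address */
    assert(convert_ptr(0, &zero_is_invalid, &all_ones_is_invalid)
           == UINT32_MAX);
    assert(convert_ptr(UINT32_MAX, &all_ones_is_invalid, &zero_is_invalid)
           == 0);
}
```

With something along these lines, a pass that deletes or preserves null checks could query the pointer's attributes instead of hard-coding 0 per address space.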

David