[PATCH] D103492: [RS4GC] Treat inttoptr as base pointer

Sam Dirkswager via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 4 14:11:09 PDT 2021


tolziplohu added a comment.

In D103492#2796673 <https://reviews.llvm.org/D103492#2796673>, @reames wrote:

> I think you've got a deeper issue here. RS4GC is intended to lower from the abstract machine model to the physical machine model. In the abstract machine model, references (i.e. gc pointers) *must* be non-integral. You don't have to use address space 1 (there are code hooks to change that), but abstract machine model requires non-integral pointers for the address space being rewritten.

You're right, I'm wrong. The statepoint semantics require non-integral pointers no matter what. I think RS4GC should make sure that the address space it uses has been specified non-integral and give an error if it hasn't, as this seems like an easy trap to fall into.

> This is a really really hard design problem. I'm happy to discuss this offline, and I'm open to having a rule for handling inttoptr with explicit calls outs on the UB, but I'm worried you are misunderstanding the problem and are going to find yourself down a much deeper rabbit hole than you expect.

It is possible I'm misunderstanding, so here's how I see the problem: converting a GC pointer to an integer and then to a pointer again is UB and bad, because the collector can't see it in between. However, many languages require pretending an integer is a pointer. In these cases, it's not actually a valid pointer, and it's not a problem if the GC can't see it; the problem is, it's hard to make that work without accidentally making it possible to hide a real pointer as an integer. I see three main options for dealing with this:

1. Do nothing, leave it as-is. In this case, the way to do something like this is to hide the `inttoptr` along with an `addrspacecast` to the non-integral address space in a `noinline` function without GC. This works, but is bad for optimization and performance in general, since an `inttoptr` should ideally be a no-op.
2. Allow `inttoptr` (and `addrspacecast`, which is currently allowed in release builds but not debug builds) in RS4GC, with a note that accessing memory through a pointer created by `inttoptr` is invalid (a linter to catch simple cases of that might be a good idea?). This is ideal for optimization and performance, since it lowers to a no-op and many optimization passes can see through it, but it's easier for frontend authors to miss and create invalid casts.
3. Add conversion intrinsics, something like `declare i8 addrspace(1)* @llvm.gc.inttoptr(i64)`. This is harder for frontend authors to screw up, since to find the intrinsic they'll be looking at the docs which will specify that the returned pointer isn't a valid pointer. From a performance perspective, it's in between the other two options, because while it lowers to a no-op, it's opaque to optimizations.

The third option sounds most attractive to me, now that I better understand the risks of the second option. Does my evaluation of the problem sound accurate?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103492/new/

https://reviews.llvm.org/D103492



More information about the llvm-commits mailing list