[PATCH] D63525: LangRef: Attempt to formulate some rules for addrspacecast

David Chisnall via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 20 02:03:44 PDT 2019


theraven added a comment.

Apparently I forgot to hit submit on the comments I made yesterday, sorry!



================
Comment at: docs/LangRef.rst:9694
+space conversion is legal then both result and operand refer to the
+same memory location. The conversion must have no side effects, or
+capture the value of the pointer.
----------------
arsenm wrote:
> theraven wrote:
> > What does 'is legal' actually mean?  For example, if I have a 32-bit VA space and a 64-bit VA space in the same process with a subset of the program understanding only one and a shim layer understanding both, then the cast from 32-bit to 64-bit will always succeed but the converse cast is always a statically valid operation but may dynamically fail (and the failure case may be to return null).
> I was thinking legal means a dereferenceable result. If you were to access the result when the cast failed, it would be undefined.
I'm happy with that clarification.  We probably want to explicitly talk about dynamic failure for addrspacecast somewhere in this documentation.


================
Comment at: docs/LangRef.rst:9695
+same memory location. The conversion must have no side effects, or
+capture the value of the pointer.
+
----------------
arsenm wrote:
> theraven wrote:
> > Does this not preclude using address space casts to make pointers visible to GC (or, conversely, to notify the GC that a pointer has escaped)?  I think we may need to define what 'capture' means in this context.  I believe the goal of this is to state that the compiler is free to elide address space casts with no uses?
> I'm trying to reiterate that this is not an operation that touches memory in any way. We do allow addrspacecasts in constant expressions, and they can't trap. Elision should be OK.
> 
> The intent is addrspacecast has stronger aliasing properties than a ptrtoint/inttoptr pair, which was part of the original intent of adding a new instruction.
Makes sense. We probably want to clarify what capturing means from the perspective of allowed transforms here.  It may be that an addrspacecast has static side effects (if that's a valid term), causing some extra metadata to be emitted about roots on the stack or in registers, for example, but the compiler is free to move address space casts across other operations and to elide them.

In the CHERI case, we do have one operation that it is not safe to move address space casts across (writes to the default data capability change the meaning of the current address space), but it's also not safe to reorder loads / stores across it, so we probably need to add an explicit address space change barrier intrinsic for things like that.


================
Comment at: docs/LangRef.rst:9704
+the result back to the original address space should yield the
+original bit pattern).
 
----------------
arsenm wrote:
> theraven wrote:
> > I believe that this holds only if both address spaces are permutations.  Again, the simple 32/64-bit case, a round trip starting in the 32-bit world is always possible, but the converse does not apply.
> This probably needs an additional constraint that the result was also dereferenceable / legal as stated before. The goal is it should be legal to insert new addrspacecasts back to the source addrspace in certain contexts. The interesting ones are cases where you are dereferencing the pointer anyway, so it would have been undefined if the cast failed.
I think we need to be very careful about eliding user-expected trapping behaviour here.  CHERI has similar behaviour, but consider a simple 32/64-bit example where:

- 32-bit pointers are AS0
- 64-bit pointers are AS1
- Cast from 32-bit to 64-bit is a zero extension (always valid)
- Cast from 64-bit to 32-bit is a truncation, equality test, and conditional move of 0 on failure, so an addrspacecast from AS1 to AS0 gives either a valid result or null.
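
A minimal sketch of these two casts, modelling pointers as plain Python integers. The function names and constants are hypothetical and exist only for illustration; this is not how any target actually lowers addrspacecast. It shows that a round trip starting in AS0 always recovers the original value, while one starting in AS1 does not:

```python
# Hypothetical model of the 32/64-bit example; all names are illustrative.
NULL = 0
MASK32 = 0xFFFF_FFFF


def cast_as0_to_as1(p32: int) -> int:
    """AS0 -> AS1: zero extension, always valid."""
    assert 0 <= p32 <= MASK32, "not a 32-bit pointer"
    return p32  # zero-extending a non-negative integer leaves its value unchanged


def cast_as1_to_as0(p64: int) -> int:
    """AS1 -> AS0: truncate, compare with the input, yield null on mismatch."""
    truncated = p64 & MASK32
    return truncated if truncated == p64 else NULL


def round_trips_from_as0(p32: int) -> bool:
    """True if AS0 -> AS1 -> AS0 gives back the original bit pattern."""
    return cast_as1_to_as0(cast_as0_to_as1(p32)) == p32


def round_trips_from_as1(p64: int) -> bool:
    """True if AS1 -> AS0 -> AS1 gives back the original bit pattern."""
    return cast_as0_to_as1(cast_as1_to_as0(p64)) == p64
```

Under this model, round_trips_from_as0 holds for every 32-bit value, whereas round_trips_from_as1 fails for any pointer with bits set above bit 31, because the intermediate AS0 value is null.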

If I have a sequence of:

%0 is some AS1 pointer
%1 casts to AS0
%2 loads from %1

When %0 has no AS0 representation, %1 is null and the load in %2 would be expected to trap, but with the text as written it would be valid to elide %1 and have %2 be a load from %0.  This is probably not what the programmer expected.  Would you need to mark one of these as non-integral to achieve those guarantees?  Or have some ordering of permissiveness?  In this scenario, it's fine to do the elision the other way around:

%0 is some AS0 pointer
%1 casts to AS1
%2 loads from %1

In this version, eliding %1 and making %2 load from %0 is totally fine.  I think we really need to model three kinds of cast:

- Ones that are always valid (statically known)
- Ones that are always invalid (statically known)
- Ones that are sometimes valid (dynamically known)

I'd suggest making the last case the default and adding metadata for the first two.  Front ends generally know which of these cases applies and can add the metadata, and some later analyses may be able to add it once more is known about the values.  Optimisers are then free to elide any always-valid addrspacecast instructions.  If the metadata is accidentally dropped then that's also fine, since the cast just falls back to the conservative default.
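
A rough sketch of how that rule might behave, again modelling pointers as Python integers and memory as a dictionary shared by both address spaces. The CastValidity enum, the metadata parameter, and every function name here are hypothetical; no such metadata exists in LLVM, this only illustrates the proposal. A cast with no metadata is treated as sometimes valid and is never elided, so the AS1 to AS0 sequence keeps its trap:

```python
from enum import Enum

MASK32 = 0xFFFF_FFFF


class CastValidity(Enum):
    """The three kinds of cast; the names are hypothetical."""
    ALWAYS_VALID = 1
    ALWAYS_INVALID = 2
    SOMETIMES_VALID = 3  # proposed default when no metadata is attached


class Trap(Exception):
    """Stands in for the fault taken when loading through null."""


def effective_validity(metadata=None):
    """Missing (or dropped) metadata falls back to the conservative default."""
    return CastValidity.SOMETIMES_VALID if metadata is None else metadata


def may_elide(metadata=None):
    """Only casts known to be always valid may be removed."""
    return effective_validity(metadata) is CastValidity.ALWAYS_VALID


def cast_as1_to_as0(p64):
    """Truncate, compare with the input, yield null (0) on mismatch."""
    truncated = p64 & MASK32
    return truncated if truncated == p64 else 0


def load(memory, address):
    """Load from the shared memory model; null traps."""
    if address == 0:
        raise Trap("load through null")
    return memory[address]


def load_via_as0(memory, p64, metadata=None):
    """%1 = cast %0 to AS0; %2 = load %1. The cast is skipped only if elidable."""
    address = p64 if may_elide(metadata) else cast_as1_to_as0(p64)
    return load(memory, address)
```

With memory = {0x1_0000_0010: 42, 0x20: 7}, load_via_as0(memory, 0x1_0000_0010) raises Trap because the cast is kept and yields null. Tagging that same cast ALWAYS_VALID (which would be wrong metadata for this address space pair) lets the load read 42 from the original pointer, which is exactly the behaviour change the unannotated default guards against.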

In the CHERI case, an AS0 to AS200 cast is always valid (and clang could be taught to add the metadata).  An AS200 to AS0 cast is always valid if you can statically prove that it was the result of a cast in the opposite direction, or if it came from a specific allocator.  Most AS0 to AS200 casts are followed by a call to an intrinsic that sets bounds / permissions, and this is the only use of the addrspacecast instruction, which prohibits these optimisations (correctly).




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63525/new/

https://reviews.llvm.org/D63525




