[PATCH] D63525: LangRef: Attempt to formulate some rules for addrspacecast

Thu Jun 20 09:08:11 PDT 2019

arsenm marked an inline comment as done.
arsenm added inline comments.

================
Comment at: docs/LangRef.rst:9704
+the result back to the original address space should yield the
+original bit pattern).
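
The asymmetry theraven describes below can be illustrated with a small model. This is a hypothetical Python sketch, not LLVM code. It assumes AS0 holds 32-bit pointers and AS1 holds 64-bit pointers, that widening is a zero extension, and that narrowing produces null when truncation would lose bits. The helper names are invented for the example.

```python
# Hypothetical model of the 32/64-bit target described in the thread:
# AS0 holds 32-bit pointers, AS1 holds 64-bit pointers. Pointers are
# plain integers (bit patterns) and 0 is null in both address spaces.

MASK32 = (1 << 32) - 1


def cast_as0_to_as1(p32: int) -> int:
    """Widening cast: zero extension, which always succeeds."""
    # Treat the input as an unsigned 32-bit value; as a 64-bit value it
    # is then already zero-extended.
    return p32 & MASK32


def cast_as1_to_as0(p64: int) -> int:
    """Narrowing cast: truncate, compare, and produce null on failure."""
    truncated = p64 & MASK32
    return truncated if truncated == p64 else 0


def round_trips(p: int, there, back) -> bool:
    """True if casting p away and back reproduces its bit pattern."""
    return back(there(p)) == p


# Starting in AS0, the round trip always preserves the bit pattern.
print(round_trips(0xDEADBEEF, cast_as0_to_as1, cast_as1_to_as0))     # True

# Starting in AS1, a pointer at or above 4 GiB narrows to null, so the
# round trip yields 0 instead of the original value.
print(round_trips(0x1_0000_1000, cast_as1_to_as0, cast_as0_to_as1))  # False
```

In this model the AS1-to-AS0-to-AS1 round trip only preserves pointers that already fit in 32 bits.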

----------------
theraven wrote:
> arsenm wrote:
> > theraven wrote:
> > > I believe that this holds only if the mapping between the two address spaces is a permutation.  Again, in the simple 32/64-bit case, a round trip starting in the 32-bit world is always possible, but the converse does not hold.
> > This probably needs an additional constraint that the result was also dereferenceable / legal, as stated before. The goal is that it should be legal to insert new addrspacecasts back to the source address space in certain contexts. The interesting ones are cases where you are dereferencing the pointer anyway, so the access would have been undefined if the cast had failed.
> I think we need to be very careful about eliding user-expected trapping behaviour here.  CHERI has similar behaviour, but consider a simple 32/64-bit example where:
> 
> - 32-bit pointers are AS0
> - 64-bit pointers are AS1
> - Cast from 32-bit to 64-bit is a zero extension (always valid)
> - Cast from 64-bit to 32-bit is a truncation, equality test, and conditional move of 0 on failure, so an addrspacecast from AS1 to AS0 gives either a valid result or null.
> 
> If I have a sequence of:
> 
> %0 is some AS1 pointer
> %1 casts to AS0
> %2 loads from %1
> 
> This would be expected to trap, but with the text as written it would be valid to elide %1 and have %2 be a load from %0.  This is probably not what the programmer expected.  Would you need to mark one of these as non-integral to achieve those guarantees?  Or have some ordering of permissiveness?  In this scenario, it's fine to do the elision the other way around:
> 
> %0 is some AS0 pointer
> %1 casts to AS1
> %2 loads from %1
> 
> In this version, eliding %1 and making %2 load from %0 is totally fine.  I think we really need to model three kinds of cast:
> 
> - Ones that are always valid (statically known)
> - Ones that are always invalid (statically known)
> - Ones that are sometimes valid (dynamically known)
> 
> I'd suggest making the last case the default and adding metadata for the first two.  Front ends generally know which of these cases applies and can add the metadata, and some later analyses may be able to add it once more is known about the values.  Optimisers are then free to elide any always-valid addrspacecast instructions.  If the metadata is accidentally dropped then that's also fine.
> 
> In the CHERI case, an AS0 to AS200 cast is always valid (and clang could be taught to add the metadata).  An AS200 to AS0 cast is always valid if you can statically prove that it was the result of a cast in the opposite direction, or if it came from a specific allocator.  Most AS0 to AS200 casts are followed by a call to an intrinsic that sets bounds / permissions, and this is the only use of the addrspacecast instruction, which prohibits these optimisations (correctly).
> 
> 
Loading from the result of a failed cast would be undefined behavior, so transforming the load and avoiding the trap should be legal. I wouldn't expect this to be any different from the compiler eliminating a trap from a store to NULL. I do think this is an argument for not allowing the address space of volatile operations to be changed (see D63401).
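
The two positions in this thread differ mainly in whether a load through a failed cast must trap or is undefined behavior. The following hypothetical Python sketch shows how an optimizer might decide whether a load from `addrspacecast(p)` can be rewritten as a load from `p`, using the three-way classification theraven proposes and leaving volatile loads alone, as suggested above. `CastValidity`, the `validity` field and the `failed_cast_load_is_ub` flag are invented for illustration and are not LLVM metadata or APIs.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CastValidity(Enum):
    ALWAYS_VALID = auto()     # statically known to succeed
    ALWAYS_INVALID = auto()   # statically known to fail
    SOMETIMES_VALID = auto()  # known only at run time (the suggested default)


@dataclass
class Cast:
    src_as: int
    dst_as: int
    # Hypothetical metadata attached by a front end or a later analysis.
    # If it is dropped, the cast falls back to the conservative default.
    validity: CastValidity = CastValidity.SOMETIMES_VALID


def load_address_space(cast: Cast, volatile: bool,
                       failed_cast_load_is_ub: bool) -> int:
    """Address space a load from the cast's result may use.

    Returns cast.src_as if the load may go through the source pointer
    (eliding the cast), otherwise cast.dst_as (keeping the cast).
    """
    if volatile:
        # Do not change the address space of a volatile access.
        return cast.dst_as
    if cast.validity is CastValidity.ALWAYS_VALID:
        # The cast cannot fail, so eliding it is safe under either reading.
        return cast.src_as
    # Otherwise folding is only justified if a load through a failed cast
    # is undefined behavior rather than a guaranteed trap.
    return cast.src_as if failed_cast_load_is_ub else cast.dst_as


narrow = Cast(src_as=1, dst_as=0)  # AS1 -> AS0, may produce null
widen = Cast(src_as=0, dst_as=1, validity=CastValidity.ALWAYS_VALID)
print(load_address_space(narrow, volatile=False, failed_cast_load_is_ub=False))  # 0
print(load_address_space(narrow, volatile=False, failed_cast_load_is_ub=True))   # 1
print(load_address_space(widen, volatile=False, failed_cast_load_is_ub=False))   # 0
```

Under the trap-preserving reading only always-valid casts are elided; under the undefined-behavior reading any non-volatile load may bypass the cast.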


https://reviews.llvm.org/D63525