[PATCH] D17993: [CodeGen] Apply 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments.

Wed Nov 18 11:19:58 PST 2020

rjmccall added inline comments.

================
Comment at: clang/lib/CodeGen/CGCall.cpp:2169
+    if (!CodeGenOpts.NullPointerIsValid &&
+        getContext().getTargetAddressSpace(FI.arg_begin()->type) == 0) {
+      Attrs.addAttribute(llvm::Attribute::NonNull);
----------------
arichardson wrote:
> rjmccall wrote:
> > rsmith wrote:
> > > jdoerfert wrote:
> > > > rjmccall wrote:
> > > > > rsmith wrote:
> > > > > > jdoerfert wrote:
> > > > > > > arichardson wrote:
> > > > > > > > Isn't the `this` pointer also nonnull in other address spaces?
> > > > > > > > 
> > > > > > > > In our CHERI fork we use AS200 for the this pointer and would quite like to have the nonnull attribute.
> > > > > > > > I can obviously change this line locally when I next merge from upstream, but I would like to avoid diffs and it seems to me like this restriction is unnecessary.
> > > > > > > I also think `NullPointerIsValid` is sufficient. 
> > > > > > It's my understanding that:
> > > > > > * The LLVM `null` value in any address space is the all-zero-bits value.
> > > > > > * In address space zero, the `null` value does not correspond to addressable memory, but this is not assumed to hold in other address spaces.
> > > > > > * An address-space-zero `null` value that is addressspacecast to a different address space might not be the `null` in the target address space.
> > > > > > * The `nonnull` attribute implies that the pointer value is not the `null` value.
> > > > > > * A null pointer in the frontend in a non-zero address space corresponds to the value produced by an addressspacecast of an address-space-zero `null` value to the target address space.
> > > > > > 
> > > > > > That being the case, there is simply no connection between the C and C++ notion of a null pointer and a `null` LLVM pointer value in a non-zero address space in general, so it is not correct to use the `nonnull` attribute in a non-zero address space in general. Only if we know that a C++ null pointer is actually represented by the LLVM `null` value in the corresponding address space can we use the `nonnull` attribute to expose that fact to LLVM. And we do not know that in general.
> > > > > I think all of this is correct except that the frontend does know what the bit-pattern of the null pointer is in any particular language-level address space, and it knows what the language-level address space of `this` is.  So we should be able to ask whether the null value in the `this` address space is the all-zero value and use that to decide whether to apply `nonnull`.
> > > > Hm, I think the problem is that we don't couple `NullPointerIsValid` with the address space. As I said before. In LLVM-IR, if we don't have the `null-pointer-is-valid` property, then "memory access" implies `dereferenceable` implies `nonnull`. Now, we usually assume non-0 address spaces to have the above property, but that should be decoupled. The query if "null is valid" should take the function and the AS, as it does conceptually in LLVM-core, and then decide if `null-pointer-is-valid` or not. If we had that, @arichardson could define that AS200 does not have valid null pointers. If you do that right now the IR passes will at least deduce `nonnull` based on `derferenceable`.
> > > > I think all of this is correct except that the frontend does know what the bit-pattern of the null pointer is in any particular language-level address space
> > > 
> > > Interesting. How do we get at that? Do we ask the middle-end to constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if we get a `null` back, or is there a better way?
> > > 
> > > In any case, this patch follows the same pattern we use for return values of reference type, parameters of reference type, and decayed array function parameters with static array bounds, all of which apply `nonnull` only in address space 0. If we want to use a different pattern, to consider whether LLVM's `nonnull` means "not a null pointer" rather than assuming that holds only in address space 0, we should make a holistic change for that throughout CGCall.cpp, rather than touching only the handling of the `this` pointer.
> > > 
> > > It'd also be interesting to consider what we want `__attribute__((nonnull))` to mean in address spaces where the null pointer isn't the zero pointer: should it mean non-zero at the source level / non-null in LLVM IR, or should it mean non-null at the source level (which might be unrepresentable in LLVM IR, but static analysis etc. could still detect it)? We're currently inconsistent on this: static analysis, constant evaluation, and sanitizers treat the attribute as meaning non-null, but IR generation emits the `nonnull` IR attribute, treating it as meaning non-zero instead.
> > > Interesting. How do we get at that? Do we ask the middle-end to constant-fold addrspacecast i8* null to i8* addrspace(N) and see if we get a null back, or is there a better way?
> > 
> > In fact, the middle-end has no ability to constant-fold `addrspacecast`, because LLVM's address-space support is a giant pile of hacks rather than exhibiting any cohesive design principles; or to put it another way, the lack of any centralized technical design authority in LLVM means that nobody has been able to enforce any rules about, say, targets being able to contribute knowledge about address spaces to the middle-end.
> > 
> > But in the frontend, we essentially have our own independent concept of address spaces, and the mapping from an AST address space to an IR address space can be non-obvious (and, in particular, non-injective), targets can perform arbitrary IR for address-space conversions instead of just using `addrspacecast`, and so on.  This is necessary because we have to actually ship a functional compiler, whether with LLVM or despite it.  And one of the places where it's necessary to work around LLVM is emitting a null pointer constant, because of course C requires a null pointer constant to be a constant expression, which in LLVM means we have to produce an `llvm::Constant*` for it, and we don't get to rely on assumptions like `addrspacecast` having appropriate semantics.  So instead we just have `ASTContext::getTargetNullPointerValue`, which is ultimately fed by the TargetInfo, and we expect that to be consistent with the behavior of e.g. the address-space-conversion customization point.
> > 
> > I think `__attribute__((nonnull))` clearly needs to be based on the language-level null pointer concept rather than having anything to do with a zero bit representation, and if that means we can't use it in IR, maybe we can improve IR.
> > > I think all of this is correct except that the frontend does know what the bit-pattern of the null pointer is in any particular language-level address space
> > 
> > Interesting. How do we get at that? Do we ask the middle-end to constant-fold `addrspacecast i8* null to i8* addrspace(N)` and see if we get a `null` back, or is there a better way?
> > 
> We try rather hard to avoid any addrspacecasts in the IR and emit `i8* addrspace(200) null` for language-level NULL pointers.
> 
> A bit more context: In our compiler we support two compilation modes: so-called "pure-capability" where we use AS 200 for all pointers. This maps to using capability instructions in the backend (effectively 128-bit fat pointers with a hidden validity tag). We also support a hybrid compilation mode where we emit everything in AS0 (64-bit integer pointers) by default and only use AS200 for annotated language-level pointers.
> In both of these address spaces a zero bit-pattern is NULL and not dereferenceable. 
> In the "pure-capability" mode our AS200 is effectively the default AS0. However, we can't use that since we need to use a different AS to select our instructions in the backend since we extend the MIPS,RISC-V and AArch64 backends and AS0 already maps to integer loads/stores/etc.
> 
> So the problem as I see it is that clang needs to be able to tell that a for a given LLVM IR address space llvm IR `null` (i.e. all zero bits) maps to C/C++ language-level `NULL/nullptr`?
> 
> However, I did not see anywhere in the langref that `null` is equivalent to "all-zero-bits", but I may have missed that part.
> All I saw about null was:
> 
> > Null pointer constants
> > The identifier ‘null’ is recognized as a null pointer constant and must be of pointer type.
> 
> So unless I missed something it seems like AS<N> could define null as being all-ones instead, and that could then map to the representation of language-level NULL on machines where it is not all-zeroes?
> 
> 
> In our fork I added a local hack to replace the checks for adding the nonnull attribute if AS==0 with a `canMarkAsNonNull` helper that checks for `AS==0||AS==200`. Maybe there should be a helper function that checks whether language-level NULL maps to null in address space N, but that would probably require making null-pointer-is-valid a per-AS property?
> 
> 
LLVM's assumption that `null` (`ConstantPointerNull`)  is a zero bit-pattern is reflected in places like `isNullValue()`, which (despite the name) actually determines whether the value is a zero bit-pattern, as you can see from how it operates on `ConstantFP` and so on.

But if your null pointers are actually the zero bit-pattern across all address spaces, I don't think that would cause you any problems.

We should be able to just replace this check with `getContext().getTargetNullPointerValue(FI.arg_begin()->type) == 0`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D17993/new/

https://reviews.llvm.org/D17993