[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

Wed Jul 3 19:56:59 PDT 2019

hfinkel added a comment.

In D64128#1569817 <https://reviews.llvm.org/D64128#1569817>, @rjmccall wrote:

> The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained.  C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that.  The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work.  And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth.  So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

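  #include <stdint.h>
  #include <stdlib.h>

  int *x = malloc(sizeof(int));
  int *y = malloc(sizeof(int));
  int *masked = (int *)((uintptr_t)x & ~(uintptr_t)15); // round x down to a 16-byte boundary
  if (masked == y) {
    *masked = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
  }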

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the allocator's point of view, we have `x` and we have `void *memory_pool`, and we need to allow `x & N` to point into `memory_pool`. But because, from the allocator's perspective, we never knew that `x` didn't point into `memory_pool` (and in fact it likely does), that should be fine (*).
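To make the allocator case concrete, here is a minimal sketch, assuming a hypothetical bump allocator (`pool_alloc`, `pool_of`, `POOL_SIZE`, and the static `memory_pool` are illustrative names, not code from this patch) whose pool is aligned to its own power-of-two size. Masking any pointer it hands out lands back on `memory_pool`, an address the allocator already knew about, so the mask does not create a new aliasing relationship from the allocator's perspective.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size; a power of two so that masking works. */
#define POOL_SIZE 4096u

/* Aligning the pool to its own size means every address inside it
   shares the pool base's high bits. */
static alignas(POOL_SIZE) unsigned char memory_pool[POOL_SIZE];
static size_t pool_used;

/* Hypothetical bump allocator handing out sub-objects of memory_pool. */
static void *pool_alloc(size_t size) {
    size = (size + 15u) & ~(size_t)15u;  /* round up to 16 bytes */
    if (size > POOL_SIZE - pool_used)
        return NULL;                      /* pool exhausted */
    void *p = &memory_pool[pool_used];
    pool_used += size;
    return p;
}

/* Recover the pool base from a pointer returned by pool_alloc:
   the masked pointer still points into the object the allocator owns. */
static void *pool_of(void *x) {
    assert(x != NULL);
    return (void *)((uintptr_t)x & ~(uintptr_t)(POOL_SIZE - 1));
}
```

Calling `pool_of(pool_alloc(sizeof(int)))` yields `memory_pool` itself, which is the "x & N points into memory_pool" situation described above.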

There might be more of an issue if, for example, I happen to know that there's some interesting structure at the beginning of a given object's page (or some other boundary). If I also have a pointer to this structure via some other means, then this might cause a problem (e.g., the compiler concluding that the masked pointer and that other pointer cannot alias). This kind of thing certainly falls outside the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default). I'm assuming that this would be rare; if I'm wrong, then we shouldn't do this by default.
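The page-header trick can be sketched as follows, assuming a hypothetical layout (`struct block_header`, `BLOCK_SIZE`, `block_new`, `block_slot`, and `header_of` are all illustrative) in which a header sits at the start of each `BLOCK_SIZE`-aligned block. The allocator keeps its own `hdr` pointer, while client code recovers the same header by masking an object pointer. A write through one must be visible through the other, which is the kind of aliasing a compiler would miss if it assumed a masked pointer always stays within the object it was derived from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block size; a power of two so masking finds the start. */
#define BLOCK_SIZE 4096u

/* Hypothetical header stored at the beginning of every block. */
struct block_header {
    unsigned live_objects;
};

/* Allocate one BLOCK_SIZE-aligned block and initialise its header.
   C11 aligned_alloc requires the size to be a multiple of the alignment. */
static struct block_header *block_new(void) {
    struct block_header *hdr = aligned_alloc(BLOCK_SIZE, BLOCK_SIZE);
    if (hdr != NULL)
        hdr->live_objects = 0;
    return hdr;
}

/* Return an int slot at the given byte offset past the header
   (offset must keep the slot suitably aligned for int). */
static int *block_slot(struct block_header *hdr, size_t offset) {
    assert(sizeof *hdr + offset + sizeof(int) <= BLOCK_SIZE);
    return (int *)((unsigned char *)hdr + sizeof *hdr + offset);
}

/* Find the header of the block containing obj by clearing the low bits. */
static struct block_header *header_of(void *obj) {
    return (struct block_header *)((uintptr_t)obj & ~(uintptr_t)(BLOCK_SIZE - 1));
}
```

Here `header_of(obj)` and the allocator's `hdr` are two routes to the same header; only the second is obtained "via some other means" in the sense above.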

(*) We do have a problem if we inline the implementation of malloc, given how our noalias return attribute works, but that's a preexisting problem, and the malloc implementation should probably be compiled with -fno-builtin-malloc regardless.

> If the general language rules are too permissive for some interesting optimization, it's fine to consider builtins that impose stronger restrictions on their use.

I agree.

Also, I could be wrong, but my impression is that all of this is extra: the motivating use case only requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp. Generating it from C/C++ expressions is just a potential additional benefit.
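For context, the `inttoptr(and(ptrtoint, C))` shape comes from pointer-alignment arithmetic like the following. This sketch assumes the TargetInfo.cpp case is a round-up of an argument pointer to a power-of-two alignment (as in `va_arg` lowering); `round_up_ptr` is a hypothetical name. The masking step is the part that could be expressed with `llvm.ptrmask` instead of an integer round trip.

```c
#include <assert.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two).
   This mirrors ptrtoint -> add -> and -> inttoptr: the addition bumps the
   address to or past the next boundary, and the mask clears the low bits. */
static void *round_up_ptr(void *p, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    uintptr_t addr = (uintptr_t)p;                     /* ptrtoint */
    addr = (addr + (align - 1)) & ~(align - 1);        /* add, then and */
    return (void *)addr;                               /* inttoptr */
}
```

An already-aligned pointer is returned unchanged, and any other pointer moves forward by less than `align` bytes.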

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D64128