[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

John McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 4 11:53:17 PDT 2019


rjmccall added a comment.

In D64128#1569836 <https://reviews.llvm.org/D64128#1569836>, @hfinkel wrote:

> In D64128#1569817 <https://reviews.llvm.org/D64128#1569817>, @rjmccall wrote:
>
> > The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained.  C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that.  The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work.  And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth.  So this makes me very antsy.
>
>
> I don't disagree. But I believe the question is if we have:
>
>   int *x = malloc(4);
>   int *y = malloc(4);
>   int *masked = (int *)((uintptr_t)x & ~(uintptr_t)15); // uintptr_t is from <stdint.h>
>   if (masked == y) {
>     *masked = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
>   }
>   
>
> I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have `void *memory_pool`, and we need to allow `x & N` to point into `memory_pool`, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).
>
> There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).


This is what I was thinking about for allocators; this is a common implementation technique for `free` / `realloc` / `malloc_size`.
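
To make the technique concrete, here is a minimal sketch of the pattern I have in mind. It is not taken from any particular allocator: `CHUNK_SIZE`, `struct chunk_header`, `header_of`, and `demo_header_lookup` are hypothetical names, and it assumes a target where converting a pointer to `uintptr_t` yields its address in a flat address space.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u /* hypothetical chunk size; must be a power of two */

/* Hypothetical metadata stored at the start of every chunk. */
struct chunk_header {
    size_t object_size;
};

/* Recover the header by clearing the low bits of the object's address.
 * Assumes the integer value of a pointer is its address. */
static struct chunk_header *header_of(void *p) {
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;
    return (struct chunk_header *)(addr & ~(uintptr_t)(CHUNK_SIZE - 1));
}

/* Returns 1 if the masked pointer finds the header, 0 otherwise. */
int demo_header_lookup(void) {
    void *chunk = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);
    if (chunk == NULL)
        return 0;
    struct chunk_header *h = chunk;
    h->object_size = 32;
    char *obj = (char *)chunk + 128; /* an "object" handed out from the chunk */
    int ok = header_of(obj) == h && header_of(obj)->object_size == 32;
    free(chunk);
    return ok;
}
```

The point is that a `free`-style routine sees only `obj`, yet the masked address is guaranteed to refer to a real `struct chunk_header`; nothing here relies on guessing where an unrelated object happens to live.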

> If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).

If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree.  If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.

Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on pain of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead.  So any proposal to weaken the latter as a workaround makes me very worried, especially if it's also enforcing alignment restrictions that we've generally chosen not to enforce when separated from actual memory accesses.
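
As an illustration of the kind of advice I mean (a sketch only; `points_into` and `demo_points_into` are hypothetical names, and a flat address space is assumed): relational comparison of pointers into unrelated objects is undefined, so users are told to convert to integers and compare those instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test whether p lies inside [base, base + size) without comparing
 * pointers to possibly unrelated objects: all arithmetic is done on
 * integers. Assumes the integer value of a pointer is its address. */
static int points_into(const void *p, const void *base, size_t size) {
    assert(base != NULL);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)base;
    return addr >= lo && addr - lo < size;
}

/* Returns 1 if both range checks behave as expected, 0 otherwise. */
int demo_points_into(void) {
    static unsigned char pool[64];
    static int other; /* a distinct object outside the pool */
    return points_into(pool + 10, pool, sizeof pool) == 1
        && points_into(&other, pool, sizeof pool) == 0;
}
```

Writing `p >= base && p < base + size` directly would be UB whenever `p` does not point into the same array, which is exactly the restriction the integer form is meant to sidestep.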

> Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.

I agree that we could use this intrinsic there safely, with the "object" being the variadic arguments area of the original va_list.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64128/new/

https://reviews.llvm.org/D64128




