[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

Hal Finkel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 5 19:22:36 PDT 2019


hfinkel added a comment.

In D64128#1570609 <https://reviews.llvm.org/D64128#1570609>, @rjmccall wrote:

> In D64128#1569836 <https://reviews.llvm.org/D64128#1569836>, @hfinkel wrote:
>
> > In D64128#1569817 <https://reviews.llvm.org/D64128#1569817>, @rjmccall wrote:
> >
> > > The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained.  C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that.  The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work.  And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth.  So this makes me very antsy.
> >
> >
> > I don't disagree. But I believe the question is if we have:
> >
> >   int *x = malloc(4);
> >   int *y = malloc(4);
> >   if (x & ~15 == y) {
> >     *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
> >   }
> >   
> >
> > I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have `void *memory_pool`, and we need to allow `x & N` to point into `memory_pool`, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).
> >
> > There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).
>
>
> This is what I was thinking about for allocators; this is a common implementation technique for `free` / `realloc` / `malloc_size`.
>
> > If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).
>
> If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree.  If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.
>
> Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on penalty of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead.


Indeed, we have.

>   So any proposal to weaken the latter as a workaround makes me very worried, especially if it's also enforcing alignment restrictions that we've generally chosen not to enforce when separated from actual memory accesses.

I *think* that what @fhahn points out about the masking restrictions addresses the concerns that we were discussing: My understanding of the potentially-problematic cases here are when the masking could get you from one variable from one underlying object (that you might access) to another variable in a different underlying object (that you might also access), and given the masking restrictions, this is impossible (*): because the high bits can't be used for addressing, and for the lower bits, these fall within the alignment of the type, and so in order for that to move you between underlying objects, the objects would need to be smaller than the type alignment.

(*) I suppose one can arrange for this to break something given some system-specific setup:

  // note: the linker script ensures that these are packed and adjacent.
  short a;
  struct {
    short b, c;
  } s;
  
  ...
  int *ib = (int *) s.b;
  int *ia = (int *) (((size_t) ib) & ~1);
   a = 6;
  (*(short *) ia) = 5;
  return a; // oops, aliasing will now enable us to return 6 here.


The probability that someone is doing that seems low, because they need to be doing the masking and casting with a type with larger alignment requirements than the type they'll actually access.

On the other hand, using the otherwise-unused bits in the pointer to store information, and then mask to get the pointer back, is pretty common. In any case, I recommend that we add a "-f" flag for this.

> 
> 
>> Also, and I could be wrong, but my impression is that all of this is extra - this motivating use case requires generating the intrinsic from the code in lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a potential additional benefit.
> 
> I agree that we could use this intrinsic there safely, with the "object" being the variadic arguments area of the original va_list.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64128/new/

https://reviews.llvm.org/D64128





More information about the llvm-commits mailing list