[PATCH] D41006: [SafepointIRVerifier] Allow non-dereferencing uses of unrelocated or poisoned PHI nodes

Max Kazantsev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 15 13:12:41 PST 2017


mkazantsev added inline comments.


================
Comment at: lib/IR/SafepointIRVerifier.cpp:250
+/// in a very limited number of situations. Currently the only way to use it is
+/// comparison against constant exclusively derived from null. All limitations
+/// arise due to its undefined state: this pointers should be treated as
----------------
anna wrote:
> DaniilSuchkov wrote:
> > mkazantsev wrote:
> > > DaniilSuchkov wrote:
> > > > mkazantsev wrote:
> > > > > You can always represent any constant pointer as `gep null, %some_int_constant`, so I think that this "exclusively derived from null" stuff is redundant.
> > > > It refers to comment at line 639.
> > > I'm not asking to do something about it in this patch since it was there before, but it is fishy. If I can imagine a VM in which 0xFF is some special magical pointer that cannot be simply compared against normal pointers, then I can also imagine a VM where `gep null, 0xFF` is also some special magical pointer with same properties. Actually, I can define all such special numbers as derivatives from null.
> > > 
> > > From that perspective, how hard-coded constants are different from hard-coded offsets from null?
> > I don't know either, but the idea is to keep this patch consistent with previous code. So I have to maintain the logic around "magic pointer constant". This patch is not about that issue, let's discuss it later.
> I'll try to clarify this. The GC relocates the *base* pointer and this is why we record the base pointer for every 'derived pointer'. After the GC relocates the base pointer at runtime, we can rematerialize the derived pointer because we have stored this information in the IR.
> 
> So, effectively it comes down to always identifying the "base" of a derived pointer. This is where the `getBaseType` comes in.
> When we generate "magic const pointers" in IR (for example, using inttoptr `magic const`), the base here is that magic const. 
> The same idiom is `GEP(null, magic const)`, but here the base pointer is null. Relocating a null is still a null.
> 
> So, this is why we have something like this:
> ```
> %ptr = unrelocated non constant pointer
>  compare (%ptr, inttoptr(magic_const)) <-- can't be reordered before a safepoint
> but:
> compare(%ptr, GEP(null, magic_const)) <-- can be reordered before a safepoint
> ```
>  Also, just as an aside, this is also why inttoptr of addrspace(1) is incorrect in the IRVerifier, but GEP(null, offset) in addrspace(1) is fine. 
Ok, this makes sense to me.


================
Comment at: lib/IR/SafepointIRVerifier.cpp:260
+/// R + R = R
+/// Where "+" - any operation that somehow merge pointers, U - unrelocated,
+/// R - relocated and P - poisoned. Merge with constant doesn't change anything.
----------------
anna wrote:
> DaniilSuchkov wrote:
> > mkazantsev wrote:
> > > DaniilSuchkov wrote:
> > > > mkazantsev wrote:
> > > > > Maybe instead of using the term "merge pointers" stick to the term "derived pointer"? You use both, and I don't catch what is the difference between them.
> > > > By "deriving" I mean f(x) -> x, and by merge f(x1, ..., xN) -> x.
> > > I guess "deriving" is actually f(x1) -> x. And what you call "deriving" is just a particular case of "merging". Again, why do we need both?
> > Because for deriving there is only one rule: it changes nothing and for merge everything is a bit more complicated. Both "gep/bitcast is merge" and "phi/select is deriving" looks misleading.
> > 
> > I agree that formally f(x1) -> x is a particular case of f(x1, ..., xN) -> x, but how to name it so that it won't be confusing?
> I tend to agree with Max here. We really cannot distinguish between both. 
> How about we focus just on the fact that we have multiple sources here? IOW, don't worry about GEPs and bitcasts because they have single source base.
> These GEPs and bitcasts don't change the behaviour of unrelocated/relocated, so they shouldn;t affect this discusson. 
I think that "deriving" is a good name for both, because formally you apply some function `f` (where `f` can be `gep, phi, bitcast` or whatever) to a number of pointer arguments and have a new pointer (derived one) as result. It is unimportant -how- exactly you derived your pointer from your argument(s). You never fail verification while deriving. You can only fail verification when you misuse the derived pointer. And this is the important part.


================
Comment at: lib/IR/SafepointIRVerifier.cpp:242
+/// GC pointer may be in one of three states: relocated, unrelocated and
+/// poisoned.
+/// Relocated pointer may be used without any restrictions.
----------------
anna wrote:
> I don't think we want to introduce one more term 'poisoned' here. Specifically, poisoning has different meanings in the optimizer (and sometimes in the GC), and can be confusing.
> It looks like `poisoned` pointers are just derived pointers which will be *lexically* from multiple pointers. So something like:
> `gep, bitcasts` -> derived pointer from one base
> `phis, selects` -> derived pointers that are lexically from multiple base pointers (I say lexically, because we can have phis/selects that statically have derived from exactly one pointer). 
> 
> Do we really need to make this distinction? I think it's more confusing. Pls see comment below on naming to make clearer.
Yes, this may be confusing with poisoned pointers that are also in GC. I don't have a better idea for its name on top of my head, though. I'm OK to go with whatever you agree with.


================
Comment at: lib/IR/SafepointIRVerifier.cpp:279
+///         \     |
+///      p2 = phi [p] [p1]
+///      p3 = phi [p] [p]
----------------
anna wrote:
> This is incorrect IR. We cannot have multiple *different* incoming phi values from exactly one predecessor.
> With correct IR, I don't think we will have such false positives. If we do have, could you please add a test with FIXME?
I don't get why it is incorrect. For example,

  void *p = &a[10];
  void *p1 = &p[20];
  void *temp = p;
  if (cond) {
    temp = p1;
    some_call();
  }
  void *p2 = temp;

Won't we have exactly this IR here?


https://reviews.llvm.org/D41006





More information about the llvm-commits mailing list