[PATCH] D93376: [LangRef] Clarify the semantics of lifetime intrinsics

Tue Dec 29 14:15:37 PST 2020

jdoerfert added a comment.

I believe the problem is that we use the address/allocation "rules" to fold comparisons while we later "change" those rules. This patch tries to define a way out by differentiating allocas. I'm not sure anymore that is a good idea to begin with. Consistency could also be achieved by not using the "rules" to fold alloca comparisons to "not equal". We could instead coalesce allocas in the middle end (more aggressively and not only based on lifetime markers) in order to fold comparisons of the "then equal" allocas to "equal". That would not require us to assign different semantics to allocas depending on their uses, restrict lifetime markers, and, probably most importantly, allows us to coalesce more without creating inconsistencies. First step is to check how often we fold alloca comparisons to "not equal" at all. If that doesn't happen at all there seems little point in all of this. Even if not, an alternative, as sketched before, might very well perform at least as good while improving other aspects.

In D93376#2464019 <https://reviews.llvm.org/D93376#2464019>, @aqjune wrote:

> Are existing heap optimizations checking whether a malloc is used by lifetime.start/end?

Not in tree, though I sketched one that would a few posts ago.

> I see that `dereferenceable_or_null(n)` is always attached to `call i8* malloc(<n>)`. If lifetime.start/end can take a heap pointer, its lifetime may start way after this call, so malloc isn't immediately dereferenceable even if it isn't null.
> We can change the semantics of `dereferenceable_or_null`, but this can transitively require changes to other attributes or dereferenceability analyses.

We don't need to change `dereferenceable_or_null`. The LangRef *without* this patch was fine with the meaning and lifetime on malloced pointers.
The need to adjust something with this patch has yet to be determined.

> In D93376#2463188 <https://reviews.llvm.org/D93376#2463188>, @jdoerfert wrote:
>
>> Sure. You could do the same transformation with heap objects under the same constraints. So you go from 
>> ... (snip)
>
> I think there are more cases to consider, such as:
>
>   p = malloc(1)
>   lifetime.start(p)
>   lifetime.end(p)
>   q = malloc(1)
>   // Can the addresses of p and q overlap?  [%c = icmp eq %p, %q]
>   free(q)
>   lifetime.start(p)
>   lifetime.end(p)
>   free(p)

So the `%c` I added can be either `true` or `false`. `false` is probably clear, `true` becomes clear
if you argue that the pointer is dead outside it's lifetime and we could allocate/deallocate it "on demand".
FWIW, I look at this the same way as I look at the alloca case (below), after all, an alloca is a malloc
from a different "pile" with an implicit free at the function exit.

  p = alloca
  lifetime.start(p)
  lifetime.end(p)
  q = alloca
  // Can the addresses of p and q overlap?  [%c = icmp eq %p, %q]
  lifetime.start(p)
  lifetime.end(p)
  // implicit pop/free(q)
  // implicit pop/free(p)

In both cases you need to be consistent though. The one problem with consistency I see is that you can drop
lifetime markers later, I would assume.

> That is why I still prefer memset(undef); it may be a churn to teach that memset(undef) is a canonicalized form to e.g., loop dependence analysis, but teaching lifetime also needs churn.
> Most importantly, memset's semantics is crystal clear; it doesn't change the object's address. It also naturally supports a thing that is analogous to 'partial lifetime start'.

IMHO, lifetime markers should not change the object's address either. While there is a pass using lifetime markers to build life ranges to coalesce objects, that is, in principle, unrelated to lifetime markers. You could implement the same logic without lifetime markers (in the middle or backend). That said, the problem of consistency is then about the fact that we reason about the pointer addresses "before" we "fix" them. Maybe we should not do that instead. That would really open up new possibilities to coalesce stack locations. Related: Has anyone checked how much we benefit from alloca equality folding?

>>> It's because it can introduce UB. Consider this example:
>>>
>>>   p = alloca i32
>>>   store 0, p // analysis will say this is perfectly fine
>>>   f(p) // lifetime.start(p) is hidden in f(p)
>>>
>>> After inlining:
>>>
>>>   p = alloca i32
>>>   store 0, p
>>>   lifetime.start(p) // this makes the above store raise UB 
>>
>> Given the proposed semantic of lifetime.start, the program had UB semantics prior to inlining, we might not have known it statically but that is not an issue.
>
> But it would be great to say that alloca is simply dereferenceable as far as it isn't used by any lifetime.start/end, isn't it?
> We don't want to assume that a function call may affect lifetime of escaped local variables, and IMO this is what LLVM assumes as well.

I get the feeling this mixes pre-patch and post-patch semantic, I might be wrong though. Here is my take:
Pre-patch, this is perfectly legal and the alloca is dereferenceable. The store is dead, as per "implicit lifetime.end marker that makes the alloca dead until lifetime start". So, pre-patch, no UB and a derefereceable alloca, agreed?
Post-patch the pre-inline IR is not legal anymore and the post-inline IR contains UB. That is arguably different.

---

I played around with lifetime markers again and went down a rabbit hole (https://gcc.godbolt.org/z/cEbqds *). Now I'm not sure anymore if this patch was supposed to fix this issue or not. Afaict, the stack locations are not accessed but their address is taken. I'm confused now.

- I started with source #1, got IR, made this source #2 and commented some lifetime markers (doesn't matter but just to show), run opt with a few passes, made the result source #3, running llc gets us to the inconsistent result. Interestingly, it seems we do not fold escaping allocas: https://gcc.godbolt.org/z/75sMfY

In the beginning, I wanted to create an example where we fold a comparison with an alloca *before* we inline it and create lifetime markers that are then used to coalesce the stack location with another one we used in the comparison before.

---

In case this patch goes in after all, maybe make lifetime start and end behave consistently, e.g.,

  lifetime.start(p)
  lifetime.start(p)  // ill-formed

  lifetime.end(q)
  lifetime.end(q)    // no-op

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93376/new/

https://reviews.llvm.org/D93376