[llvm-dev] lifetime.start/end

Mon Jan 25 20:33:30 PST 2021

On 1/24/21 10:07 AM, Ralf Jung wrote:
> Hi Johannes,
>
>> At some point I lost track of what you even want to achieve.
>
> *lol* fair.^^
> All I want is a precise and meaningful spec for lifetime.start and 
> lifetime.end. That's where we entered this rabbit hole. You didn't 
> like Juneyoung's and Nuno's proposal for this and instead suggested to 
> replace the entire system by something else, and that new system is 
> what we are discussing now (that is my understanding, anyway).
>
I went back to the proposal in question, 
https://reviews.llvm.org/D93376, and I could not right away determine 
why this would avoid the inconsistency we can imagine for something like

```
int *G = 0;
void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
void foo() {
   intptr_t xp, yp;
   { int x; xp = &x; bar(&x); }
   { int y; yp = &y; bar(&y); }
   print(xp == yp);               // One can imagine this is folded to 
false and because x and y are coalesced bar will print true.
}

```

I was opposing the proposal because of the brittle syntactic 
restrictions. In my attempt to justify not having those I also tried to 
resolve the above inconsistency in a more general manner because I 
thought that was what it was all about. I believe that we either have to 
argue the address of a dead object is unusable/undetermined, basically 
the direction of James comment, or we need to decouple lifetime markers 
from locations.

>> Your initial problem goes away if we only coalesce and fold 
>> comparisons between allocations for which the addresses are not 
>> observed.
>> Let's do that and thereby benefit from all the good things we can get 
>> from lifetime markers and coalescing, for alloca/malloc/...
>> Let's not go down the brittle road of syntactic limitations. It 
>> doesn't allow us to do a lot of things and will come with a price as 
>> we continue.
>>
>> FWIW, I think some part of the misconception is that two allocas 
>> "will definitely get different addresses" which you argue "heavily 
>> restricts the possible allocation choices compared to the original C 
>> program".
>> The addresses are different but that is only interesting if you 
>> observe them. Take the C code: `{int a,b, c = (&a == &b); 
>> use(a,c);}`. I can happily coalesce `a` and `b` after the comparison 
>> was folded, or `b` and `c` even if the comparison is still in place. 
>> For neither transformation I need
>> lifetime markers to do it in IR.
>
> Sure, everything in the semantics is up to what the program actually 
> observes. But I was considering the general case, in which it will not 
> be possible to prove that some address is not observed. I expect that 
> to be quite common (certainly, Rust programs really like to take 
> references to local variables and pass them to other functions), and 
> in those cases, not having lifetime.start/end will severely restrict 
> program transformations -- as you observed, even outlining is affected.
>
> This is basically the abstract version of the concrete concern that 
> James raised.

I get the argument that it can hinder optimizations but it is abstract. 
It seems pretty simple to test this out, I mean to restrict the folding 
and the coalescing. The benefit, which I think got ignored here a lot, 
is that it provides a clear path towards deduction of coalescing 
opportunities, that is, coalescing in the middle-end or backend w/o 
lifetime markers. Though, similar to the impact of my proposal, I have 
no idea if that is even needed.

~ Johannes

>
> Kind regards,
> Ralf
>
>> A lot of my arguments around alloca/malloc/... is that we can augment 
>> the information from the frontend by determining if an address is 
>> observed or not. If it is not, we can limit its allocation, coalesce 
>> it, etc. Lifetime markers can help us do this but they are not the 
>> only way.
>> So, basically, we do not need lifetime markers to coalesce 
>> alloca/malloc/... but that they can help.
>
>
>
>>
>> Let's try to get out of this rabbit hole. If you have a problem with 
>> the proposal as outlined in 
>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html , 
>> please let me know. I assume dealing with observed addresses 
>> different than with non-observede ones (implied by noescape) is the 
>> way forward now.
>>
>> ~ Johannes
>>
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>>
>>>>>> Does that make some sense?
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert 
>>>>>>>>> <johannesdoerfert at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>>>> It is, alloca must have its integral address decided at its 
>>>>>>>>>>> *allocation*
>>>>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>>>>
>>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already 
>>>>>>>>>>> allocated, the new
>>>>>>>>>>> alloca cannot be placed there unless it is not observed in 
>>>>>>>>>>> the future,
>>>>>>>>>>> according to your proposal.
>>>>>>>>>>>
>>>>>>>>>>> But how do we know from the current state whether the stack 
>>>>>>>>>>> variable is
>>>>>>>>>>> going to be observed or not?
>>>>>>>>>> With that argument you could allocate all but 4 bytes, do a 
>>>>>>>>>> malloc(4)
>>>>>>>>>> and know the address of the returned pointer (assuming it is 
>>>>>>>>>> not null).
>>>>>>>>>>
>>>>>>>>>> What I try to say is, either your scenario is part of the 
>>>>>>>>>> model and
>>>>>>>>>> everything we do is broken as you could "observe" addresses 
>>>>>>>>>> passively,
>>>>>>>>>> *or*, the abstract machine we use for semantic reasoning 
>>>>>>>>>> doesn't permit
>>>>>>>>>> the above reasoning. I really hope for the latter.
>>>>>>>>>>
>>>>>>>>> That's a very valid point..!
>>>>>>>>>
>>>>>>>>> Actually, I have a memory model that addresses this problem.
>>>>>>>>> The gist is that we can interpret each allocation instruction 
>>>>>>>>> as *creating
>>>>>>>>> 2 blocks* and nondeterministically returning one of them.
>>>>>>>>> The other one is marked as invalid but not freed immediately. 
>>>>>>>>> Deallocation
>>>>>>>>> frees both blocks.
>>>>>>>>> If there is not enough space for having two blocks, it is 
>>>>>>>>> treated as
>>>>>>>>> out-of-memory.
>>>>>>>>>
>>>>>>>>> It makes guessing the address of an allocation according to 
>>>>>>>>> memory layout
>>>>>>>>> invalid UB.
>>>>>>>>>
>>>>>>>>> p = malloc(4)
>>>>>>>>> // If p != null, it is guaranteed there was at least two 4 
>>>>>>>>> byte slots
>>>>>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>>>>>> // Two blocks are allocated at 0x110 and 0x114, and one of the 
>>>>>>>>> addresses is
>>>>>>>>> nondeterministically returned.
>>>>>>>>> // By the nature of nondeterminism, all executions below 
>>>>>>>>> should be
>>>>>>>>> well-defined regardless of the address of p.
>>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, this 
>>>>>>>>> raises UB.
>>>>>>>>>
>>>>>>>>> The paper link is here 
>>>>>>>>> <https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>
>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>
>>>>>>>>
>>>>>>>>> To argue differently: Who is to say there is a stack, or only 
>>>>>>>>> one,
>>>>>>>>>> or that alloca allocates memory "on the one stack"? That is 
>>>>>>>>>> not part of
>>>>>>>>>> the IR, IMHO. I can write a machine on which alloca lowers to 
>>>>>>>>>> malloc,
>>>>>>>>>> I don't even need the free during stack unwind but that I 
>>>>>>>>>> could do as
>>>>>>>>>> well if I wanted to.
>>>>>>>>>
>>>>>>>>> This is right, the example was for illustrative purposes. IR 
>>>>>>>>> does not
>>>>>>>>> enforce alloca to be placed at 'stack'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Making icmp/ptrtoint yield poison will still make loop 
>>>>>>>>>>> versioning or
>>>>>>>>>>> pointer rewriting transformations unsound because these 
>>>>>>>>>>> operations now
>>>>>>>>>> can
>>>>>>>>>>> create poison (even if pointers are noundef).
>>>>>>>>>> I did not say they yield poison, at least I did not try to 
>>>>>>>>>> say that.
>>>>>>>>>> What are you referring to exactly?
>>>>>>>>>>
>>>>>>>>> I was referring to this:
>>>>>>>>>
>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions, 
>>>>>>>>>>> which
>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>> As with other violation of attributes I would, on first 
>>>>>>>>>> though, suggest
>>>>>>>>>> to produce poison, not UB.
>>>>>>>>> But it is more about the (imaginary) attribute, so maybe I was 
>>>>>>>>> slightly out
>>>>>>>>> of topic.
>>>>>>>>>
>>>>>>>>>> 2) is fine, I think the suggestion semantically makes sense 
>>>>>>>>>> perfectly. 1)
>>>>>>>>>>> is something I'm concerned about now.
>>>>>>>>>>>
>>>>>>>>>>> There are more than pointer foldings, such as rewriting such 
>>>>>>>>>>> expression,
>>>>>>>>>>> code motion ptr cmp, introduce ptr cmp, etc. There is also 
>>>>>>>>>>> analysis
>>>>>>>>>> relying
>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>> Defining the correctness of each of them is something we 
>>>>>>>>>>> want to avoid,
>>>>>>>>>> and
>>>>>>>>>>> maybe that's why we want to define precise semantics for 
>>>>>>>>>>> things.
>>>>>>>>>> I don't get the point. My proposal does not change the 
>>>>>>>>>> semantics of
>>>>>>>>>> pointer comparisons, at all. I explicitly mentioned that in 
>>>>>>>>>> the last
>>>>>>>>>> email.
>>>>>>>>>>
>>>>>>>>> Oh okay, I thought it was a part of the lifetime proposal, but 
>>>>>>>>> it seems
>>>>>>>>> more like a separate thing.
>>>>>>>>> I agree that this requires performance impact.
>>>>>>>>> Also investigation of existing transformations would be 
>>>>>>>>> needed; Alive2's
>>>>>>>>> pointer comparison is doing approximation yet, but if it 
>>>>>>>>> becomes fully
>>>>>>>>> precise, it will show something from running LLVM unit tests I 
>>>>>>>>> believe..! :)
>>>>>>>>>
>>>>>>>>>> I think this will be less aggressive and may give nice 
>>>>>>>>>> feedback to
>>>>>>>>>>> potential projects that are using lifetime with non-alloca.
>>>>>>>>>> The lifetime marker debate, basically 2) above, is orthogonal 
>>>>>>>>>> to the
>>>>>>>>>> problem
>>>>>>>>>> you try to solve. It got mixed in as lifetime markers were 
>>>>>>>>>> used by
>>>>>>>>>> StackColoring
>>>>>>>>>> to perform coalescing but that is coincidental. You can 
>>>>>>>>>> (arguably)
>>>>>>>>>> coalesce stack
>>>>>>>>>> allocations regardless of lifetime markers and with 1) such a
>>>>>>>>>> transformation
>>>>>>>>>> (w/ and w/o lifetime markers) would actually be sound.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> At the end, we can implement IR writer that lowers lifetime 
>>>>>>>>>>> with
>>>>>>>>>> non-alloca
>>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>>> Yeah, 2) is orthogonal and we can lower it that way. Unsure 
>>>>>>>>>> if it is
>>>>>>>>>> helpful
>>>>>>>>>> but we can certainly define it that way in the LangRef.
>>>>>>>>>>
>>>>>>>>> Okay, thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Juneyoung
>>>>>>>>>>>
>>>>>>>>>>> p.s. The reply was late, sorry. I think I can spend more 
>>>>>>>>>>> time on this
>>>>>>>>>> today.
>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert <
>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a 
>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from 
>>>>>>>>>>>>>> defining
>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an 
>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I meant it would be great if the output state after 
>>>>>>>>>>>>> executing an
>>>>>>>>>>>>> instruction can be described using its input state.
>>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I should 
>>>>>>>>>>>>> have been more
>>>>>>>>>>>>> clear about this)
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be defined 
>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>> where s(x) is the value of x in the previous state, and 
>>>>>>>>>>>>> s[z -> v] is a
>>>>>>>>>>>>> state with z updated to v.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another example that involves memory: the semantics of 'i 
>>>>>>>>>>>>> = ptrtoint p'
>>>>>>>>>>>> can
>>>>>>>>>>>>> be defined as follows:
>>>>>>>>>>>>> Given an input state s, next state s' = s[i -> 
>>>>>>>>>>>>> s(p).obj.address +
>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>> where obj.address is the begin address of a memory object 
>>>>>>>>>>>>> obj pointed
>>>>>>>>>> by
>>>>>>>>>>>> p
>>>>>>>>>>>>> & offset is p's byte offset. (Imagine a pointer to the 
>>>>>>>>>>>>> offset of some
>>>>>>>>>>>> char
>>>>>>>>>>>>> array).
>>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t the 
>>>>>>>>>>>>> input state.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>> To describe the output state after executing 'p = alloca', 
>>>>>>>>>>>>> the address
>>>>>>>>>> of
>>>>>>>>>>>>> new alloca should be determined.
>>>>>>>>>>>>> If observedness is involved, we need to know the future 
>>>>>>>>>>>>> state again. :/
>>>>>>>>>>>> We
>>>>>>>>>>>>> don't know whether the alloca is going to be observed or 
>>>>>>>>>>>>> not without
>>>>>>>>>>>> seeing
>>>>>>>>>>>>> the future.
>>>>>>>>>>>>> This is the problem of the current lifetime intrinsics as 
>>>>>>>>>>>>> well.
>>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>>
>>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.
>>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of
>>>>>>>>>>>> dereferenceable memory that is implicitly freed as the stack
>>>>>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want to modify the IR, you need to argue the observable
>>>>>>>>>>>> semantics which is nothing new. That this might require 
>>>>>>>>>>>> more than
>>>>>>>>>>>> a peephole view of the program is also not new.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> One possible approach to resolve this is adding an 
>>>>>>>>>>>>> 'unobserved' flag to
>>>>>>>>>>>>> alloca instruction (similar to what was suggested by 
>>>>>>>>>>>>> Nicolai).
>>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is used by
>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>> The flag can be added, like we add other attributes. It 
>>>>>>>>>>>> should not
>>>>>>>>>>>> be required for any optimization we talked about though. It 
>>>>>>>>>>>> basically
>>>>>>>>>>>> is a way to manifest derived or given information into the IR.
>>>>>>>>>>>>
>>>>>>>>>>>> Attribute deduction, as well as frontends with domain 
>>>>>>>>>>>> knowledge,
>>>>>>>>>>>> can add such information. The flag we discussed in phab was 
>>>>>>>>>>>> not even
>>>>>>>>>>>> sufficient for all the transformation examples I presented 
>>>>>>>>>>>> in my mail,
>>>>>>>>>>>> that is why I extended my  argument. We could still have a 
>>>>>>>>>>>> "noescape"
>>>>>>>>>>>> flag for allocas, but I'm not sure how useful that really 
>>>>>>>>>>>> is. We can
>>>>>>>>>>>> certainly deduce it and manifest it, unsure if we have 
>>>>>>>>>>>> domain knowledge
>>>>>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions, 
>>>>>>>>>>>>> which
>>>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>>>> As with other violation of attributes I would, on first 
>>>>>>>>>>>> though, suggest
>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, existing optimizations like loop versioning can 
>>>>>>>>>>>>> introduce
>>>>>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the 
>>>>>>>>>>>> feeling things
>>>>>>>>>>>> are still mixed up here.
>>>>>>>>>>>>
>>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) We stop folding comparisons between different allocas if 
>>>>>>>>>>>> changing the
>>>>>>>>>>>> address
>>>>>>>>>>>>       of both might be observable. Thus, if both might have 
>>>>>>>>>>>> their address
>>>>>>>>>>>> "taken"/escaped,
>>>>>>>>>>>>       other than the comparisons we want to fold, we cannot 
>>>>>>>>>>>> proceed.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>
>>>>>>>>>>>> The first should be sufficient for the problem you were 
>>>>>>>>>>>> trying to solve
>>>>>>>>>>>> in the
>>>>>>>>>>>> first place. The second makes lifetime markers less weird. 
>>>>>>>>>>>> Note that 1)
>>>>>>>>>>>> is not changing
>>>>>>>>>>>> the semantics of the IR. We basically just argue there is a 
>>>>>>>>>>>> bug in our
>>>>>>>>>>>> instcombine right
>>>>>>>>>>>> now as we do not check all necessary preconditions.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I see that there are other questions that I didn't answer 
>>>>>>>>>>>>> yet, but let
>>>>>>>>>> me
>>>>>>>>>>>>> answer this first to limit the length of the text :)
>>>>>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I read your proposal and thought about the model.
>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As you concerned in A3, certain programs may be valid 
>>>>>>>>>>>>>>> only when
>>>>>>>>>> memory
>>>>>>>>>>>>>>> blocks with overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca also 
>>>>>>>>>>>>>>> works):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there as 
>>>>>>>>>>>>>>> enough space & p1
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>>>> s.insert(p1); s.insert(p2); // If the second insert did 
>>>>>>>>>>>>>>> nothing, it
>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be surprise to programmers
>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped 
>>>>>>>>>>>>>>> blocks are
>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> would be great for verification tools on LLVM IR to be 
>>>>>>>>>>>>>>> able to answer
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> the second insert will succeed.
>>>>>>>>>>>>>> I agree, the second insert should succeed, assuming `p1 
>>>>>>>>>>>>>> && p2`.
>>>>>>>>>>>>>> I don't think my proposal would in any way impact the 
>>>>>>>>>>>>>> program above,
>>>>>>>>>>>>>> if anything the A3 reasoning makes sure such a program 
>>>>>>>>>>>>>> with allocas
>>>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm also not sure I understand what you try to argue for. 
>>>>>>>>>>>>>> Maybe
>>>>>>>>>>>>>> elaborate a bit what it is you think is bad or needs to 
>>>>>>>>>>>>>> be changed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is that definition of escapedness is not 
>>>>>>>>>>>>>>> clear at the
>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>> level. Describing the IR semantics w.r.t. LLVM's escaped 
>>>>>>>>>>>>>>> analysis
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>> isn't something we want.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Semantic definition of escapedness of a pointer seems 
>>>>>>>>>>>>>>> hard, I mean
>>>>>>>>>> in a
>>>>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>>>>> It isn't a single instruction such as 'escape i8* ptr', 
>>>>>>>>>>>>>>> and we need
>>>>>>>>>> to
>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>> over all instructions in the function. For example, 
>>>>>>>>>>>>>>> '(int)(p+1) -
>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>> isn't semantically escaping the pointer p because the 
>>>>>>>>>>>>>>> result is 1
>>>>>>>>>>>>>>> regardless of the value of p.
>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a 
>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from 
>>>>>>>>>>>>>> defining
>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an 
>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In practice, syntactically checking escapedness + nice 
>>>>>>>>>>>>>>> engineering
>>>>>>>>>>>> might
>>>>>>>>>>>>>>> not break optimizations in most cases (as undef/poison 
>>>>>>>>>>>>>>> did); but it
>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be great to move to another level, since LLVM IR is used 
>>>>>>>>>>>>>>> in so many
>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>> The property under which you can coalesce objects is simple:
>>>>>>>>>>>>>>       It is not observable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, if the address of one of the two objects you 
>>>>>>>>>>>>>> coalesce is not
>>>>>>>>>>>>>> observed, coalescing is not observable. That is a 
>>>>>>>>>>>>>> sufficient condition
>>>>>>>>>>>>>> not a necessary one. Pointer "escaping" is one step 
>>>>>>>>>>>>>> further. If the
>>>>>>>>>>>>>> address doesn't escape it is not observed. This does not 
>>>>>>>>>>>>>> mean the
>>>>>>>>>>>>>> "semantic conditions for coalescing", e.g., for the 
>>>>>>>>>>>>>> purpose of
>>>>>>>>>>>> translation
>>>>>>>>>>>>>> validation, is supposed to be build on top of our 
>>>>>>>>>>>>>> "definition of
>>>>>>>>>>>> escaping
>>>>>>>>>>>>>> pointers". That said, we use "does not escape" as a 
>>>>>>>>>>>>>> precondition for
>>>>>>>>>>>>>> various transformation and I'm unsure what is any 
>>>>>>>>>>>>>> different now. The
>>>>>>>>>>>>>> entire escape argument is only used in the validity of 
>>>>>>>>>>>>>> the pointer
>>>>>>>>>>>> folding.
>>>>>>>>>>>>>> Similarly, we can fold a comparison of a noalias pointer 
>>>>>>>>>>>>>> with another
>>>>>>>>>>>> one
>>>>>>>>>>>>>> if the former does not escape (and both are dereferenced 
>>>>>>>>>>>>>> and one is
>>>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The pointer comparison is another beast. It indeed has a 
>>>>>>>>>>>>>>> few issues,
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> solving it might require nontrivial solution.
>>>>>>>>>>>>>> I think the main problem of the inconsistencies and such 
>>>>>>>>>>>>>> we've seen is
>>>>>>>>>>>>>> rooted in the erroneous folding of pointer comparisons. 
>>>>>>>>>>>>>> Cleaning up
>>>>>>>>>> the
>>>>>>>>>>>>>> lifetime marker semantics is actually unrelated and 
>>>>>>>>>>>>>> simply not folding
>>>>>>>>>>>>>> as described in A3 should solve the issue that has been 
>>>>>>>>>>>>>> reported.
>>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>>> we should take a crack at lifetime markers while we are 
>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last year, 
>>>>>>>>>>>>>>>> I've tried to
>>>>>>>>>>>>>>>> summarize my thoughts,
>>>>>>>>>>>>>>>> especially given that I had some time to think about 
>>>>>>>>>>>>>>>> things over the
>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>> Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I start with general questions I asked myself and then 
>>>>>>>>>>>>>>>> go on
>>>>>>>>>> rambling
>>>>>>>>>>>>>>>> about a potential design.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, thus 
>>>>>>>>>>>>>>>> is it fixed
>>>>>>>>>> or
>>>>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I 
>>>>>>>>>>>>>>>> think it is
>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>>>>>> at least I hope it is. Only that would allow 
>>>>>>>>>>>>>>>> transformations I would
>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>> us to do. Here are some examples:
>>>>>>>>>>>>>>>>        https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>        https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, extending 
>>>>>>>>>>>>>>>> the lifetime?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer comparison 
>>>>>>>>>>>>>>>> into a region
>>>>>>>>>>>> where
>>>>>>>>>>>>>>>> the pointer is dead?
>>>>>>>>>>>>>>>> This is an important question which we haven't 
>>>>>>>>>>>>>>>> discussed much as we
>>>>>>>>>>>>>>>> assumed LICM has to work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses are 
>>>>>>>>>>>>>>>> not
>>>>>>>>>> extending
>>>>>>>>>>>>>>>> the lifetime and are
>>>>>>>>>>>>>>>> allowed outside of "lifetime regions" (as indicated by 
>>>>>>>>>>>>>>>> markers).
>>>>>>>>>> They
>>>>>>>>>>>>>>>> will always produce valid
>>>>>>>>>>>>>>>> results. Though, we might want to think about a 
>>>>>>>>>>>>>>>> lifetime marker that
>>>>>>>>>>>>>>>> spits out a new pointer
>>>>>>>>>>>>>>>> value instead of reusing the old one.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The question here is, can we fold address comparisons 
>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So far, we fold comparisons based on "address 
>>>>>>>>>>>>>>>> information". For
>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>> we "know" globals,
>>>>>>>>>>>>>>>> allocas, and mallocs cannot be equal to one another. 
>>>>>>>>>>>>>>>> Also, two
>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>> allocations, for globals
>>>>>>>>>>>>>>>> and allocas, are considered unequal. Now, the crux is 
>>>>>>>>>>>>>>>> that we have
>>>>>>>>>> to
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> consistent if we do two
>>>>>>>>>>>>>>>> comparisons, and, as of now, we are not (bug number 
>>>>>>>>>>>>>>>> missing). Since
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> backend (or any other place
>>>>>>>>>>>>>>>> for that matter) might coalesce allocas, their 
>>>>>>>>>>>>>>>> addresses might not
>>>>>>>>>> be
>>>>>>>>>>>>>>>> different after all. If we
>>>>>>>>>>>>>>>> already folded a comparison to "unequal" we are doomed 
>>>>>>>>>>>>>>>> if we later
>>>>>>>>>>>> have
>>>>>>>>>>>>>>>> a comparison that results
>>>>>>>>>>>>>>>> in "equal". (Note, this is different from aliasing 
>>>>>>>>>>>>>>>> rules as they can
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would hope we can come up with a model that treats 
>>>>>>>>>>>>>>>> memory "the
>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>> regardless if it is global,
>>>>>>>>>>>>>>>> stack, or heap. I want to avoid special casing one of 
>>>>>>>>>>>>>>>> them wrt.
>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>> as I believe most optimizations
>>>>>>>>>>>>>>>> would apply to any of them, potentially for different 
>>>>>>>>>>>>>>>> reasons and
>>>>>>>>>> with
>>>>>>>>>>>>>>>> different gains, but nevertheless.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory 
>>>>>>>>>>>>>>>> content *only*.
>>>>>>>>>>>>>>>> Basically, the memory content is set to
>>>>>>>>>>>>>>>>          undefined by lifetime markers. It is 
>>>>>>>>>>>>>>>> derived/modifiable as
>>>>>>>>>> it
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> just an "as-is" property of the memory
>>>>>>>>>>>>>>>>          content. The lifetimes of an object, as 
>>>>>>>>>>>>>>>> described by
>>>>>>>>>> markers,
>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>> change during the compilation. They
>>>>>>>>>>>>>>>>          might become smaller if we deduce the object 
>>>>>>>>>>>>>>>> is not accessed
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> the memory content is not used, they
>>>>>>>>>>>>>>>>          might become larger if objects with 
>>>>>>>>>>>>>>>> non-overlapping
>>>>>>>>>> lifetimes
>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>          introducing a new object though.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing to 
>>>>>>>>>>>>>>>> do with the
>>>>>>>>>>>>>>>> address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>          comparisons and similar operations are valid 
>>>>>>>>>>>>>>>> outside the
>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>> Loads and stores are as well, they can
>>>>>>>>>>>>>>>>          even not be removed "easily". A store followed 
>>>>>>>>>>>>>>>> by a lifetime
>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>> or a load following a lifetime marker
>>>>>>>>>>>>>>>>          is dead or results in undef respectively.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about addresses. 
>>>>>>>>>>>>>>>> This means
>>>>>>>>>> we
>>>>>>>>>>>>>>>> could/should also not argue that overlapping
>>>>>>>>>>>>>>>>          lifetimes result in different addresses. Thus, 
>>>>>>>>>>>>>>>> a comparison
>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>> the address of two allocas could not
>>>>>>>>>>>>>>>>          immediately be folded. That said, there would 
>>>>>>>>>>>>>>>> be special
>>>>>>>>>> cases
>>>>>>>>>>>>>>>> though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>          not escape, other than the comparisons to be 
>>>>>>>>>>>>>>>> folded, we can
>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>> them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>          would still be consistent because it is 
>>>>>>>>>>>>>>>> unobservable. The
>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> we have in-tree is that we fold even though
>>>>>>>>>>>>>>>>          the address is still observed (after the 
>>>>>>>>>>>>>>>> fold). It is still
>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>> to me what the impact of this would be
>>>>>>>>>>>>>>>>          on real code. I suggested before that we run some
>>>>>>>>>> experiments
>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>> before we make any decision whatsoever.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are
>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>> you suggested before (I think).
>>>>>>>>>>>>>>>> There are some implementation level differences but at 
>>>>>>>>>>>>>>>> the end of
>>>>>>>>>> the
>>>>>>>>>>>>>>>> day they are basically the same.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if it 
>>>>>>>>>>>>>>>> fixes the
>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>> that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We're discussing the well-formedness of 
>>>>>>>>>>>>>>>>> lifetime.start/end
>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>> https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>> (syntactically
>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>> semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Is there a frontend or framework that introduces 
>>>>>>>>>>>>>>>>> lifetime call
>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> non-stack allocated objects?
>>>>>>>>>>>>>>>>> If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Can a block's lifetime be started bytewise or 
>>>>>>>>>>>>>>>>> elementwise?
>>>>>>>>>>>>>>>>> I imagine an optimization that allow using stack very 
>>>>>>>>>>>>>>>>> compactly,
>>>>>>>>>> but
>>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>>> there is a real-world use case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>