<div dir="ltr"><div dir="ltr"><div>Hi,</div><div dir="ltr"><br></div><div dir="ltr">About syntactic constraints - I think it can be helpful for transformations that need to be fully aware of all allocations' lifetimes.<div>For example, AddressSanitizer is currently simply giving up instrumenting things if there is an unknown lifetime usage.</div><div>If all lifetime intrinsics' pointers are guaranteed to be known expressions, this issue is resolved.</div><div><br></div><div><div style="color:rgb(0,0,0);font-family:Menlo,Monaco,"Courier New",monospace;font-size:12px;line-height:18px;white-space:pre-wrap"><div>    <span style="color:rgb(175,0,219)">if</span> (<span style="color:rgb(0,16,128)">HasUntracedLifetimeIntrinsic</span>) {</div><div><span style="color:rgb(0,128,0)">      // If there are lifetime intrinsics which couldn't be traced back to an</span></div><div><span style="color:rgb(0,128,0)">      // alloca, we may not know exactly when a variable enters scope, and</span></div><div><span style="color:rgb(0,128,0)">      // therefore should "fail safe" by not poisoning them.</span></div><div>      <span style="color:rgb(0,16,128)">StaticAllocaPoisonCallVec</span>.<span style="color:rgb(121,94,38)">clear</span>();</div><div>      <span style="color:rgb(0,16,128)">DynamicAllocaPoisonCallVec</span>.<span style="color:rgb(121,94,38)">clear</span>();</div><div>    }</div><br></div></div><div>Interestingly, AddressSanitizer is the one that raises an issue as well if the syntactic restriction is enforced.</div><div>It replaces allocas with __asan_stack_malloc_N, but still leaves its lifetime uses, which breaks the syntactic restriction. A possible solution for this is to simply remove the lifetime calls.</div><div><br></div><div>Juneyoung</div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 1, 2021 at 2:36 PM Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>

On 1/30/21 7:40 AM, Ralf Jung wrote:<br>

> Hi Johannes,<br>

><br>

>>>> I was opposing the proposal because of the brittle syntactic <br>

>>>> restrictions. In my attempt to justify not having those I also <br>

>>>> tried to resolve the above inconsistency in a more general manner <br>

>>>> because I thought that was what it was all about. I believe that we <br>

>>>> either have to argue the address of a dead object is <br>

>>>> unusable/undetermined, basically the direction of James comment, or <br>

>>>> we need to decouple lifetime markers from locations.<br>

>>><br>

>>> Folding to false and coalescing are inconsistent with each other, <br>

>>> yes. It is just wrong to perform both of these on the above program, <br>

>>> I would say (if we interpret it with LLVM semantics, which I assume <br>

>>> has no "pointer zapping"). Is that up for debate?<br>

>>><br>

>>> However, I do not see the immediate connection to lifetime markers <br>

>>> -- the same would be true for a program using malloc/free. <br>

>>> Coalescing is permissible whenever lifetimes are disjoint. Folding <br>

>>> to false is only permissible if the pointers are *definitely not <br>

>>> equal*, e.g. if their lifetimes overlap.<br>

>>><br>

>> I still believe folding can be done regardless of their lifetime, as <br>

>> long as it is done consistent. Fold all pointer observations and the <br>

>> actual<br>

>> placement is not observable, we went down that road.<br>

><br>

> Sure, but that's the kind of exception that always applies when <br>

> discussing optimizations.<br>

> But to make my statement more precise: you cannot fold comparison of <br>

> *escaped* pointers to false unless the pointers are *definitely not <br>

> equal*, e.g. if their lifetimes overlap.<br>

><br>

The interesting part is, lifetimes change. I continue to argue you <br>

cannot fold if both pointers have escaped. If you do, and other things <br>

happen that fiddle with the lifetime, you might accidentally coalesce <br>

them (explicitly or implicitly).<br>

<br>

<br>

>> I agree with the malloc/free comparison, I did argue that the entire <br>

>> time, or I tried.<br>

>><br>

>> If there is no/little connection to lifetime markers, I assume <br>

>> <a href="https://reviews.llvm.org/D93376" rel="noreferrer" target="_blank">https://reviews.llvm.org/D93376</a> did not fix the inconsistency then. <br>

>> It was,<br>

>> apparently, just an approach to clarify what a lifetime marker is, <br>

>> right? I might be more confused than before. I believe you should equate<br>

>> lifetime markers and memset(undef) and put the entire discussion <br>

>> about folding/coalescing onto a foundation that is unrelated to lifetime<br>

>> markers.<br>

><br>

> My understanding is that lifetime markers are *explicitly intended to <br>

> support coalescing* by permitting later phases to put allocas with <br>

> disjoint lifetimes into the same stack slot. D93376 intended to keep <br>

> it that way; fundamentally changing what lifetime markers are about by <br>

> making them only about the content and not the address of the alloca <br>

> seemed like a much more invasive change.<br>

> Am I misunderstanding something?<br>

<br>

I do think they were designed for that, yes. And I do agree my change <br>

looks more invasive from that perspective.<br>

<br>

Now does D93376 help with the problems we have is my question. Or, asked <br>

differently, what is a way out of this which improves the big picture, <br>

and what is practical. For the first, limiting it to allocas seems to be <br>

the wrong way. So far, the argument seems to be that "we only use it for <br>

them", but I'm not sure why a use for other pointers would be somehow <br>

bad (<a href="https://godbolt.org/z/G8obj3" rel="noreferrer" target="_blank">https://godbolt.org/z/G8obj3</a>). Second, limiting it to syntactic <br>

constructs is the opposite of practical. It has been said before, <br>

syntactic constraints are brittle. Beyond that, the syntactic def-use <br>

chain requirement has not even come up in any of this mails, at least as <br>

far as I understand it. It still is unclear to me how that would help.<br>

<br>

That all said, let's see what we need:<br>

1) A way to ensure we don't create inconsistent situations wrt. <br>

allocation addresses where there were none.<br>

2) A way to ensure we can reasonably coalesce allocas to minimize stack <br>

usage.<br>

<br>

For one, I feel we converged pretty much on the fact that pointer <br>

comparison folding is a problem. In the example below we have lifetime <br>

markers and we use them to argue about lifetimes of the allocations, but <br>

we still agree the folding should not happen. Once the folding is <br>

restricted to pointer pairs for which at least one doesn't have it's <br>

address observed, I would like to revisit 1). So far, I believe it would <br>

be resolved. For 2), we could stick with what we have now, or adapt <br>

something new afterwards. I guess the existing wording needs to be <br>

cleared up either way.<br>

<br>

Long story short, I don't see how D93376 addresses what seems to be the <br>

problem.<br>

<br>

<br>

<br>

><br>

>>> The syntactic condition also provides a clear path to evaluate <br>

>>> whether any given folding or coalescing is correct. <br>

>><br>

>> I disagree. The syntactic conditions, at least as I read them, allow <br>

>> the above program with lifetime markers (as we have it now) but did not<br>

>> suggest it would be wrong to fold and later coalesce. Or maybe I <br>

>> missed why the syntactic restrictions would prevent that, please <br>

>> explain then.<br>

><br>

> Where in that program would you place the lifetime markers? I assume <br>

> at the beginning and end of the local variable's scope?<br>

> So that would become something like (also reflecting that allocas are <br>

> all at the top):<br>

><br>

> void foo() {<br>

>    intptr_t xp, yp;<br>

>    int x, y;<br>

>    { lifetime.start(&x); xp = &x; bar(&x); lifetime.end(&x); }<br>

>    { lifetime.start(&y); yp = &y; bar(&y); lifetime.end(&y); }<br>

>    print(xp == yp);               // can we fold this comparison to <br>

> false?<br>

> }<br>

><br>

Yes, this is how clang would generate the code right now.<br>

<br>

<br>

> I think evaluating correctness of folding to false here is quite clear <br>

> (I assume this is just using C syntax to represent LLVM IR, so we <br>

> applyLLVM IR semantics):<br>

<br>

Agreed.<br>

<br>

<br>

> - The program as-written is non-deterministic; x and y have disjoint <br>

> lifetime so they could have the same or a different address.<br>

<br>

Agreed (for the C version for which the above is the LLVM-IR <br>

implementation).<br>

<br>

<br>

> - When the compiler decides to fold the comparison to false, it means <br>

> it has to commit to one of the possible non-det choices, and needs to <br>

> stick to that choice. <br>

<br>

Agreed.<br>

<br>

<br>

> So *just* replacing "xp == yp" by "false" is incorrect in the sense <br>

> that the new program has a possible behavior that did not occur with <br>

> the old program: "true false" can now be printed (*if* the allocator <br>

> chooses to put x and y into the same location), whereas the original <br>

> program would never print "true false". <br>

<br>

Agreed.<br>

<br>

<br>

> Note that we do not need to talk about coalescing here, we are just <br>

> exploiting the fact that allocation is non-deterministic and that <br>

> allocations with non-overlapping lifetime can be in the same spot or <br>

> in different spots.<br>

<br>

Sure. When I refer to coalescing I was also including "implicit <br>

coalescing", e.g., as happened in my outlining example.<br>

<br>

<br>

><br>

> IOW, coalescing "x" and "y" in this program is correct, but folding <br>

> the comparison to false is incorrect.<br>

<br>

Agreed, assuming we don't do anything to bar.<br>

<br>

<br>

><br>

> So when you disagree with my statement that we have "a clear path to <br>

> evaluate whether any given folding or coalescing is correct", then <br>

> which of the above to you disagree with?<br>

<br>

None. I disagreed with the part "The syntactic condition also provides a <br>

clear path ...", which implies this path was dependent on the syntactic <br>

condition. I don't see you making an argument about syntactic def-use <br>

chains above.<br>

<br>

I am basically struggling to understand why we need to introduce <br>

syntactic restrictions on lifetime markers and why they need to be <br>

limited to allocas. Those are two concepts included in D93376, right?<br>

<br>

<br>

><br>

> Folding to false would be correct if it would be accompanied by also <br>

> embedding into the program some hint which tells the allocator that <br>

> the two allocations must not be at the same spot in memory. This would <br>

> be a way to ensure that LLVM "sticks to its choice". But that issue is <br>

> entirely orthogonal to the semantics of lifetime.start / lifetime.end; <br>

> it arises with allocation lifetimes of any kind, including malloc/free.<br>

><br>

> I think you are proposing to make them only about the contents of the <br>

> alloca, so in the above "foo", now the lifetime of (the allocations <br>

> backing) "x" and "y" *would* overlap, so folding to false (of escaped <br>

> pointers) is... still not allowed, but now for less subtle reasons. <br>

> But this doesn't *solve* the "fold to false" problem, it just means <br>

> that the problem no longer applies to this particular program.<br>

<br>

My original email in this thread addressed the folding problem but I <br>

clarified since: Don't fold if both addresses are observed, regardless <br>

of any lifetime argument. You seem to mix the comment I made on phab <br>

last year with the email discussion. The phab conversation is outdated <br>

and replaced by this email thread.<br>

<br>

<br>

><br>

> Or is your argument that, since "fold to false" is wrong in both <br>

> cases, it doesn't really matter which choice we make?  However, in <br>

> your proposal, the option of doing the fold while also "adding hints <br>

> for the allocator" no longer exists.<br>

<br>

My proposal comes in two parts: Fix the folding issue, then revisit <br>

markers. I'm not convinced that the folding issue can be fixed by "hints <br>

for the allocator" but I believe the escape argument holds.<br>

<br>

<br>

<br>

><br>

>> I don't think anyone wanted that. TBH, I think my proposal is by far <br>

>> the most compatible one wrt. the "classic" model w/o lifetime <br>

>> markers, given<br>

>> that I proposed not to entangle them in object allocation. Everything <br>

>> else I proposed are "as-if" transformations. Do you disagree?<br>

><br>

> I agree that making the markers just erase the contents of the alloca <br>

> is the simplest proposal on the table. *If* we truly can make LLVM <br>

> follow that model, that would be very nice.<br>

> I would just be very surprised if it was possible to get this proposal <br>

> past the people that will carefully watch over the existing <br>

> optimization potential LLVM has, and make sure it does not become <br>

> smaller. ;)  Maybe I am too pessimistic, but I would have never dared <br>

> even making such a proposal as I would expect it to be shot down. I'll <br>

> be happy to be proven wrong about this. :)<br>

<br>

I've made a few actually, `mustprogress` was the latest and you were <br>

involved ;). It does prevent certain optimizations we did before <br>

unconditionally but it also fixes a conceptual problem. At least the <br>

pointer inconsistency is a conceptual problem that needs fixing. What we <br>

do with lifetime markers is probably orthogonal.<br>

<br>

~ Johannes<br>

<br>

<br>

><br>

> Kind regards,<br>

> Ralf<br>

><br>

>><br>

>> ~ Johannes<br>

>><br>

>><br>

>><br>

>>><br>

>>> Kind regards,<br>

>>> Ralf<br>

>>><br>

>>>><br>

>>>> ~ Johannes<br>

>>>><br>

>>>><br>

>>>>><br>

>>>>> Kind regards,<br>

>>>>> Ralf<br>

>>>>><br>

>>>>>> A lot of my arguments around alloca/malloc/... is that we can <br>

>>>>>> augment the information from the frontend by determining if an <br>

>>>>>> address is observed or not. If it is not, we can limit its <br>

>>>>>> allocation, coalesce it, etc. Lifetime markers can help us do <br>

>>>>>> this but they are not the only way.<br>

>>>>>> So, basically, we do not need lifetime markers to coalesce <br>

>>>>>> alloca/malloc/... but that they can help.<br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>>><br>

>>>>>> Let's try to get out of this rabbit hole. If you have a problem <br>

>>>>>> with the proposal as outlined in <br>

>>>>>> <a href="https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html</a> <br>

>>>>>> , please let me know. I assume dealing with observed addresses <br>

>>>>>> different than with non-observede ones (implied by noescape) is <br>

>>>>>> the way forward now.<br>

>>>>>><br>

>>>>>> ~ Johannes<br>

>>>>>><br>

>>>>>><br>

>>>>>><br>

>>>>>>><br>

>>>>>>> Kind regards,<br>

>>>>>>> Ralf<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>>><br>

>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> Kind regards,<br>

>>>>>>>>> Ralf<br>

>>>>>>>>><br>

>>>>>>>>>><br>

>>>>>>>>>> Does that make some sense?<br>

>>>>>>>>>><br>

>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>><br>

>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>> Kind regards,<br>

>>>>>>>>>>> Ralf<br>

>>>>>>>>>>><br>

>>>>>>>>>>>><br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert <br>

>>>>>>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>>>> It is, alloca must have its integral address decided at <br>

>>>>>>>>>>>>>>> its *allocation*<br>

>>>>>>>>>>>>>>> time as well, like 0xFF00... .<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already <br>

>>>>>>>>>>>>>>> allocated, the new<br>

>>>>>>>>>>>>>>> alloca cannot be placed there unless it is not observed <br>

>>>>>>>>>>>>>>> in the future,<br>

>>>>>>>>>>>>>>> according to your proposal.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> But how do we know from the current state whether the <br>

>>>>>>>>>>>>>>> stack variable is<br>

>>>>>>>>>>>>>>> going to be observed or not?<br>

>>>>>>>>>>>>>> With that argument you could allocate all but 4 bytes, do <br>

>>>>>>>>>>>>>> a malloc(4)<br>

>>>>>>>>>>>>>> and know the address of the returned pointer (assuming it <br>

>>>>>>>>>>>>>> is not null).<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> What I try to say is, either your scenario is part of the <br>

>>>>>>>>>>>>>> model and<br>

>>>>>>>>>>>>>> everything we do is broken as you could "observe" <br>

>>>>>>>>>>>>>> addresses passively,<br>

>>>>>>>>>>>>>> *or*, the abstract machine we use for semantic reasoning <br>

>>>>>>>>>>>>>> doesn't permit<br>

>>>>>>>>>>>>>> the above reasoning. I really hope for the latter.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>> That's a very valid point..!<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> Actually, I have a memory model that addresses this problem.<br>

>>>>>>>>>>>>> The gist is that we can interpret each allocation <br>

>>>>>>>>>>>>> instruction as *creating<br>

>>>>>>>>>>>>> 2 blocks* and nondeterministically returning one of them.<br>

>>>>>>>>>>>>> The other one is marked as invalid but not freed <br>

>>>>>>>>>>>>> immediately. Deallocation<br>

>>>>>>>>>>>>> frees both blocks.<br>

>>>>>>>>>>>>> If there is not enough space for having two blocks, it is <br>

>>>>>>>>>>>>> treated as<br>

>>>>>>>>>>>>> out-of-memory.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> It makes guessing the address of an allocation according <br>

>>>>>>>>>>>>> to memory layout<br>

>>>>>>>>>>>>> invalid UB.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> p = malloc(4)<br>

>>>>>>>>>>>>> // If p != null, it is guaranteed there was at least two 4 <br>

>>>>>>>>>>>>> byte slots<br>

>>>>>>>>>>>>> available, say 0x100~0x104 and 0x110~0x114.<br>

>>>>>>>>>>>>> // Two blocks are allocated at 0x110 and 0x114, and one of <br>

>>>>>>>>>>>>> the addresses is<br>

>>>>>>>>>>>>> nondeterministically returned.<br>

>>>>>>>>>>>>> // By the nature of nondeterminism, all executions below <br>

>>>>>>>>>>>>> should be<br>

>>>>>>>>>>>>> well-defined regardless of the address of p.<br>

>>>>>>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, <br>

>>>>>>>>>>>>> this raises UB.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> The paper link is here <br>

>>>>>>>>>>>>> <<a href="https://dl.acm.org/doi/10.1145/3276495" rel="noreferrer" target="_blank">https://dl.acm.org/doi/10.1145/3276495</a>> FYI.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> Nice! Thanks for the link :)<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>><br>

>>>>>>>>>>>>> To argue differently: Who is to say there is a stack, or <br>

>>>>>>>>>>>>> only one,<br>

>>>>>>>>>>>>>> or that alloca allocates memory "on the one stack"? That <br>

>>>>>>>>>>>>>> is not part of<br>

>>>>>>>>>>>>>> the IR, IMHO. I can write a machine on which alloca <br>

>>>>>>>>>>>>>> lowers to malloc,<br>

>>>>>>>>>>>>>> I don't even need the free during stack unwind but that I <br>

>>>>>>>>>>>>>> could do as<br>

>>>>>>>>>>>>>> well if I wanted to.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> This is right, the example was for illustrative purposes. <br>

>>>>>>>>>>>>> IR does not<br>

>>>>>>>>>>>>> enforce alloca to be placed at 'stack'.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> Making icmp/ptrtoint yield poison will still make loop <br>

>>>>>>>>>>>>>>> versioning or<br>

>>>>>>>>>>>>>>> pointer rewriting transformations unsound because these <br>

>>>>>>>>>>>>>>> operations now<br>

>>>>>>>>>>>>>> can<br>

>>>>>>>>>>>>>>> create poison (even if pointers are noundef).<br>

>>>>>>>>>>>>>> I did not say they yield poison, at least I did not try <br>

>>>>>>>>>>>>>> to say that.<br>

>>>>>>>>>>>>>> What are you referring to exactly?<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>> I was referring to this:<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising <br>

>>>>>>>>>>>>>>> instructions, which<br>

>>>>>>>>>>>>>>> contradicts with what LLVM does.<br>

>>>>>>>>>>>>>> As with other violation of attributes I would, on first <br>

>>>>>>>>>>>>>> though, suggest<br>

>>>>>>>>>>>>>> to produce poison, not UB.<br>

>>>>>>>>>>>>> But it is more about the (imaginary) attribute, so maybe I <br>

>>>>>>>>>>>>> was slightly out<br>

>>>>>>>>>>>>> of topic.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> 2) is fine, I think the suggestion semantically makes <br>

>>>>>>>>>>>>>> sense perfectly. 1)<br>

>>>>>>>>>>>>>>> is something I'm concerned about now.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> There are more than pointer foldings, such as rewriting <br>

>>>>>>>>>>>>>>> such expression,<br>

>>>>>>>>>>>>>>> code motion ptr cmp, introduce ptr cmp, etc. There is <br>

>>>>>>>>>>>>>>> also analysis<br>

>>>>>>>>>>>>>> relying<br>

>>>>>>>>>>>>>>> on ptr cmp.<br>

>>>>>>>>>>>>>>> Defining the correctness of each of them is something we <br>

>>>>>>>>>>>>>>> want to avoid,<br>

>>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>> maybe that's why we want to define precise semantics for <br>

>>>>>>>>>>>>>>> things.<br>

>>>>>>>>>>>>>> I don't get the point. My proposal does not change the <br>

>>>>>>>>>>>>>> semantics of<br>

>>>>>>>>>>>>>> pointer comparisons, at all. I explicitly mentioned that <br>

>>>>>>>>>>>>>> in the last<br>

>>>>>>>>>>>>>> email.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>> Oh okay, I thought it was a part of the lifetime proposal, <br>

>>>>>>>>>>>>> but it seems<br>

>>>>>>>>>>>>> more like a separate thing.<br>

>>>>>>>>>>>>> I agree that this requires performance impact.<br>

>>>>>>>>>>>>> Also investigation of existing transformations would be <br>

>>>>>>>>>>>>> needed; Alive2's<br>

>>>>>>>>>>>>> pointer comparison is doing approximation yet, but if it <br>

>>>>>>>>>>>>> becomes fully<br>

>>>>>>>>>>>>> precise, it will show something from running LLVM unit <br>

>>>>>>>>>>>>> tests I believe..! :)<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> I think this will be less aggressive and may give nice <br>

>>>>>>>>>>>>>> feedback to<br>

>>>>>>>>>>>>>>> potential projects that are using lifetime with non-alloca.<br>

>>>>>>>>>>>>>> The lifetime marker debate, basically 2) above, is <br>

>>>>>>>>>>>>>> orthogonal to the<br>

>>>>>>>>>>>>>> problem<br>

>>>>>>>>>>>>>> you try to solve. It got mixed in as lifetime markers <br>

>>>>>>>>>>>>>> were used by<br>

>>>>>>>>>>>>>> StackColoring<br>

>>>>>>>>>>>>>> to perform coalescing but that is coincidental. You can <br>

>>>>>>>>>>>>>> (arguably)<br>

>>>>>>>>>>>>>> coalesce stack<br>

>>>>>>>>>>>>>> allocations regardless of lifetime markers and with 1) <br>

>>>>>>>>>>>>>> such a<br>

>>>>>>>>>>>>>> transformation<br>

>>>>>>>>>>>>>> (w/ and w/o lifetime markers) would actually be sound.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> At the end, we can implement IR writer that lowers <br>

>>>>>>>>>>>>>>> lifetime with<br>

>>>>>>>>>>>>>> non-alloca<br>

>>>>>>>>>>>>>>> into memset(undef). WDYT?<br>

>>>>>>>>>>>>>> Yeah, 2) is orthogonal and we can lower it that way. <br>

>>>>>>>>>>>>>> Unsure if it is<br>

>>>>>>>>>>>>>> helpful<br>

>>>>>>>>>>>>>> but we can certainly define it that way in the LangRef.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>> Okay, thanks!<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> p.s. The reply was late, sorry. I think I can spend more <br>

>>>>>>>>>>>>>>> time on this<br>

>>>>>>>>>>>>>> today.<br>

>>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert <<br>

>>>>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is <br>

>>>>>>>>>>>>>>>>>>> a desirable<br>

>>>>>>>>>>>>>>>>> direction<br>

>>>>>>>>>>>>>>>>>>> IMO.<br>

>>>>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us <br>

>>>>>>>>>>>>>>>>>> from defining<br>

>>>>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it <br>

>>>>>>>>>>>>>>>>>> in an "undesirable<br>

>>>>>>>>>>>>>>>>> way"?<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> I meant it would be great if the output state after <br>

>>>>>>>>>>>>>>>>> executing an<br>

>>>>>>>>>>>>>>>>> instruction can be described using its input state.<br>

>>>>>>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I <br>

>>>>>>>>>>>>>>>>> should have been more<br>

>>>>>>>>>>>>>>>>> clear about this)<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be <br>

>>>>>>>>>>>>>>>>> defined as follows:<br>

>>>>>>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + <br>

>>>>>>>>>>>>>>>>> s(y)]<br>

>>>>>>>>>>>>>>>>> where s(x) is the value of x in the previous state, <br>

>>>>>>>>>>>>>>>>> and s[z -> v] is a<br>

>>>>>>>>>>>>>>>>> state with z updated to v.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Another example that involves memory: the semantics of <br>

>>>>>>>>>>>>>>>>> 'i = ptrtoint p'<br>

>>>>>>>>>>>>>>>> can<br>

>>>>>>>>>>>>>>>>> be defined as follows:<br>

>>>>>>>>>>>>>>>>> Given an input state s, next state s' = s[i -> <br>

>>>>>>>>>>>>>>>>> s(p).obj.address +<br>

>>>>>>>>>>>>>>>>> s(p).offset]<br>

>>>>>>>>>>>>>>>>> where obj.address is the begin address of a memory <br>

>>>>>>>>>>>>>>>>> object obj pointed<br>

>>>>>>>>>>>>>> by<br>

>>>>>>>>>>>>>>>> p<br>

>>>>>>>>>>>>>>>>> & offset is p's byte offset. (Imagine a pointer to the <br>

>>>>>>>>>>>>>>>>> offset of some<br>

>>>>>>>>>>>>>>>> char<br>

>>>>>>>>>>>>>>>>> array).<br>

>>>>>>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t <br>

>>>>>>>>>>>>>>>>> the input state.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.<br>

>>>>>>>>>>>>>>>>> To describe the output state after executing 'p = <br>

>>>>>>>>>>>>>>>>> alloca', the address<br>

>>>>>>>>>>>>>> of<br>

>>>>>>>>>>>>>>>>> new alloca should be determined.<br>

>>>>>>>>>>>>>>>>> If observedness is involved, we need to know the <br>

>>>>>>>>>>>>>>>>> future state again. :/<br>

>>>>>>>>>>>>>>>> We<br>

>>>>>>>>>>>>>>>>> don't know whether the alloca is going to be observed <br>

>>>>>>>>>>>>>>>>> or not without<br>

>>>>>>>>>>>>>>>> seeing<br>

>>>>>>>>>>>>>>>>> the future.<br>

>>>>>>>>>>>>>>>>> This is the problem of the current lifetime intrinsics <br>

>>>>>>>>>>>>>>>>> as well.<br>

>>>>>>>>>>>>>>>> No, you mix things up here.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.<br>

>>>>>>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of<br>

>>>>>>>>>>>>>>>> dereferenceable memory that is implicitly freed as the <br>

>>>>>>>>>>>>>>>> stack<br>

>>>>>>>>>>>>>>>> unwinds. That is it. No context necessary.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> If you want to modify the IR, you need to argue the <br>

>>>>>>>>>>>>>>>> observable<br>

>>>>>>>>>>>>>>>> semantics which is nothing new. That this might require <br>

>>>>>>>>>>>>>>>> more than<br>

>>>>>>>>>>>>>>>> a peephole view of the program is also not new.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> One possible approach to resolve this is adding an <br>

>>>>>>>>>>>>>>>>> 'unobserved' flag to<br>

>>>>>>>>>>>>>>>>> alloca instruction (similar to what was suggested by <br>

>>>>>>>>>>>>>>>>> Nicolai).<br>

>>>>>>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is <br>

>>>>>>>>>>>>>>>>> used by<br>

>>>>>>>>>>>>>>>> ptrtoint/icmp,<br>

>>>>>>>>>>>>>>>>> it is UB.<br>

>>>>>>>>>>>>>>>> The flag can be added, like we add other attributes. It <br>

>>>>>>>>>>>>>>>> should not<br>

>>>>>>>>>>>>>>>> be required for any optimization we talked about <br>

>>>>>>>>>>>>>>>> though. It basically<br>

>>>>>>>>>>>>>>>> is a way to manifest derived or given information into <br>

>>>>>>>>>>>>>>>> the IR.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> Attribute deduction, as well as frontends with domain <br>

>>>>>>>>>>>>>>>> knowledge,<br>

>>>>>>>>>>>>>>>> can add such information. The flag we discussed in phab <br>

>>>>>>>>>>>>>>>> was not even<br>

>>>>>>>>>>>>>>>> sufficient for all the transformation examples I <br>

>>>>>>>>>>>>>>>> presented in my mail,<br>

>>>>>>>>>>>>>>>> that is why I extended my  argument. We could still <br>

>>>>>>>>>>>>>>>> have a "noescape"<br>

>>>>>>>>>>>>>>>> flag for allocas, but I'm not sure how useful that <br>

>>>>>>>>>>>>>>>> really is. We can<br>

>>>>>>>>>>>>>>>> certainly deduce it and manifest it, unsure if we have <br>

>>>>>>>>>>>>>>>> domain knowledge<br>

>>>>>>>>>>>>>>>> we can use for non-trivial cases though.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising <br>

>>>>>>>>>>>>>>>>> instructions, which<br>

>>>>>>>>>>>>>>>>> contradicts with what LLVM does.<br>

>>>>>>>>>>>>>>>> As with other violation of attributes I would, on first <br>

>>>>>>>>>>>>>>>> though, suggest<br>

>>>>>>>>>>>>>>>> to produce poison, not UB.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Also, existing optimizations like loop versioning can <br>

>>>>>>>>>>>>>>>>> introduce<br>

>>>>>>>>>>>>>>>>> ptrtoint/pointer comparisons too.<br>

>>>>>>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the <br>

>>>>>>>>>>>>>>>> feeling things<br>

>>>>>>>>>>>>>>>> are still mixed up here.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> What I proposed is twofold:<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> 1) We stop folding comparisons between different <br>

>>>>>>>>>>>>>>>> allocas if changing the<br>

>>>>>>>>>>>>>>>> address<br>

>>>>>>>>>>>>>>>>       of both might be observable. Thus, if both might <br>

>>>>>>>>>>>>>>>> have their address<br>

>>>>>>>>>>>>>>>> "taken"/escaped,<br>

>>>>>>>>>>>>>>>>       other than the comparisons we want to fold, we <br>

>>>>>>>>>>>>>>>> cannot proceed.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> The first should be sufficient for the problem you were <br>

>>>>>>>>>>>>>>>> trying to solve<br>

>>>>>>>>>>>>>>>> in the<br>

>>>>>>>>>>>>>>>> first place. The second makes lifetime markers less <br>

>>>>>>>>>>>>>>>> weird. Note that 1)<br>

>>>>>>>>>>>>>>>> is not changing<br>

>>>>>>>>>>>>>>>> the semantics of the IR. We basically just argue there <br>

>>>>>>>>>>>>>>>> is a bug in our<br>

>>>>>>>>>>>>>>>> instcombine right<br>

>>>>>>>>>>>>>>>> now as we do not check all necessary preconditions.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> I see that there are other questions that I didn't <br>

>>>>>>>>>>>>>>>>> answer yet, but let<br>

>>>>>>>>>>>>>> me<br>

>>>>>>>>>>>>>>>>> answer this first to limit the length of the text :)<br>

>>>>>>>>>>>>>>>> Sure, we can split the discussion :)<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <<br>

>>>>>>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>>>>>>>> Hi Johannes,<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> I read your proposal and thought about the model.<br>

>>>>>>>>>>>>>>>>>> Cool, thanks!<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> As you concerned in A3, certain programs may be <br>

>>>>>>>>>>>>>>>>>>> valid only when<br>

>>>>>>>>>>>>>> memory<br>

>>>>>>>>>>>>>>>>>>> blocks with overlapping lifetimes have disjoint <br>

>>>>>>>>>>>>>>>>>>> addresses.<br>

>>>>>>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca <br>

>>>>>>>>>>>>>>>>>>> also works):<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> p1 = malloc(4)<br>

>>>>>>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there as <br>

>>>>>>>>>>>>>>>>>>> enough space & p1<br>

>>>>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>>>>> p2 != null<br>

>>>>>>>>>>>>>>>>>>> set<char*> s;<br>

>>>>>>>>>>>>>>>>>>> s.insert(p1); s.insert(p2); // If the second insert <br>

>>>>>>>>>>>>>>>>>>> did nothing, it<br>

>>>>>>>>>>>>>>>> would<br>

>>>>>>>>>>>>>>>>>>> be surprise to programmers<br>

>>>>>>>>>>>>>>>>>>> work(s);<br>

>>>>>>>>>>>>>>>>>>> free(data1)<br>

>>>>>>>>>>>>>>>>>>> free(data2)<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped <br>

>>>>>>>>>>>>>>>>>>> blocks are<br>

>>>>>>>>>>>>>>>> disjoint.<br>

>>>>>>>>>>>>>>>>>> It<br>

>>>>>>>>>>>>>>>>>>> would be great for verification tools on LLVM IR to <br>

>>>>>>>>>>>>>>>>>>> be able to answer<br>

>>>>>>>>>>>>>>>>>> that<br>

>>>>>>>>>>>>>>>>>>> the second insert will succeed.<br>

>>>>>>>>>>>>>>>>>> I agree, the second insert should succeed, assuming <br>

>>>>>>>>>>>>>>>>>> `p1 && p2`.<br>

>>>>>>>>>>>>>>>>>> I don't think my proposal would in any way impact the <br>

>>>>>>>>>>>>>>>>>> program above,<br>

>>>>>>>>>>>>>>>>>> if anything the A3 reasoning makes sure such a <br>

>>>>>>>>>>>>>>>>>> program with allocas<br>

>>>>>>>>>>>>>>>>>> is not miscompiled.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> I'm also not sure I understand what you try to argue <br>

>>>>>>>>>>>>>>>>>> for. Maybe<br>

>>>>>>>>>>>>>>>>>> elaborate a bit what it is you think is bad or needs <br>

>>>>>>>>>>>>>>>>>> to be changed.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> The problem is that definition of escapedness is not <br>

>>>>>>>>>>>>>>>>>>> clear at the<br>

>>>>>>>>>>>>>>>>>> semantic<br>

>>>>>>>>>>>>>>>>>>> level. Describing the IR semantics w.r.t. LLVM's <br>

>>>>>>>>>>>>>>>>>>> escaped analysis<br>

>>>>>>>>>>>>>>>>>> function<br>

>>>>>>>>>>>>>>>>>>> isn't something we want.<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> Semantic definition of escapedness of a pointer <br>

>>>>>>>>>>>>>>>>>>> seems hard, I mean<br>

>>>>>>>>>>>>>> in a<br>

>>>>>>>>>>>>>>>>>>> stepwise manner.<br>

>>>>>>>>>>>>>>>>>>> It isn't a single instruction such as 'escape i8* <br>

>>>>>>>>>>>>>>>>>>> ptr', and we need<br>

>>>>>>>>>>>>>> to<br>

>>>>>>>>>>>>>>>>>> look<br>

>>>>>>>>>>>>>>>>>>> over all instructions in the function. For example, <br>

>>>>>>>>>>>>>>>>>>> '(int)(p+1) -<br>

>>>>>>>>>>>>>>>> (int)p'<br>

>>>>>>>>>>>>>>>>>>> isn't semantically escaping the pointer p because <br>

>>>>>>>>>>>>>>>>>>> the result is 1<br>

>>>>>>>>>>>>>>>>>>> regardless of the value of p.<br>

>>>>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is <br>

>>>>>>>>>>>>>>>>>>> a desirable<br>

>>>>>>>>>>>>>>>>>> direction<br>

>>>>>>>>>>>>>>>>>>> IMO.<br>

>>>>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us <br>

>>>>>>>>>>>>>>>>>> from defining<br>

>>>>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it <br>

>>>>>>>>>>>>>>>>>> in an "undesirable<br>

>>>>>>>>>>>>>>>>>> way"?<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> In practice, syntactically checking escapedness + <br>

>>>>>>>>>>>>>>>>>>> nice engineering<br>

>>>>>>>>>>>>>>>> might<br>

>>>>>>>>>>>>>>>>>>> not break optimizations in most cases (as <br>

>>>>>>>>>>>>>>>>>>> undef/poison did); but it<br>

>>>>>>>>>>>>>>>> would<br>

>>>>>>>>>>>>>>>>>>> be great to move to another level, since LLVM IR is <br>

>>>>>>>>>>>>>>>>>>> used in so many<br>

>>>>>>>>>>>>>>>>>> places<br>

>>>>>>>>>>>>>>>>>>> :)<br>

>>>>>>>>>>>>>>>>>> The property under which you can coalesce objects is <br>

>>>>>>>>>>>>>>>>>> simple:<br>

>>>>>>>>>>>>>>>>>>       It is not observable.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> Now, if the address of one of the two objects you <br>

>>>>>>>>>>>>>>>>>> coalesce is not<br>

>>>>>>>>>>>>>>>>>> observed, coalescing is not observable. That is a <br>

>>>>>>>>>>>>>>>>>> sufficient condition<br>

>>>>>>>>>>>>>>>>>> not a necessary one. Pointer "escaping" is one step <br>

>>>>>>>>>>>>>>>>>> further. If the<br>

>>>>>>>>>>>>>>>>>> address doesn't escape it is not observed. This does <br>

>>>>>>>>>>>>>>>>>> not mean the<br>

>>>>>>>>>>>>>>>>>> "semantic conditions for coalescing", e.g., for the <br>

>>>>>>>>>>>>>>>>>> purpose of<br>

>>>>>>>>>>>>>>>> translation<br>

>>>>>>>>>>>>>>>>>> validation, is supposed to be build on top of our <br>

>>>>>>>>>>>>>>>>>> "definition of<br>

>>>>>>>>>>>>>>>> escaping<br>

>>>>>>>>>>>>>>>>>> pointers". That said, we use "does not escape" as a <br>

>>>>>>>>>>>>>>>>>> precondition for<br>

>>>>>>>>>>>>>>>>>> various transformation and I'm unsure what is any <br>

>>>>>>>>>>>>>>>>>> different now. The<br>

>>>>>>>>>>>>>>>>>> entire escape argument is only used in the validity <br>

>>>>>>>>>>>>>>>>>> of the pointer<br>

>>>>>>>>>>>>>>>> folding.<br>

>>>>>>>>>>>>>>>>>> Similarly, we can fold a comparison of a noalias <br>

>>>>>>>>>>>>>>>>>> pointer with another<br>

>>>>>>>>>>>>>>>> one<br>

>>>>>>>>>>>>>>>>>> if the former does not escape (and both are <br>

>>>>>>>>>>>>>>>>>> dereferenced and one is<br>

>>>>>>>>>>>>>>>>>> written for sure).<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> The pointer comparison is another beast. It indeed <br>

>>>>>>>>>>>>>>>>>>> has a few issues,<br>

>>>>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>>>>> solving it might require nontrivial solution.<br>

>>>>>>>>>>>>>>>>>> I think the main problem of the inconsistencies and <br>

>>>>>>>>>>>>>>>>>> such we've seen is<br>

>>>>>>>>>>>>>>>>>> rooted in the erroneous folding of pointer <br>

>>>>>>>>>>>>>>>>>> comparisons. Cleaning up<br>

>>>>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>>>>> lifetime marker semantics is actually unrelated and <br>

>>>>>>>>>>>>>>>>>> simply not folding<br>

>>>>>>>>>>>>>>>>>> as described in A3 should solve the issue that has <br>

>>>>>>>>>>>>>>>>>> been reported.<br>

>>>>>>>>>>>>>>>>>> Nevertheless,<br>

>>>>>>>>>>>>>>>>>> we should take a crack at lifetime markers while we <br>

>>>>>>>>>>>>>>>>>> are here.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <<br>

>>>>>>>>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Hi Juneyoung,<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Happy new year :)<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last <br>

>>>>>>>>>>>>>>>>>>>> year, I've tried to<br>

>>>>>>>>>>>>>>>>>>>> summarize my thoughts,<br>

>>>>>>>>>>>>>>>>>>>> especially given that I had some time to think <br>

>>>>>>>>>>>>>>>>>>>> about things over the<br>

>>>>>>>>>>>>>>>>>> break.<br>

>>>>>>>>>>>>>>>>>>>> Still, no promises on the quality ;)<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> I start with general questions I asked myself and <br>

>>>>>>>>>>>>>>>>>>>> then go on<br>

>>>>>>>>>>>>>> rambling<br>

>>>>>>>>>>>>>>>>>>>> about a potential design.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, <br>

>>>>>>>>>>>>>>>>>>>> thus is it fixed<br>

>>>>>>>>>>>>>> or<br>

>>>>>>>>>>>>>>>>>>>> modifiable?<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I <br>

>>>>>>>>>>>>>>>>>>>> think it is<br>

>>>>>>>>>>>>>>>> derived<br>

>>>>>>>>>>>>>>>>>>>> and modifiable,<br>

>>>>>>>>>>>>>>>>>>>> at least I hope it is. Only that would allow <br>

>>>>>>>>>>>>>>>>>>>> transformations I would<br>

>>>>>>>>>>>>>>>>>> want<br>

>>>>>>>>>>>>>>>>>>>> us to do. Here are some examples:<br>

>>>>>>>>>>>>>>>>>>>> <a href="https://godbolt.org/z/G8obj3" rel="noreferrer" target="_blank">https://godbolt.org/z/G8obj3</a><br>

>>>>>>>>>>>>>>>>>>>> <a href="https://godbolt.org/z/obaTc" rel="noreferrer" target="_blank">https://godbolt.org/z/obaTc</a><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, <br>

>>>>>>>>>>>>>>>>>>>> extending the lifetime?<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer <br>

>>>>>>>>>>>>>>>>>>>> comparison into a region<br>

>>>>>>>>>>>>>>>> where<br>

>>>>>>>>>>>>>>>>>>>> the pointer is dead?<br>

>>>>>>>>>>>>>>>>>>>> This is an important question which we haven't <br>

>>>>>>>>>>>>>>>>>>>> discussed much as we<br>

>>>>>>>>>>>>>>>>>>>> assumed LICM has to work.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses <br>

>>>>>>>>>>>>>>>>>>>> are not<br>

>>>>>>>>>>>>>> extending<br>

>>>>>>>>>>>>>>>>>>>> the lifetime and are<br>

>>>>>>>>>>>>>>>>>>>> allowed outside of "lifetime regions" (as indicated <br>

>>>>>>>>>>>>>>>>>>>> by markers).<br>

>>>>>>>>>>>>>> They<br>

>>>>>>>>>>>>>>>>>>>> will always produce valid<br>

>>>>>>>>>>>>>>>>>>>> results. Though, we might want to think about a <br>

>>>>>>>>>>>>>>>>>>>> lifetime marker that<br>

>>>>>>>>>>>>>>>>>>>> spits out a new pointer<br>

>>>>>>>>>>>>>>>>>>>> value instead of reusing the old one.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> The question here is, can we fold address <br>

>>>>>>>>>>>>>>>>>>>> comparisons based on<br>

>>>>>>>>>>>>>>>>>>>> lifetimes, or not.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> So far, we fold comparisons based on "address <br>

>>>>>>>>>>>>>>>>>>>> information". For<br>

>>>>>>>>>>>>>>>> example,<br>

>>>>>>>>>>>>>>>>>>>> we "know" globals,<br>

>>>>>>>>>>>>>>>>>>>> allocas, and mallocs cannot be equal to one <br>

>>>>>>>>>>>>>>>>>>>> another. Also, two<br>

>>>>>>>>>>>>>>>> distinct<br>

>>>>>>>>>>>>>>>>>>>> allocations, for globals<br>

>>>>>>>>>>>>>>>>>>>> and allocas, are considered unequal. Now, the crux <br>

>>>>>>>>>>>>>>>>>>>> is that we have<br>

>>>>>>>>>>>>>> to<br>

>>>>>>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>>>>> consistent if we do two<br>

>>>>>>>>>>>>>>>>>>>> comparisons, and, as of now, we are not (bug number <br>

>>>>>>>>>>>>>>>>>>>> missing). Since<br>

>>>>>>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>>>>>>> backend (or any other place<br>

>>>>>>>>>>>>>>>>>>>> for that matter) might coalesce allocas, their <br>

>>>>>>>>>>>>>>>>>>>> addresses might not<br>

>>>>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>>>>> different after all. If we<br>

>>>>>>>>>>>>>>>>>>>> already folded a comparison to "unequal" we are <br>

>>>>>>>>>>>>>>>>>>>> doomed if we later<br>

>>>>>>>>>>>>>>>> have<br>

>>>>>>>>>>>>>>>>>>>> a comparison that results<br>

>>>>>>>>>>>>>>>>>>>> in "equal". (Note, this is different from aliasing <br>

>>>>>>>>>>>>>>>>>>>> rules as they can<br>

>>>>>>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>>>>> stricter.)<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Design:<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> I would hope we can come up with a model that <br>

>>>>>>>>>>>>>>>>>>>> treats memory "the<br>

>>>>>>>>>>>>>>>> same",<br>

>>>>>>>>>>>>>>>>>>>> regardless if it is global,<br>

>>>>>>>>>>>>>>>>>>>> stack, or heap. I want to avoid special casing one <br>

>>>>>>>>>>>>>>>>>>>> of them wrt.<br>

>>>>>>>>>>>>>>>> lifetime<br>

>>>>>>>>>>>>>>>>>>>> as I believe most optimizations<br>

>>>>>>>>>>>>>>>>>>>> would apply to any of them, potentially for <br>

>>>>>>>>>>>>>>>>>>>> different reasons and<br>

>>>>>>>>>>>>>> with<br>

>>>>>>>>>>>>>>>>>>>> different gains, but nevertheless.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory <br>

>>>>>>>>>>>>>>>>>>>> content *only*.<br>

>>>>>>>>>>>>>>>>>>>> Basically, the memory content is set to<br>

>>>>>>>>>>>>>>>>>>>>          undefined by lifetime markers. It is <br>

>>>>>>>>>>>>>>>>>>>> derived/modifiable as<br>

>>>>>>>>>>>>>> it<br>

>>>>>>>>>>>>>>>> is<br>

>>>>>>>>>>>>>>>>>>>> just an "as-is" property of the memory<br>

>>>>>>>>>>>>>>>>>>>>          content. The lifetimes of an object, as <br>

>>>>>>>>>>>>>>>>>>>> described by<br>

>>>>>>>>>>>>>> markers,<br>

>>>>>>>>>>>>>>>>>> might<br>

>>>>>>>>>>>>>>>>>>>> change during the compilation. They<br>

>>>>>>>>>>>>>>>>>>>>          might become smaller if we deduce the <br>

>>>>>>>>>>>>>>>>>>>> object is not accessed<br>

>>>>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>>>>>> the memory content is not used, they<br>

>>>>>>>>>>>>>>>>>>>>          might become larger if objects with <br>

>>>>>>>>>>>>>>>>>>>> non-overlapping<br>

>>>>>>>>>>>>>> lifetimes<br>

>>>>>>>>>>>>>>>> are<br>

>>>>>>>>>>>>>>>>>>>> coalesced. (One could see the latter as<br>

>>>>>>>>>>>>>>>>>>>>          introducing a new object though.)<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing <br>

>>>>>>>>>>>>>>>>>>>> to do with the<br>

>>>>>>>>>>>>>>>>>>>> address of an object. Consequently, pointer<br>

>>>>>>>>>>>>>>>>>>>>          comparisons and similar operations are <br>

>>>>>>>>>>>>>>>>>>>> valid outside the<br>

>>>>>>>>>>>>>>>> lifetime.<br>

>>>>>>>>>>>>>>>>>>>> Loads and stores are as well, they can<br>

>>>>>>>>>>>>>>>>>>>>          even not be removed "easily". A store <br>

>>>>>>>>>>>>>>>>>>>> followed by a lifetime<br>

>>>>>>>>>>>>>>>>>> marker<br>

>>>>>>>>>>>>>>>>>>>> or a load following a lifetime marker<br>

>>>>>>>>>>>>>>>>>>>>          is dead or results in undef respectively.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about <br>

>>>>>>>>>>>>>>>>>>>> addresses. This means<br>

>>>>>>>>>>>>>> we<br>

>>>>>>>>>>>>>>>>>>>> could/should also not argue that overlapping<br>

>>>>>>>>>>>>>>>>>>>>          lifetimes result in different addresses. <br>

>>>>>>>>>>>>>>>>>>>> Thus, a comparison<br>

>>>>>>>>>>>>>>>>>> between<br>

>>>>>>>>>>>>>>>>>>>> the address of two allocas could not<br>

>>>>>>>>>>>>>>>>>>>>          immediately be folded. That said, there <br>

>>>>>>>>>>>>>>>>>>>> would be special<br>

>>>>>>>>>>>>>> cases<br>

>>>>>>>>>>>>>>>>>>>> though. Basically, if one of the allocas does<br>

>>>>>>>>>>>>>>>>>>>>          not escape, other than the comparisons to <br>

>>>>>>>>>>>>>>>>>>>> be folded, we can<br>

>>>>>>>>>>>>>>>> fold<br>

>>>>>>>>>>>>>>>>>>>> them. Afterwards, coalescing or splitting<br>

>>>>>>>>>>>>>>>>>>>>          would still be consistent because it is <br>

>>>>>>>>>>>>>>>>>>>> unobservable. The<br>

>>>>>>>>>>>>>>>> problem<br>

>>>>>>>>>>>>>>>>>>>> we have in-tree is that we fold even though<br>

>>>>>>>>>>>>>>>>>>>>          the address is still observed (after the <br>

>>>>>>>>>>>>>>>>>>>> fold). It is still<br>

>>>>>>>>>>>>>>>>>> unclear<br>

>>>>>>>>>>>>>>>>>>>> to me what the impact of this would be<br>

>>>>>>>>>>>>>>>>>>>>          on real code. I suggested before that we <br>

>>>>>>>>>>>>>>>>>>>> run some<br>

>>>>>>>>>>>>>> experiments<br>

>>>>>>>>>>>>>>>>>> first<br>

>>>>>>>>>>>>>>>>>>>> before we make any decision whatsoever.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are<br>

>>>>>>>>>>>>>> `memset(undef)`,<br>

>>>>>>>>>>>>>>>> as<br>

>>>>>>>>>>>>>>>>>>>> you suggested before (I think).<br>

>>>>>>>>>>>>>>>>>>>> There are some implementation level differences but <br>

>>>>>>>>>>>>>>>>>>>> at the end of<br>

>>>>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>>>>>>> day they are basically the same.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if <br>

>>>>>>>>>>>>>>>>>>>> it fixes the<br>

>>>>>>>>>>>>>>>> problems<br>

>>>>>>>>>>>>>>>>>>>> that lead to D93376 in the first place.<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:<br>

>>>>>>>>>>>>>>>>>>>>> Hello all,<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> We're discussing the well-formedness of <br>

>>>>>>>>>>>>>>>>>>>>> lifetime.start/end<br>

>>>>>>>>>>>>>> intrinsic<br>

>>>>>>>>>>>>>>>>>>>> here (<br>

>>>>>>>>>>>>>>>>>>>>> <a href="https://reviews.llvm.org/D93376" rel="noreferrer" target="_blank">https://reviews.llvm.org/D93376</a>), deciding what is a<br>

>>>>>>>>>>>>>> (syntactically<br>

>>>>>>>>>>>>>>>> &<br>

>>>>>>>>>>>>>>>>>>>>> semantically) valid use of these intrinsics.<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> I'd like to gather more context about the intrinsics.<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> 1. Is there a frontend or framework that <br>

>>>>>>>>>>>>>>>>>>>>> introduces lifetime call<br>

>>>>>>>>>>>>>>>> with<br>

>>>>>>>>>>>>>>>>>>>>> non-stack allocated objects?<br>

>>>>>>>>>>>>>>>>>>>>> If exists, which behavior do they expect from it?<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> 2. Can a block's lifetime be started bytewise or <br>

>>>>>>>>>>>>>>>>>>>>> elementwise?<br>

>>>>>>>>>>>>>>>>>>>>> I imagine an optimization that allow using stack <br>

>>>>>>>>>>>>>>>>>>>>> very compactly,<br>

>>>>>>>>>>>>>> but<br>

>>>>>>>>>>>>>>>>>>>> wonder<br>

>>>>>>>>>>>>>>>>>>>>> there is a real-world use case.<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>>>>> _______________________________________________<br>

>>>>>>>>>>>>>>>>>>>>> LLVM Developers mailing list<br>

>>>>>>>>>>>>>>>>>>>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

>>>>>>>>>>>>>>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a> <br>

>>>>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>><br>

>>>>><br>

>>><br>

><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><br></div><font size="1">Juneyoung Lee</font><div><font size="1">Software Foundation Lab, Seoul National University</font></div></div></div>