<div dir="ltr">I want to avoid allowing 'pointer zapping' too; LLVM IR has a static single assignment form, and I think it might not be a good idea to add a special rule that violates this.<div>Well, there might be some semantics that would explain optimizations that we can imagine, but it introduces more complexity to the language semantics by adding a special case.<br><div><br></div><div>BTW, is there any reason why allocas are being placed at the entry block?</div><div>If alloca was simply placed at the location where lifetime.start is currently being placed, we don't need to struggle with this complication about the semantics of lifetime.start..?<br></div></div><div><br></div><div>Thanks,</div><div>Juneyoung</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 26, 2021 at 7:22 PM Ralf Jung <<a href="mailto:jung@mpi-sws.org">jung@mpi-sws.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi Johannes,<br>

<br>

> On 1/24/21 10:07 AM, Ralf Jung wrote:<br>

>> Hi Johannes,<br>

>><br>

>>> At some point I lost track of what you even want to achieve.<br>

>><br>

>> *lol* fair.^^<br>

>> All I want is a precise and meaningful spec for lifetime.start and <br>

>> lifetime.end. That's where we entered this rabbit hole. You didn't like <br>

>> Juneyoung's and Nuno's proposal for this and instead suggested to replace the <br>

>> entire system by something else, and that new system is what we are discussing <br>

>> now (that is my understanding, anyway).<br>

>><br>

> I went back to the proposal in question, <a href="https://reviews.llvm.org/D93376" rel="noreferrer" target="_blank">https://reviews.llvm.org/D93376</a>, and I <br>

> could not right away determine why this would avoid the inconsistency we can <br>

> imagine for something like<br>

> <br>

> ```<br>

> int *G = 0;<br>

> void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;<br>

> void foo() {<br>

>    intptr_t xp, yp;<br>

>    { int x; xp = &x; bar(&x); }<br>

>    { int y; yp = &y; bar(&y); }<br>

>    print(xp == yp);               // One can imagine this is folded to false and <br>

> because x and y are coalesced bar will print true.<br>

> }<br>

> <br>

> ```<br>

> <br>

> I was opposing the proposal because of the brittle syntactic restrictions. In my <br>

> attempt to justify not having those I also tried to resolve the above <br>

> inconsistency in a more general manner because I thought that was what it was <br>

> all about. I believe that we either have to argue the address of a dead object <br>

> is unusable/undetermined, basically the direction of James comment, or we need <br>

> to decouple lifetime markers from locations.<br>

<br>

Folding to false and coalescing are inconsistent with each other, yes. It is <br>

just wrong to perform both of these on the above program, I would say (if we <br>

interpret it with LLVM semantics, which I assume has no "pointer zapping"). Is <br>

that up for debate?<br>

<br>

However, I do not see the immediate connection to lifetime markers -- the same <br>

would be true for a program using malloc/free. Coalescing is permissible <br>

whenever lifetimes are disjoint. Folding to false is only permissible if the <br>

pointers are *definitely not equal*, e.g. if their lifetimes overlap.<br>

<br>

>> Sure, everything in the semantics is up to what the program actually observes. <br>

>> But I was considering the general case, in which it will not be possible to <br>

>> prove that some address is not observed. I expect that to be quite common <br>

>> (certainly, Rust programs really like to take references to local variables <br>

>> and pass them to other functions), and in those cases, not having <br>

>> lifetime.start/end will severely restrict program transformations -- as you <br>

>> observed, even outlining is affected.<br>

>><br>

>> This is basically the abstract version of the concrete concern that James raised.<br>

> <br>

> I get the argument that it can hinder optimizations but it is abstract. It seems <br>

> pretty simple to test this out, I mean to restrict the folding and the <br>

> coalescing. The benefit, which I think got ignored here a lot, is that it <br>

> provides a clear path towards deduction of coalescing opportunities, that is, <br>

> coalescing in the middle-end or backend w/o lifetime markers. Though, similar to <br>

> the impact of my proposal, I have no idea if that is even needed.<br>

<br>

It is certainly not simple for me to test out anything that would involve <br>

changing LLVM. ;)<br>

The syntactic condition also provides a clear path to evaluate whether any given <br>

folding or coalescing is correct. Any sufficiently precise semantics will do <br>

that. The question is just which semantics y'all want. If the semantics require <br>

"pointer zapping" to remain sufficiently optimizable, that sounds like a serious <br>

drawback to me.<br>

<br>

Kind regards,<br>

Ralf<br>

<br>

> <br>

> ~ Johannes<br>

> <br>

> <br>

>><br>

>> Kind regards,<br>

>> Ralf<br>

>><br>

>>> A lot of my arguments around alloca/malloc/... is that we can augment the <br>

>>> information from the frontend by determining if an address is observed or <br>

>>> not. If it is not, we can limit its allocation, coalesce it, etc. Lifetime <br>

>>> markers can help us do this but they are not the only way.<br>

>>> So, basically, we do not need lifetime markers to coalesce alloca/malloc/... <br>

>>> but that they can help.<br>

>><br>

>><br>

>><br>

>>><br>

>>> Let's try to get out of this rabbit hole. If you have a problem with the <br>

>>> proposal as outlined in <br>

>>> <a href="https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html</a> , please <br>

>>> let me know. I assume dealing with observed addresses different than with <br>

>>> non-observede ones (implied by noescape) is the way forward now.<br>

>>><br>

>>> ~ Johannes<br>

>>><br>

>>><br>

>>><br>

>>>><br>

>>>> Kind regards,<br>

>>>> Ralf<br>

>>>><br>

>>>><br>

>>>>><br>

>>>>><br>

>>>>>><br>

>>>>>> Kind regards,<br>

>>>>>> Ralf<br>

>>>>>><br>

>>>>>>><br>

>>>>>>> Does that make some sense?<br>

>>>>>>><br>

>>>>>>> ~ Johannes<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>>><br>

>>>>>>>> Kind regards,<br>

>>>>>>>> Ralf<br>

>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert <br>

>>>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>> wrote:<br>

>>>>>>>>>><br>

>>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>> It is, alloca must have its integral address decided at its <br>

>>>>>>>>>>>> *allocation*<br>

>>>>>>>>>>>> time as well, like 0xFF00... .<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already allocated, the new<br>

>>>>>>>>>>>> alloca cannot be placed there unless it is not observed in the future,<br>

>>>>>>>>>>>> according to your proposal.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> But how do we know from the current state whether the stack variable is<br>

>>>>>>>>>>>> going to be observed or not?<br>

>>>>>>>>>>> With that argument you could allocate all but 4 bytes, do a malloc(4)<br>

>>>>>>>>>>> and know the address of the returned pointer (assuming it is not null).<br>

>>>>>>>>>>><br>

>>>>>>>>>>> What I try to say is, either your scenario is part of the model and<br>

>>>>>>>>>>> everything we do is broken as you could "observe" addresses passively,<br>

>>>>>>>>>>> *or*, the abstract machine we use for semantic reasoning doesn't permit<br>

>>>>>>>>>>> the above reasoning. I really hope for the latter.<br>

>>>>>>>>>>><br>

>>>>>>>>>> That's a very valid point..!<br>

>>>>>>>>>><br>

>>>>>>>>>> Actually, I have a memory model that addresses this problem.<br>

>>>>>>>>>> The gist is that we can interpret each allocation instruction as <br>

>>>>>>>>>> *creating<br>

>>>>>>>>>> 2 blocks* and nondeterministically returning one of them.<br>

>>>>>>>>>> The other one is marked as invalid but not freed immediately. <br>

>>>>>>>>>> Deallocation<br>

>>>>>>>>>> frees both blocks.<br>

>>>>>>>>>> If there is not enough space for having two blocks, it is treated as<br>

>>>>>>>>>> out-of-memory.<br>

>>>>>>>>>><br>

>>>>>>>>>> It makes guessing the address of an allocation according to memory layout<br>

>>>>>>>>>> invalid UB.<br>

>>>>>>>>>><br>

>>>>>>>>>> p = malloc(4)<br>

>>>>>>>>>> // If p != null, it is guaranteed there was at least two 4 byte slots<br>

>>>>>>>>>> available, say 0x100~0x104 and 0x110~0x114.<br>

>>>>>>>>>> // Two blocks are allocated at 0x110 and 0x114, and one of the <br>

>>>>>>>>>> addresses is<br>

>>>>>>>>>> nondeterministically returned.<br>

>>>>>>>>>> // By the nature of nondeterminism, all executions below should be<br>

>>>>>>>>>> well-defined regardless of the address of p.<br>

>>>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, this raises UB.<br>

>>>>>>>>>><br>

>>>>>>>>>> The paper link is here <<a href="https://dl.acm.org/doi/10.1145/3276495" rel="noreferrer" target="_blank">https://dl.acm.org/doi/10.1145/3276495</a>> FYI.<br>

>>>>>>>>><br>

>>>>>>>>> Nice! Thanks for the link :)<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>>> To argue differently: Who is to say there is a stack, or only one,<br>

>>>>>>>>>>> or that alloca allocates memory "on the one stack"? That is not part of<br>

>>>>>>>>>>> the IR, IMHO. I can write a machine on which alloca lowers to malloc,<br>

>>>>>>>>>>> I don't even need the free during stack unwind but that I could do as<br>

>>>>>>>>>>> well if I wanted to.<br>

>>>>>>>>>><br>

>>>>>>>>>> This is right, the example was for illustrative purposes. IR does not<br>

>>>>>>>>>> enforce alloca to be placed at 'stack'.<br>

>>>>>>>>>><br>

>>>>>>>>>><br>

>>>>>>>>>>>> Making icmp/ptrtoint yield poison will still make loop versioning or<br>

>>>>>>>>>>>> pointer rewriting transformations unsound because these operations now<br>

>>>>>>>>>>> can<br>

>>>>>>>>>>>> create poison (even if pointers are noundef).<br>

>>>>>>>>>>> I did not say they yield poison, at least I did not try to say that.<br>

>>>>>>>>>>> What are you referring to exactly?<br>

>>>>>>>>>>><br>

>>>>>>>>>> I was referring to this:<br>

>>>>>>>>>><br>

>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions, which<br>

>>>>>>>>>>>> contradicts with what LLVM does.<br>

>>>>>>>>>>> As with other violation of attributes I would, on first though, suggest<br>

>>>>>>>>>>> to produce poison, not UB.<br>

>>>>>>>>>> But it is more about the (imaginary) attribute, so maybe I was <br>

>>>>>>>>>> slightly out<br>

>>>>>>>>>> of topic.<br>

>>>>>>>>>><br>

>>>>>>>>>>> 2) is fine, I think the suggestion semantically makes sense <br>

>>>>>>>>>>> perfectly. 1)<br>

>>>>>>>>>>>> is something I'm concerned about now.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> There are more than pointer foldings, such as rewriting such <br>

>>>>>>>>>>>> expression,<br>

>>>>>>>>>>>> code motion ptr cmp, introduce ptr cmp, etc. There is also analysis<br>

>>>>>>>>>>> relying<br>

>>>>>>>>>>>> on ptr cmp.<br>

>>>>>>>>>>>> Defining the correctness of each of them is something we want to avoid,<br>

>>>>>>>>>>> and<br>

>>>>>>>>>>>> maybe that's why we want to define precise semantics for things.<br>

>>>>>>>>>>> I don't get the point. My proposal does not change the semantics of<br>

>>>>>>>>>>> pointer comparisons, at all. I explicitly mentioned that in the last<br>

>>>>>>>>>>> email.<br>

>>>>>>>>>>><br>

>>>>>>>>>> Oh okay, I thought it was a part of the lifetime proposal, but it seems<br>

>>>>>>>>>> more like a separate thing.<br>

>>>>>>>>>> I agree that this requires performance impact.<br>

>>>>>>>>>> Also investigation of existing transformations would be needed; Alive2's<br>

>>>>>>>>>> pointer comparison is doing approximation yet, but if it becomes fully<br>

>>>>>>>>>> precise, it will show something from running LLVM unit tests I <br>

>>>>>>>>>> believe..! :)<br>

>>>>>>>>>><br>

>>>>>>>>>>> I think this will be less aggressive and may give nice feedback to<br>

>>>>>>>>>>>> potential projects that are using lifetime with non-alloca.<br>

>>>>>>>>>>> The lifetime marker debate, basically 2) above, is orthogonal to the<br>

>>>>>>>>>>> problem<br>

>>>>>>>>>>> you try to solve. It got mixed in as lifetime markers were used by<br>

>>>>>>>>>>> StackColoring<br>

>>>>>>>>>>> to perform coalescing but that is coincidental. You can (arguably)<br>

>>>>>>>>>>> coalesce stack<br>

>>>>>>>>>>> allocations regardless of lifetime markers and with 1) such a<br>

>>>>>>>>>>> transformation<br>

>>>>>>>>>>> (w/ and w/o lifetime markers) would actually be sound.<br>

>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>>> At the end, we can implement IR writer that lowers lifetime with<br>

>>>>>>>>>>> non-alloca<br>

>>>>>>>>>>>> into memset(undef). WDYT?<br>

>>>>>>>>>>> Yeah, 2) is orthogonal and we can lower it that way. Unsure if it is<br>

>>>>>>>>>>> helpful<br>

>>>>>>>>>>> but we can certainly define it that way in the LangRef.<br>

>>>>>>>>>>><br>

>>>>>>>>>> Okay, thanks!<br>

>>>>>>>>>><br>

>>>>>>>>>><br>

>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> p.s. The reply was late, sorry. I think I can spend more time on this<br>

>>>>>>>>>>> today.<br>

>>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert <<br>

>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a desirable<br>

>>>>>>>>>>>>>> direction<br>

>>>>>>>>>>>>>>>> IMO.<br>

>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from defining<br>

>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an <br>

>>>>>>>>>>>>>>> "undesirable<br>

>>>>>>>>>>>>>> way"?<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> I meant it would be great if the output state after executing an<br>

>>>>>>>>>>>>>> instruction can be described using its input state.<br>

>>>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I should have been <br>

>>>>>>>>>>>>>> more<br>

>>>>>>>>>>>>>> clear about this)<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be defined as <br>

>>>>>>>>>>>>>> follows:<br>

>>>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + s(y)]<br>

>>>>>>>>>>>>>> where s(x) is the value of x in the previous state, and s[z -> v] <br>

>>>>>>>>>>>>>> is a<br>

>>>>>>>>>>>>>> state with z updated to v.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> Another example that involves memory: the semantics of 'i = <br>

>>>>>>>>>>>>>> ptrtoint p'<br>

>>>>>>>>>>>>> can<br>

>>>>>>>>>>>>>> be defined as follows:<br>

>>>>>>>>>>>>>> Given an input state s, next state s' = s[i -> s(p).obj.address +<br>

>>>>>>>>>>>>>> s(p).offset]<br>

>>>>>>>>>>>>>> where obj.address is the begin address of a memory object obj pointed<br>

>>>>>>>>>>> by<br>

>>>>>>>>>>>>> p<br>

>>>>>>>>>>>>>> & offset is p's byte offset. (Imagine a pointer to the offset of some<br>

>>>>>>>>>>>>> char<br>

>>>>>>>>>>>>>> array).<br>

>>>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t the input state.<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.<br>

>>>>>>>>>>>>>> To describe the output state after executing 'p = alloca', the <br>

>>>>>>>>>>>>>> address<br>

>>>>>>>>>>> of<br>

>>>>>>>>>>>>>> new alloca should be determined.<br>

>>>>>>>>>>>>>> If observedness is involved, we need to know the future state <br>

>>>>>>>>>>>>>> again. :/<br>

>>>>>>>>>>>>> We<br>

>>>>>>>>>>>>>> don't know whether the alloca is going to be observed or not without<br>

>>>>>>>>>>>>> seeing<br>

>>>>>>>>>>>>>> the future.<br>

>>>>>>>>>>>>>> This is the problem of the current lifetime intrinsics as well.<br>

>>>>>>>>>>>>> No, you mix things up here.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.<br>

>>>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of<br>

>>>>>>>>>>>>> dereferenceable memory that is implicitly freed as the stack<br>

>>>>>>>>>>>>> unwinds. That is it. No context necessary.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> If you want to modify the IR, you need to argue the observable<br>

>>>>>>>>>>>>> semantics which is nothing new. That this might require more than<br>

>>>>>>>>>>>>> a peephole view of the program is also not new.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> One possible approach to resolve this is adding an 'unobserved' <br>

>>>>>>>>>>>>>> flag to<br>

>>>>>>>>>>>>>> alloca instruction (similar to what was suggested by Nicolai).<br>

>>>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is used by<br>

>>>>>>>>>>>>> ptrtoint/icmp,<br>

>>>>>>>>>>>>>> it is UB.<br>

>>>>>>>>>>>>> The flag can be added, like we add other attributes. It should not<br>

>>>>>>>>>>>>> be required for any optimization we talked about though. It basically<br>

>>>>>>>>>>>>> is a way to manifest derived or given information into the IR.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> Attribute deduction, as well as frontends with domain knowledge,<br>

>>>>>>>>>>>>> can add such information. The flag we discussed in phab was not even<br>

>>>>>>>>>>>>> sufficient for all the transformation examples I presented in my mail,<br>

>>>>>>>>>>>>> that is why I extended my  argument. We could still have a "noescape"<br>

>>>>>>>>>>>>> flag for allocas, but I'm not sure how useful that really is. We can<br>

>>>>>>>>>>>>> certainly deduce it and manifest it, unsure if we have domain <br>

>>>>>>>>>>>>> knowledge<br>

>>>>>>>>>>>>> we can use for non-trivial cases though.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions, which<br>

>>>>>>>>>>>>>> contradicts with what LLVM does.<br>

>>>>>>>>>>>>> As with other violation of attributes I would, on first though, <br>

>>>>>>>>>>>>> suggest<br>

>>>>>>>>>>>>> to produce poison, not UB.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> Also, existing optimizations like loop versioning can introduce<br>

>>>>>>>>>>>>>> ptrtoint/pointer comparisons too.<br>

>>>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the feeling things<br>

>>>>>>>>>>>>> are still mixed up here.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> What I proposed is twofold:<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> 1) We stop folding comparisons between different allocas if <br>

>>>>>>>>>>>>> changing the<br>

>>>>>>>>>>>>> address<br>

>>>>>>>>>>>>>       of both might be observable. Thus, if both might have their <br>

>>>>>>>>>>>>> address<br>

>>>>>>>>>>>>> "taken"/escaped,<br>

>>>>>>>>>>>>>       other than the comparisons we want to fold, we cannot proceed.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> The first should be sufficient for the problem you were trying to <br>

>>>>>>>>>>>>> solve<br>

>>>>>>>>>>>>> in the<br>

>>>>>>>>>>>>> first place. The second makes lifetime markers less weird. Note <br>

>>>>>>>>>>>>> that 1)<br>

>>>>>>>>>>>>> is not changing<br>

>>>>>>>>>>>>> the semantics of the IR. We basically just argue there is a bug in our<br>

>>>>>>>>>>>>> instcombine right<br>

>>>>>>>>>>>>> now as we do not check all necessary preconditions.<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> I see that there are other questions that I didn't answer yet, but <br>

>>>>>>>>>>>>>> let<br>

>>>>>>>>>>> me<br>

>>>>>>>>>>>>>> answer this first to limit the length of the text :)<br>

>>>>>>>>>>>>> Sure, we can split the discussion :)<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <<br>

>>>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:<br>

>>>>>>>>>>>>>>>> Hi Johannes,<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> I read your proposal and thought about the model.<br>

>>>>>>>>>>>>>>> Cool, thanks!<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> As you concerned in A3, certain programs may be valid only when<br>

>>>>>>>>>>> memory<br>

>>>>>>>>>>>>>>>> blocks with overlapping lifetimes have disjoint addresses.<br>

>>>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca also works):<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> p1 = malloc(4)<br>

>>>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there as enough space <br>

>>>>>>>>>>>>>>>> & p1<br>

>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>> p2 != null<br>

>>>>>>>>>>>>>>>> set<char*> s;<br>

>>>>>>>>>>>>>>>> s.insert(p1); s.insert(p2); // If the second insert did nothing, it<br>

>>>>>>>>>>>>> would<br>

>>>>>>>>>>>>>>>> be surprise to programmers<br>

>>>>>>>>>>>>>>>> work(s);<br>

>>>>>>>>>>>>>>>> free(data1)<br>

>>>>>>>>>>>>>>>> free(data2)<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped blocks are<br>

>>>>>>>>>>>>> disjoint.<br>

>>>>>>>>>>>>>>> It<br>

>>>>>>>>>>>>>>>> would be great for verification tools on LLVM IR to be able to <br>

>>>>>>>>>>>>>>>> answer<br>

>>>>>>>>>>>>>>> that<br>

>>>>>>>>>>>>>>>> the second insert will succeed.<br>

>>>>>>>>>>>>>>> I agree, the second insert should succeed, assuming `p1 && p2`.<br>

>>>>>>>>>>>>>>> I don't think my proposal would in any way impact the program above,<br>

>>>>>>>>>>>>>>> if anything the A3 reasoning makes sure such a program with allocas<br>

>>>>>>>>>>>>>>> is not miscompiled.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> I'm also not sure I understand what you try to argue for. Maybe<br>

>>>>>>>>>>>>>>> elaborate a bit what it is you think is bad or needs to be changed.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> The problem is that definition of escapedness is not clear at the<br>

>>>>>>>>>>>>>>> semantic<br>

>>>>>>>>>>>>>>>> level. Describing the IR semantics w.r.t. LLVM's escaped analysis<br>

>>>>>>>>>>>>>>> function<br>

>>>>>>>>>>>>>>>> isn't something we want.<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> Semantic definition of escapedness of a pointer seems hard, I mean<br>

>>>>>>>>>>> in a<br>

>>>>>>>>>>>>>>>> stepwise manner.<br>

>>>>>>>>>>>>>>>> It isn't a single instruction such as 'escape i8* ptr', and we need<br>

>>>>>>>>>>> to<br>

>>>>>>>>>>>>>>> look<br>

>>>>>>>>>>>>>>>> over all instructions in the function. For example, '(int)(p+1) -<br>

>>>>>>>>>>>>> (int)p'<br>

>>>>>>>>>>>>>>>> isn't semantically escaping the pointer p because the result is 1<br>

>>>>>>>>>>>>>>>> regardless of the value of p.<br>

>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a desirable<br>

>>>>>>>>>>>>>>> direction<br>

>>>>>>>>>>>>>>>> IMO.<br>

>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from defining<br>

>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an <br>

>>>>>>>>>>>>>>> "undesirable<br>

>>>>>>>>>>>>>>> way"?<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> In practice, syntactically checking escapedness + nice engineering<br>

>>>>>>>>>>>>> might<br>

>>>>>>>>>>>>>>>> not break optimizations in most cases (as undef/poison did); but it<br>

>>>>>>>>>>>>> would<br>

>>>>>>>>>>>>>>>> be great to move to another level, since LLVM IR is used in so many<br>

>>>>>>>>>>>>>>> places<br>

>>>>>>>>>>>>>>>> :)<br>

>>>>>>>>>>>>>>> The property under which you can coalesce objects is simple:<br>

>>>>>>>>>>>>>>>       It is not observable.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> Now, if the address of one of the two objects you coalesce is not<br>

>>>>>>>>>>>>>>> observed, coalescing is not observable. That is a sufficient <br>

>>>>>>>>>>>>>>> condition<br>

>>>>>>>>>>>>>>> not a necessary one. Pointer "escaping" is one step further. If the<br>

>>>>>>>>>>>>>>> address doesn't escape it is not observed. This does not mean the<br>

>>>>>>>>>>>>>>> "semantic conditions for coalescing", e.g., for the purpose of<br>

>>>>>>>>>>>>> translation<br>

>>>>>>>>>>>>>>> validation, is supposed to be build on top of our "definition of<br>

>>>>>>>>>>>>> escaping<br>

>>>>>>>>>>>>>>> pointers". That said, we use "does not escape" as a precondition for<br>

>>>>>>>>>>>>>>> various transformation and I'm unsure what is any different now. The<br>

>>>>>>>>>>>>>>> entire escape argument is only used in the validity of the pointer<br>

>>>>>>>>>>>>> folding.<br>

>>>>>>>>>>>>>>> Similarly, we can fold a comparison of a noalias pointer with <br>

>>>>>>>>>>>>>>> another<br>

>>>>>>>>>>>>> one<br>

>>>>>>>>>>>>>>> if the former does not escape (and both are dereferenced and one is<br>

>>>>>>>>>>>>>>> written for sure).<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> The pointer comparison is another beast. It indeed has a few <br>

>>>>>>>>>>>>>>>> issues,<br>

>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>> solving it might require nontrivial solution.<br>

>>>>>>>>>>>>>>> I think the main problem of the inconsistencies and such we've <br>

>>>>>>>>>>>>>>> seen is<br>

>>>>>>>>>>>>>>> rooted in the erroneous folding of pointer comparisons. Cleaning up<br>

>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>> lifetime marker semantics is actually unrelated and simply not <br>

>>>>>>>>>>>>>>> folding<br>

>>>>>>>>>>>>>>> as described in A3 should solve the issue that has been reported.<br>

>>>>>>>>>>>>>>> Nevertheless,<br>

>>>>>>>>>>>>>>> we should take a crack at lifetime markers while we are here.<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <<br>

>>>>>>>>>>>>>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>>>>>>>>>>>>>>>> wrote:<br>

>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Hi Juneyoung,<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Happy new year :)<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last year, I've tried to<br>

>>>>>>>>>>>>>>>>> summarize my thoughts,<br>

>>>>>>>>>>>>>>>>> especially given that I had some time to think about things <br>

>>>>>>>>>>>>>>>>> over the<br>

>>>>>>>>>>>>>>> break.<br>

>>>>>>>>>>>>>>>>> Still, no promises on the quality ;)<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> I start with general questions I asked myself and then go on<br>

>>>>>>>>>>> rambling<br>

>>>>>>>>>>>>>>>>> about a potential design.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, thus is it <br>

>>>>>>>>>>>>>>>>> fixed<br>

>>>>>>>>>>> or<br>

>>>>>>>>>>>>>>>>> modifiable?<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I think it is<br>

>>>>>>>>>>>>> derived<br>

>>>>>>>>>>>>>>>>> and modifiable,<br>

>>>>>>>>>>>>>>>>> at least I hope it is. Only that would allow transformations I <br>

>>>>>>>>>>>>>>>>> would<br>

>>>>>>>>>>>>>>> want<br>

>>>>>>>>>>>>>>>>> us to do. Here are some examples:<br>

>>>>>>>>>>>>>>>>>        <a href="https://godbolt.org/z/G8obj3" rel="noreferrer" target="_blank">https://godbolt.org/z/G8obj3</a><br>

>>>>>>>>>>>>>>>>>        <a href="https://godbolt.org/z/obaTc" rel="noreferrer" target="_blank">https://godbolt.org/z/obaTc</a><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, extending the <br>

>>>>>>>>>>>>>>>>> lifetime?<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer comparison into a region<br>

>>>>>>>>>>>>> where<br>

>>>>>>>>>>>>>>>>> the pointer is dead?<br>

>>>>>>>>>>>>>>>>> This is an important question which we haven't discussed much <br>

>>>>>>>>>>>>>>>>> as we<br>

>>>>>>>>>>>>>>>>> assumed LICM has to work.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses are not<br>

>>>>>>>>>>> extending<br>

>>>>>>>>>>>>>>>>> the lifetime and are<br>

>>>>>>>>>>>>>>>>> allowed outside of "lifetime regions" (as indicated by markers).<br>

>>>>>>>>>>> They<br>

>>>>>>>>>>>>>>>>> will always produce valid<br>

>>>>>>>>>>>>>>>>> results. Though, we might want to think about a lifetime marker <br>

>>>>>>>>>>>>>>>>> that<br>

>>>>>>>>>>>>>>>>> spits out a new pointer<br>

>>>>>>>>>>>>>>>>> value instead of reusing the old one.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> The question here is, can we fold address comparisons based on<br>

>>>>>>>>>>>>>>>>> lifetimes, or not.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> So far, we fold comparisons based on "address information". For<br>

>>>>>>>>>>>>> example,<br>

>>>>>>>>>>>>>>>>> we "know" globals,<br>

>>>>>>>>>>>>>>>>> allocas, and mallocs cannot be equal to one another. Also, two<br>

>>>>>>>>>>>>> distinct<br>

>>>>>>>>>>>>>>>>> allocations, for globals<br>

>>>>>>>>>>>>>>>>> and allocas, are considered unequal. Now, the crux is that we have<br>

>>>>>>>>>>> to<br>

>>>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>> consistent if we do two<br>

>>>>>>>>>>>>>>>>> comparisons, and, as of now, we are not (bug number missing). <br>

>>>>>>>>>>>>>>>>> Since<br>

>>>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>>>> backend (or any other place<br>

>>>>>>>>>>>>>>>>> for that matter) might coalesce allocas, their addresses might not<br>

>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>> different after all. If we<br>

>>>>>>>>>>>>>>>>> already folded a comparison to "unequal" we are doomed if we later<br>

>>>>>>>>>>>>> have<br>

>>>>>>>>>>>>>>>>> a comparison that results<br>

>>>>>>>>>>>>>>>>> in "equal". (Note, this is different from aliasing rules as <br>

>>>>>>>>>>>>>>>>> they can<br>

>>>>>>>>>>>>> be<br>

>>>>>>>>>>>>>>>>> stricter.)<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Design:<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> I would hope we can come up with a model that treats memory "the<br>

>>>>>>>>>>>>> same",<br>

>>>>>>>>>>>>>>>>> regardless if it is global,<br>

>>>>>>>>>>>>>>>>> stack, or heap. I want to avoid special casing one of them wrt.<br>

>>>>>>>>>>>>> lifetime<br>

>>>>>>>>>>>>>>>>> as I believe most optimizations<br>

>>>>>>>>>>>>>>>>> would apply to any of them, potentially for different reasons and<br>

>>>>>>>>>>> with<br>

>>>>>>>>>>>>>>>>> different gains, but nevertheless.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory content *only*.<br>

>>>>>>>>>>>>>>>>> Basically, the memory content is set to<br>

>>>>>>>>>>>>>>>>>          undefined by lifetime markers. It is <br>

>>>>>>>>>>>>>>>>> derived/modifiable as<br>

>>>>>>>>>>> it<br>

>>>>>>>>>>>>> is<br>

>>>>>>>>>>>>>>>>> just an "as-is" property of the memory<br>

>>>>>>>>>>>>>>>>>          content. The lifetimes of an object, as described by<br>

>>>>>>>>>>> markers,<br>

>>>>>>>>>>>>>>> might<br>

>>>>>>>>>>>>>>>>> change during the compilation. They<br>

>>>>>>>>>>>>>>>>>          might become smaller if we deduce the object is not <br>

>>>>>>>>>>>>>>>>> accessed<br>

>>>>>>>>>>>>> and<br>

>>>>>>>>>>>>>>>>> the memory content is not used, they<br>

>>>>>>>>>>>>>>>>>          might become larger if objects with non-overlapping<br>

>>>>>>>>>>> lifetimes<br>

>>>>>>>>>>>>> are<br>

>>>>>>>>>>>>>>>>> coalesced. (One could see the latter as<br>

>>>>>>>>>>>>>>>>>          introducing a new object though.)<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing to do with the<br>

>>>>>>>>>>>>>>>>> address of an object. Consequently, pointer<br>

>>>>>>>>>>>>>>>>>          comparisons and similar operations are valid outside the<br>

>>>>>>>>>>>>> lifetime.<br>

>>>>>>>>>>>>>>>>> Loads and stores are as well, they can<br>

>>>>>>>>>>>>>>>>>          even not be removed "easily". A store followed by a <br>

>>>>>>>>>>>>>>>>> lifetime<br>

>>>>>>>>>>>>>>> marker<br>

>>>>>>>>>>>>>>>>> or a load following a lifetime marker<br>

>>>>>>>>>>>>>>>>>          is dead or results in undef respectively.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about addresses. This means<br>

>>>>>>>>>>> we<br>

>>>>>>>>>>>>>>>>> could/should also not argue that overlapping<br>

>>>>>>>>>>>>>>>>>          lifetimes result in different addresses. Thus, a <br>

>>>>>>>>>>>>>>>>> comparison<br>

>>>>>>>>>>>>>>> between<br>

>>>>>>>>>>>>>>>>> the address of two allocas could not<br>

>>>>>>>>>>>>>>>>>          immediately be folded. That said, there would be special<br>

>>>>>>>>>>> cases<br>

>>>>>>>>>>>>>>>>> though. Basically, if one of the allocas does<br>

>>>>>>>>>>>>>>>>>          not escape, other than the comparisons to be folded, <br>

>>>>>>>>>>>>>>>>> we can<br>

>>>>>>>>>>>>> fold<br>

>>>>>>>>>>>>>>>>> them. Afterwards, coalescing or splitting<br>

>>>>>>>>>>>>>>>>>          would still be consistent because it is unobservable. The<br>

>>>>>>>>>>>>> problem<br>

>>>>>>>>>>>>>>>>> we have in-tree is that we fold even though<br>

>>>>>>>>>>>>>>>>>          the address is still observed (after the fold). It is <br>

>>>>>>>>>>>>>>>>> still<br>

>>>>>>>>>>>>>>> unclear<br>

>>>>>>>>>>>>>>>>> to me what the impact of this would be<br>

>>>>>>>>>>>>>>>>>          on real code. I suggested before that we run some<br>

>>>>>>>>>>> experiments<br>

>>>>>>>>>>>>>>> first<br>

>>>>>>>>>>>>>>>>> before we make any decision whatsoever.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are<br>

>>>>>>>>>>> `memset(undef)`,<br>

>>>>>>>>>>>>> as<br>

>>>>>>>>>>>>>>>>> you suggested before (I think).<br>

>>>>>>>>>>>>>>>>> There are some implementation level differences but at the end of<br>

>>>>>>>>>>> the<br>

>>>>>>>>>>>>>>>>> day they are basically the same.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if it fixes the<br>

>>>>>>>>>>>>> problems<br>

>>>>>>>>>>>>>>>>> that lead to D93376 in the first place.<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:<br>

>>>>>>>>>>>>>>>>>> Hello all,<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> We're discussing the well-formedness of lifetime.start/end<br>

>>>>>>>>>>> intrinsic<br>

>>>>>>>>>>>>>>>>> here (<br>

>>>>>>>>>>>>>>>>>> <a href="https://reviews.llvm.org/D93376" rel="noreferrer" target="_blank">https://reviews.llvm.org/D93376</a>), deciding what is a<br>

>>>>>>>>>>> (syntactically<br>

>>>>>>>>>>>>> &<br>

>>>>>>>>>>>>>>>>>> semantically) valid use of these intrinsics.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> I'd like to gather more context about the intrinsics.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> 1. Is there a frontend or framework that introduces lifetime call<br>

>>>>>>>>>>>>> with<br>

>>>>>>>>>>>>>>>>>> non-stack allocated objects?<br>

>>>>>>>>>>>>>>>>>> If exists, which behavior do they expect from it?<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> 2. Can a block's lifetime be started bytewise or elementwise?<br>

>>>>>>>>>>>>>>>>>> I imagine an optimization that allow using stack very compactly,<br>

>>>>>>>>>>> but<br>

>>>>>>>>>>>>>>>>> wonder<br>

>>>>>>>>>>>>>>>>>> there is a real-world use case.<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> Thanks,<br>

>>>>>>>>>>>>>>>>>> Juneyoung<br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>>>>> _______________________________________________<br>

>>>>>>>>>>>>>>>>>> LLVM Developers mailing list<br>

>>>>>>>>>>>>>>>>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

>>>>>>>>>>>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>>>>>>>>>><br>

>>>>>>>><br>

>>>>>><br>

>>>><br>

>><br>

<br>

-- <br>

Website: <a href="https://people.mpi-sws.org/~jung/" rel="noreferrer" target="_blank">https://people.mpi-sws.org/~jung/</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><br></div><font size="1">Juneyoung Lee</font><div><font size="1">Software Foundation Lab, Seoul National University</font></div></div></div>