[llvm-dev] lifetime.start/end

Tue Jan 26 06:49:27 PST 2021

On 1/26/21 8:42 AM, Juneyoung Lee wrote:
> I want to avoid allowing 'pointer zapping' too; LLVM IR has a static single
> assignment form, and I think it might not be a good idea to add a special
> rule that violates this.
> Well, there might be some semantics that would explain optimizations that
> we can imagine, but it introduces more complexity to the language
> semantics by adding a special case.
>
> BTW, is there any reason why allocas are being placed at the entry block?
> If alloca was simply placed at the location where lifetime.start is
> currently being placed, we don't need to struggle with this complication
> about the semantics of lifetime.start..?

I always assumed so we can find them easily and apply SROA and
because the ones that remain are then translated to a stack increase
at the function entry, so no push/pop of values in the middle of
a function (most of the time). I have not confirmed this though.

FWIW, most of the allocas, for C/C++, are removed early and that
is always the "best" outcome (with notable exceptions).

> Thanks,
> Juneyoung
>
> On Tue, Jan 26, 2021 at 7:22 PM Ralf Jung <jung at mpi-sws.org> wrote:
>
>> Hi Johannes,
>>
>>> On 1/24/21 10:07 AM, Ralf Jung wrote:
>>>> Hi Johannes,
>>>>
>>>>> At some point I lost track of what you even want to achieve.
>>>> *lol* fair.^^
>>>> All I want is a precise and meaningful spec for lifetime.start and
>>>> lifetime.end. That's where we entered this rabbit hole. You didn't like
>>>> Juneyoung's and Nuno's proposal for this and instead suggested to
>> replace the
>>>> entire system by something else, and that new system is what we are
>> discussing
>>>> now (that is my understanding, anyway).
>>>>
>>> I went back to the proposal in question, https://reviews.llvm.org/D93376,
>> and I
>>> could not right away determine why this would avoid the inconsistency we
>> can
>>> imagine for something like
>>>
>>> ```
>>> int *G = 0;
>>> void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
>>> void foo() {
>>>     intptr_t xp, yp;
>>>     { int x; xp = &x; bar(&x); }
>>>     { int y; yp = &y; bar(&y); }
>>>     print(xp == yp);               // One can imagine this is folded to
>> false and
>>> because x and y are coalesced bar will print true.
>>> }
>>>
>>> ```
>>>
>>> I was opposing the proposal because of the brittle syntactic
>> restrictions. In my
>>> attempt to justify not having those I also tried to resolve the above
>>> inconsistency in a more general manner because I thought that was what
>> it was
>>> all about. I believe that we either have to argue the address of a dead
>> object
>>> is unusable/undetermined, basically the direction of James comment, or
>> we need
>>> to decouple lifetime markers from locations.
>> Folding to false and coalescing are inconsistent with each other, yes. It
>> is
>> just wrong to perform both of these on the above program, I would say (if
>> we
>> interpret it with LLVM semantics, which I assume has no "pointer
>> zapping"). Is
>> that up for debate?
>>
>> However, I do not see the immediate connection to lifetime markers -- the
>> same
>> would be true for a program using malloc/free. Coalescing is permissible
>> whenever lifetimes are disjoint. Folding to false is only permissible if
>> the
>> pointers are *definitely not equal*, e.g. if their lifetimes overlap.
>>
>>>> Sure, everything in the semantics is up to what the program actually
>> observes.
>>>> But I was considering the general case, in which it will not be
>> possible to
>>>> prove that some address is not observed. I expect that to be quite
>> common
>>>> (certainly, Rust programs really like to take references to local
>> variables
>>>> and pass them to other functions), and in those cases, not having
>>>> lifetime.start/end will severely restrict program transformations -- as
>> you
>>>> observed, even outlining is affected.
>>>>
>>>> This is basically the abstract version of the concrete concern that
>> James raised.
>>> I get the argument that it can hinder optimizations but it is abstract.
>> It seems
>>> pretty simple to test this out, I mean to restrict the folding and the
>>> coalescing. The benefit, which I think got ignored here a lot, is that
>> it
>>> provides a clear path towards deduction of coalescing opportunities,
>> that is,
>>> coalescing in the middle-end or backend w/o lifetime markers. Though,
>> similar to
>>> the impact of my proposal, I have no idea if that is even needed.
>> It is certainly not simple for me to test out anything that would involve
>> changing LLVM. ;)
>> The syntactic condition also provides a clear path to evaluate whether any
>> given
>> folding or coalescing is correct. Any sufficiently precise semantics will
>> do
>> that. The question is just which semantics y'all want. If the semantics
>> require
>> "pointer zapping" to remain sufficiently optimizable, that sounds like a
>> serious
>> drawback to me.
>>
>> Kind regards,
>> Ralf
>>
>>> ~ Johannes
>>>
>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>> A lot of my arguments around alloca/malloc/... is that we can augment
>> the
>>>>> information from the frontend by determining if an address is observed
>> or
>>>>> not. If it is not, we can limit its allocation, coalesce it, etc.
>> Lifetime
>>>>> markers can help us do this but they are not the only way.
>>>>> So, basically, we do not need lifetime markers to coalesce
>> alloca/malloc/...
>>>>> but that they can help.
>>>>
>>>>
>>>>> Let's try to get out of this rabbit hole. If you have a problem with
>> the
>>>>> proposal as outlined in
>>>>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html ,
>> please
>>>>> let me know. I assume dealing with observed addresses different than
>> with
>>>>> non-observede ones (implied by noescape) is the way forward now.
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>
>>>>>> Kind regards,
>>>>>> Ralf
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Ralf
>>>>>>>>
>>>>>>>>> Does that make some sense?
>>>>>>>>>
>>>>>>>>> ~ Johannes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Ralf
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert
>>>>>>>>>>>> <johannesdoerfert at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>> It is, alloca must have its integral address decided at its
>>>>>>>>>>>>>> *allocation*
>>>>>>>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already allocated,
>> the new
>>>>>>>>>>>>>> alloca cannot be placed there unless it is not observed in
>> the future,
>>>>>>>>>>>>>> according to your proposal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But how do we know from the current state whether the stack
>> variable is
>>>>>>>>>>>>>> going to be observed or not?
>>>>>>>>>>>>> With that argument you could allocate all but 4 bytes, do a
>> malloc(4)
>>>>>>>>>>>>> and know the address of the returned pointer (assuming it is
>> not null).
>>>>>>>>>>>>> What I try to say is, either your scenario is part of the
>> model and
>>>>>>>>>>>>> everything we do is broken as you could "observe" addresses
>> passively,
>>>>>>>>>>>>> *or*, the abstract machine we use for semantic reasoning
>> doesn't permit
>>>>>>>>>>>>> the above reasoning. I really hope for the latter.
>>>>>>>>>>>>>
>>>>>>>>>>>> That's a very valid point..!
>>>>>>>>>>>>
>>>>>>>>>>>> Actually, I have a memory model that addresses this problem.
>>>>>>>>>>>> The gist is that we can interpret each allocation instruction
>> as
>>>>>>>>>>>> *creating
>>>>>>>>>>>> 2 blocks* and nondeterministically returning one of them.
>>>>>>>>>>>> The other one is marked as invalid but not freed immediately.
>>>>>>>>>>>> Deallocation
>>>>>>>>>>>> frees both blocks.
>>>>>>>>>>>> If there is not enough space for having two blocks, it is
>> treated as
>>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>>
>>>>>>>>>>>> It makes guessing the address of an allocation according to
>> memory layout
>>>>>>>>>>>> invalid UB.
>>>>>>>>>>>>
>>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>>> // If p != null, it is guaranteed there was at least two 4 byte
>> slots
>>>>>>>>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>>>>>>>>> // Two blocks are allocated at 0x110 and 0x114, and one of the
>>>>>>>>>>>> addresses is
>>>>>>>>>>>> nondeterministically returned.
>>>>>>>>>>>> // By the nature of nondeterminism, all executions below should
>> be
>>>>>>>>>>>> well-defined regardless of the address of p.
>>>>>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, this
>> raises UB.
>>>>>>>>>>>> The paper link is here <https://dl.acm.org/doi/10.1145/3276495>
>> FYI.
>>>>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> To argue differently: Who is to say there is a stack, or only
>> one,
>>>>>>>>>>>>> or that alloca allocates memory "on the one stack"? That is
>> not part of
>>>>>>>>>>>>> the IR, IMHO. I can write a machine on which alloca lowers to
>> malloc,
>>>>>>>>>>>>> I don't even need the free during stack unwind but that I
>> could do as
>>>>>>>>>>>>> well if I wanted to.
>>>>>>>>>>>> This is right, the example was for illustrative purposes. IR
>> does not
>>>>>>>>>>>> enforce alloca to be placed at 'stack'.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> Making icmp/ptrtoint yield poison will still make loop
>> versioning or
>>>>>>>>>>>>>> pointer rewriting transformations unsound because these
>> operations now
>>>>>>>>>>>>> can
>>>>>>>>>>>>>> create poison (even if pointers are noundef).
>>>>>>>>>>>>> I did not say they yield poison, at least I did not try to say
>> that.
>>>>>>>>>>>>> What are you referring to exactly?
>>>>>>>>>>>>>
>>>>>>>>>>>> I was referring to this:
>>>>>>>>>>>>
>>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions,
>> which
>>>>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>>>>> As with other violation of attributes I would, on first
>> though, suggest
>>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>> But it is more about the (imaginary) attribute, so maybe I was
>>>>>>>>>>>> slightly out
>>>>>>>>>>>> of topic.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2) is fine, I think the suggestion semantically makes sense
>>>>>>>>>>>>> perfectly. 1)
>>>>>>>>>>>>>> is something I'm concerned about now.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are more than pointer foldings, such as rewriting such
>>>>>>>>>>>>>> expression,
>>>>>>>>>>>>>> code motion ptr cmp, introduce ptr cmp, etc. There is also
>> analysis
>>>>>>>>>>>>> relying
>>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>>> Defining the correctness of each of them is something we want
>> to avoid,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>> maybe that's why we want to define precise semantics for
>> things.
>>>>>>>>>>>>> I don't get the point. My proposal does not change the
>> semantics of
>>>>>>>>>>>>> pointer comparisons, at all. I explicitly mentioned that in
>> the last
>>>>>>>>>>>>> email.
>>>>>>>>>>>>>
>>>>>>>>>>>> Oh okay, I thought it was a part of the lifetime proposal, but
>> it seems
>>>>>>>>>>>> more like a separate thing.
>>>>>>>>>>>> I agree that this requires performance impact.
>>>>>>>>>>>> Also investigation of existing transformations would be needed;
>> Alive2's
>>>>>>>>>>>> pointer comparison is doing approximation yet, but if it
>> becomes fully
>>>>>>>>>>>> precise, it will show something from running LLVM unit tests I
>>>>>>>>>>>> believe..! :)
>>>>>>>>>>>>
>>>>>>>>>>>>> I think this will be less aggressive and may give nice
>> feedback to
>>>>>>>>>>>>>> potential projects that are using lifetime with non-alloca.
>>>>>>>>>>>>> The lifetime marker debate, basically 2) above, is orthogonal
>> to the
>>>>>>>>>>>>> problem
>>>>>>>>>>>>> you try to solve. It got mixed in as lifetime markers were
>> used by
>>>>>>>>>>>>> StackColoring
>>>>>>>>>>>>> to perform coalescing but that is coincidental. You can
>> (arguably)
>>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>>> allocations regardless of lifetime markers and with 1) such a
>>>>>>>>>>>>> transformation
>>>>>>>>>>>>> (w/ and w/o lifetime markers) would actually be sound.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the end, we can implement IR writer that lowers lifetime
>> with
>>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>>>>>> Yeah, 2) is orthogonal and we can lower it that way. Unsure if
>> it is
>>>>>>>>>>>>> helpful
>>>>>>>>>>>>> but we can certainly define it that way in the LangRef.
>>>>>>>>>>>>>
>>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> p.s. The reply was late, sorry. I think I can spend more time
>> on this
>>>>>>>>>>>>> today.
>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert <
>>>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a
>> desirable
>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from
>> defining
>>>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I meant it would be great if the output state after
>> executing an
>>>>>>>>>>>>>>>> instruction can be described using its input state.
>>>>>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I should
>> have been
>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>> clear about this)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be defined
>> as
>>>>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>>>>> where s(x) is the value of x in the previous state, and s[z
>> -> v]
>>>>>>>>>>>>>>>> is a
>>>>>>>>>>>>>>>> state with z updated to v.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another example that involves memory: the semantics of 'i =
>>>>>>>>>>>>>>>> ptrtoint p'
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>> be defined as follows:
>>>>>>>>>>>>>>>> Given an input state s, next state s' = s[i ->
>> s(p).obj.address +
>>>>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>>>>> where obj.address is the begin address of a memory object
>> obj pointed
>>>>>>>>>>>>> by
>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>> & offset is p's byte offset. (Imagine a pointer to the
>> offset of some
>>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t the
>> input state.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>>> To describe the output state after executing 'p = alloca',
>> the
>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> new alloca should be determined.
>>>>>>>>>>>>>>>> If observedness is involved, we need to know the future
>> state
>>>>>>>>>>>>>>>> again. :/
>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>> don't know whether the alloca is going to be observed or
>> not without
>>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>> This is the problem of the current lifetime intrinsics as
>> well.
>>>>>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.
>>>>>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of
>>>>>>>>>>>>>>> dereferenceable memory that is implicitly freed as the stack
>>>>>>>>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you want to modify the IR, you need to argue the
>> observable
>>>>>>>>>>>>>>> semantics which is nothing new. That this might require more
>> than
>>>>>>>>>>>>>>> a peephole view of the program is also not new.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One possible approach to resolve this is adding an
>> 'unobserved'
>>>>>>>>>>>>>>>> flag to
>>>>>>>>>>>>>>>> alloca instruction (similar to what was suggested by
>> Nicolai).
>>>>>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is used by
>>>>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>>>>> The flag can be added, like we add other attributes. It
>> should not
>>>>>>>>>>>>>>> be required for any optimization we talked about though. It
>> basically
>>>>>>>>>>>>>>> is a way to manifest derived or given information into the
>> IR.
>>>>>>>>>>>>>>> Attribute deduction, as well as frontends with domain
>> knowledge,
>>>>>>>>>>>>>>> can add such information. The flag we discussed in phab was
>> not even
>>>>>>>>>>>>>>> sufficient for all the transformation examples I presented
>> in my mail,
>>>>>>>>>>>>>>> that is why I extended my  argument. We could still have a
>> "noescape"
>>>>>>>>>>>>>>> flag for allocas, but I'm not sure how useful that really
>> is. We can
>>>>>>>>>>>>>>> certainly deduce it and manifest it, unsure if we have
>> domain
>>>>>>>>>>>>>>> knowledge
>>>>>>>>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions,
>> which
>>>>>>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>>>>>>> As with other violation of attributes I would, on first
>> though,
>>>>>>>>>>>>>>> suggest
>>>>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, existing optimizations like loop versioning can
>> introduce
>>>>>>>>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the
>> feeling things
>>>>>>>>>>>>>>> are still mixed up here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) We stop folding comparisons between different allocas if
>>>>>>>>>>>>>>> changing the
>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>        of both might be observable. Thus, if both might have
>> their
>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>> "taken"/escaped,
>>>>>>>>>>>>>>>        other than the comparisons we want to fold, we cannot
>> proceed.
>>>>>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The first should be sufficient for the problem you were
>> trying to
>>>>>>>>>>>>>>> solve
>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>> first place. The second makes lifetime markers less weird.
>> Note
>>>>>>>>>>>>>>> that 1)
>>>>>>>>>>>>>>> is not changing
>>>>>>>>>>>>>>> the semantics of the IR. We basically just argue there is a
>> bug in our
>>>>>>>>>>>>>>> instcombine right
>>>>>>>>>>>>>>> now as we do not check all necessary preconditions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see that there are other questions that I didn't answer
>> yet, but
>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>> answer this first to limit the length of the text :)
>>>>>>>>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I read your proposal and thought about the model.
>>>>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As you concerned in A3, certain programs may be valid
>> only when
>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>> blocks with overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca also
>> works):
>>>>>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there as
>> enough space
>>>>>>>>>>>>>>>>>> & p1
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>>>>>>> s.insert(p1); s.insert(p2); // If the second insert did
>> nothing, it
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be surprise to programmers
>>>>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped
>> blocks are
>>>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>> would be great for verification tools on LLVM IR to be
>> able to
>>>>>>>>>>>>>>>>>> answer
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the second insert will succeed.
>>>>>>>>>>>>>>>>> I agree, the second insert should succeed, assuming `p1 &&
>> p2`.
>>>>>>>>>>>>>>>>> I don't think my proposal would in any way impact the
>> program above,
>>>>>>>>>>>>>>>>> if anything the A3 reasoning makes sure such a program
>> with allocas
>>>>>>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm also not sure I understand what you try to argue for.
>> Maybe
>>>>>>>>>>>>>>>>> elaborate a bit what it is you think is bad or needs to be
>> changed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The problem is that definition of escapedness is not
>> clear at the
>>>>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>>>>> level. Describing the IR semantics w.r.t. LLVM's escaped
>> analysis
>>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>>> isn't something we want.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Semantic definition of escapedness of a pointer seems
>> hard, I mean
>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>>>>>>>> It isn't a single instruction such as 'escape i8* ptr',
>> and we need
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>> over all instructions in the function. For example,
>> '(int)(p+1) -
>>>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>>>> isn't semantically escaping the pointer p because the
>> result is 1
>>>>>>>>>>>>>>>>>> regardless of the value of p.
>>>>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a
>> desirable
>>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from
>> defining
>>>>>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In practice, syntactically checking escapedness + nice
>> engineering
>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>> not break optimizations in most cases (as undef/poison
>> did); but it
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be great to move to another level, since LLVM IR is used
>> in so many
>>>>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>> The property under which you can coalesce objects is
>> simple:
>>>>>>>>>>>>>>>>>        It is not observable.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now, if the address of one of the two objects you coalesce
>> is not
>>>>>>>>>>>>>>>>> observed, coalescing is not observable. That is a
>> sufficient
>>>>>>>>>>>>>>>>> condition
>>>>>>>>>>>>>>>>> not a necessary one. Pointer "escaping" is one step
>> further. If the
>>>>>>>>>>>>>>>>> address doesn't escape it is not observed. This does not
>> mean the
>>>>>>>>>>>>>>>>> "semantic conditions for coalescing", e.g., for the
>> purpose of
>>>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>>>> validation, is supposed to be build on top of our
>> "definition of
>>>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>>>> pointers". That said, we use "does not escape" as a
>> precondition for
>>>>>>>>>>>>>>>>> various transformation and I'm unsure what is any
>> different now. The
>>>>>>>>>>>>>>>>> entire escape argument is only used in the validity of the
>> pointer
>>>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>>>> Similarly, we can fold a comparison of a noalias pointer
>> with
>>>>>>>>>>>>>>>>> another
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> if the former does not escape (and both are dereferenced
>> and one is
>>>>>>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The pointer comparison is another beast. It indeed has a
>> few
>>>>>>>>>>>>>>>>>> issues,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> solving it might require nontrivial solution.
>>>>>>>>>>>>>>>>> I think the main problem of the inconsistencies and such
>> we've
>>>>>>>>>>>>>>>>> seen is
>>>>>>>>>>>>>>>>> rooted in the erroneous folding of pointer comparisons.
>> Cleaning up
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> lifetime marker semantics is actually unrelated and simply
>> not
>>>>>>>>>>>>>>>>> folding
>>>>>>>>>>>>>>>>> as described in A3 should solve the issue that has been
>> reported.
>>>>>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>>>>>> we should take a crack at lifetime markers while we are
>> here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last year,
>> I've tried to
>>>>>>>>>>>>>>>>>>> summarize my thoughts,
>>>>>>>>>>>>>>>>>>> especially given that I had some time to think about
>> things
>>>>>>>>>>>>>>>>>>> over the
>>>>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>>>>> Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I start with general questions I asked myself and then
>> go on
>>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>>> about a potential design.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, thus
>> is it
>>>>>>>>>>>>>>>>>>> fixed
>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I
>> think it is
>>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>>>>>>>>> at least I hope it is. Only that would allow
>> transformations I
>>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>>> us to do. Here are some examples:
>>>>>>>>>>>>>>>>>>>         https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>>         https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, extending
>> the
>>>>>>>>>>>>>>>>>>> lifetime?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer comparison
>> into a region
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>> the pointer is dead?
>>>>>>>>>>>>>>>>>>> This is an important question which we haven't discussed
>> much
>>>>>>>>>>>>>>>>>>> as we
>>>>>>>>>>>>>>>>>>> assumed LICM has to work.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses are
>> not
>>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>>> the lifetime and are
>>>>>>>>>>>>>>>>>>> allowed outside of "lifetime regions" (as indicated by
>> markers).
>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>> will always produce valid
>>>>>>>>>>>>>>>>>>> results. Though, we might want to think about a lifetime
>> marker
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> spits out a new pointer
>>>>>>>>>>>>>>>>>>> value instead of reusing the old one.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The question here is, can we fold address comparisons
>> based on
>>>>>>>>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So far, we fold comparisons based on "address
>> information". For
>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>> we "know" globals,
>>>>>>>>>>>>>>>>>>> allocas, and mallocs cannot be equal to one another.
>> Also, two
>>>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>>> allocations, for globals
>>>>>>>>>>>>>>>>>>> and allocas, are considered unequal. Now, the crux is
>> that we have
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> consistent if we do two
>>>>>>>>>>>>>>>>>>> comparisons, and, as of now, we are not (bug number
>> missing).
>>>>>>>>>>>>>>>>>>> Since
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> backend (or any other place
>>>>>>>>>>>>>>>>>>> for that matter) might coalesce allocas, their addresses
>> might not
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> different after all. If we
>>>>>>>>>>>>>>>>>>> already folded a comparison to "unequal" we are doomed
>> if we later
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>> a comparison that results
>>>>>>>>>>>>>>>>>>> in "equal". (Note, this is different from aliasing rules
>> as
>>>>>>>>>>>>>>>>>>> they can
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would hope we can come up with a model that treats
>> memory "the
>>>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>>>> regardless if it is global,
>>>>>>>>>>>>>>>>>>> stack, or heap. I want to avoid special casing one of
>> them wrt.
>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>>>> as I believe most optimizations
>>>>>>>>>>>>>>>>>>> would apply to any of them, potentially for different
>> reasons and
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory
>> content *only*.
>>>>>>>>>>>>>>>>>>> Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>>           undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>> derived/modifiable as
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> just an "as-is" property of the memory
>>>>>>>>>>>>>>>>>>>           content. The lifetimes of an object, as
>> described by
>>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>> change during the compilation. They
>>>>>>>>>>>>>>>>>>>           might become smaller if we deduce the object is
>> not
>>>>>>>>>>>>>>>>>>> accessed
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> the memory content is not used, they
>>>>>>>>>>>>>>>>>>>           might become larger if objects with
>> non-overlapping
>>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>> coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>>           introducing a new object though.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing to do
>> with the
>>>>>>>>>>>>>>>>>>> address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>>           comparisons and similar operations are valid
>> outside the
>>>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>>>> Loads and stores are as well, they can
>>>>>>>>>>>>>>>>>>>           even not be removed "easily". A store followed
>> by a
>>>>>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>>>>> or a load following a lifetime marker
>>>>>>>>>>>>>>>>>>>           is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about addresses.
>> This means
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>>           lifetimes result in different addresses. Thus,
>> a
>>>>>>>>>>>>>>>>>>> comparison
>>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>> the address of two allocas could not
>>>>>>>>>>>>>>>>>>>           immediately be folded. That said, there would
>> be special
>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>> though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>>           not escape, other than the comparisons to be
>> folded,
>>>>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>>> them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>>           would still be consistent because it is
>> unobservable. The
>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>>> we have in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>>           the address is still observed (after the fold).
>> It is
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>>>>> to me what the impact of this would be
>>>>>>>>>>>>>>>>>>>           on real code. I suggested before that we run
>> some
>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>> before we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are
>>>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>> you suggested before (I think).
>>>>>>>>>>>>>>>>>>> There are some implementation level differences but at
>> the end of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> day they are basically the same.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if it
>> fixes the
>>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>>> that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We're discussing the well-formedness of
>> lifetime.start/end
>>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>>>>> https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>> semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. Is there a frontend or framework that introduces
>> lifetime call
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> non-stack allocated objects?
>>>>>>>>>>>>>>>>>>>> If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2. Can a block's lifetime be started bytewise or
>> elementwise?
>>>>>>>>>>>>>>>>>>>> I imagine an optimization that allow using stack very
>> compactly,
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>>>>>> there is a real-world use case.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> --
>> Website: https://people.mpi-sws.org/~jung/
>>
>