[LLVMdev] Questions about the semantics for lifetime intrinsics...

Mon Jul 29 01:45:20 PDT 2013

Chandler Carruth wrote:
> So, in hacking on mem2reg I noticed that it doesn't actually implement
> optimizations enabled by lifetime markers. I thought I might take a stab
> at teaching it about them, but I'm left with some questions about the
> semantics. Much of this may have been hashed out when they were added, and
> if so I'll appreciate your help educating me, and maybe we can come up
> with improved documentation to cover this.
>
> First, is there any realistic intent that these be used for heap
> pointers? If so, what is the expected use case?

As you noticed by the lack of implementation in mem2reg, they're 
currently only implemented for heap pointers.

The use case is for lowering a stack-based language to LLVM IR. The 
language's allocate-space-from-the-"stack" (really in heap) function 
would use lifetime.start to indicate that the stack slot contains 
uninitialized memory, and the pop function would use lifetime.end to 
indicate that the memory is dead for DSE purposes.
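A rough sketch of what such a frontend might emit, written as a small Python
helper that produces textual IR. Everything named here (HeapStackLowering,
%vm_stack, the %slotN names, the 8-byte slot size) is made up for
illustration, and the IR syntax is the 2013-era form of the intrinsics
(i64 size, i8* pointer); it is not taken from any real frontend.

```python
class HeapStackLowering:
    """Hypothetical helper: emits IR text for a VM 'stack' that lives in heap memory."""

    def __init__(self, slot_size=8):
        self.slot_size = slot_size   # bytes per slot (assumed fixed)
        self.live = []               # SSA names of slots currently on the VM stack
        self.counter = 0             # keeps every SSA name unique
        self.lines = []              # emitted IR, one instruction per entry

    def push_slot(self):
        """Allocate a slot: compute its address, then mark it uninitialized."""
        name = "%slot{}".format(self.counter)
        self.counter += 1
        offset = len(self.live) * self.slot_size
        self.lines.append(
            "{} = getelementptr i8* %vm_stack, i64 {}".format(name, offset))
        self.lines.append(
            "call void @llvm.lifetime.start(i64 {}, i8* {})".format(self.slot_size, name))
        self.live.append(name)
        return name

    def pop_slot(self):
        """Release the top slot: mark its bytes dead so DSE can drop stores to it."""
        if not self.live:
            raise IndexError("pop from empty VM stack")
        name = self.live.pop()
        self.lines.append(
            "call void @llvm.lifetime.end(i64 {}, i8* {})".format(self.slot_size, name))
        return name


lowering = HeapStackLowering()
lowering.push_slot()
lowering.pop_slot()
print("\n".join(lowering.lines))
```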

> Second, if the answer to the first is 'no', then could we remove the
> 'start' intrinsic? It seems redundant as the value of an alloca prior to
> a store to that alloca is already 'undef'.

Lifetime.start and lifetime.end are *almost* the same. There's a long 
thread on the subject back here: 
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/057121.html 
which is particularly interesting for the conversation about replacing

To quote my past self, "You can almost entirely model lifetime.start and 
lifetime.end as being a store of undef to the address. However, they're 
the tiniest bit stronger. With a store of undef, you can delete stores 
that precede (with no intervening load) and loads that follow (with no 
intervening store). On top of that, a start lets you delete loads that 
precede, and an end lets you delete stores that follow."
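The difference can be made concrete with a toy model: one memory location,
straight-line code, no aliasing. The function name and the op encoding are
invented for this sketch. The quote does not spell out the conditions for the
two extra cases, so the sketch assumes the mirror-image ones (preceding loads
with no intervening store for a start, following stores with no intervening
load for an end), and it conservatively treats an op as "intervening" even if
that op is itself removable.

```python
MARKERS = ("undef", "start", "end")  # store of undef, lifetime.start, lifetime.end


def removable(ops, marker_index):
    """Return sorted indices of the loads/stores that the marker makes removable.

    `ops` is a list of "store", "load" or one of MARKERS, all on one address.
    "Removing" a load here means its result may be treated as undef.
    """
    kind = ops[marker_index]
    if kind not in MARKERS:
        raise ValueError("not a marker: {}".format(kind))
    dead = set()

    # Walk backwards from the marker.
    seen_load = seen_store = False
    for i in range(marker_index - 1, -1, -1):
        op = ops[i]
        if op == "store":
            if not seen_load:            # all three markers kill preceding stores
                dead.add(i)
            seen_store = True
        elif op == "load":
            if kind == "start" and not seen_store:   # extra power of a start (assumed condition)
                dead.add(i)
            seen_load = True
        else:
            break                        # stop at any other marker

    # Walk forwards from the marker.
    seen_load = seen_store = False
    for i in range(marker_index + 1, len(ops)):
        op = ops[i]
        if op == "load":
            if not seen_store:           # all three markers kill following loads
                dead.add(i)
            seen_load = True
        elif op == "store":
            if kind == "end" and not seen_load:      # extra power of an end (assumed condition)
                dead.add(i)
            seen_store = True
        else:
            break
    return sorted(dead)


for marker in MARKERS:
    print(marker, removable(["store", "load", marker, "store", "load"], 2))
```

In the sequence store, load, X, store, load, a store of undef at X removes
nothing under this model, a start removes only the load before it, and an end
removes only the store after it.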

> Third, what is the semantic model intended for the size argument? The
> documentation says:
>
> """
> The first argument is a constant integer representing the size of the
> object, or -1 if it is variable sized.
> """
>
> Part of this seems a bit confusingly specified -- what does it mean by
> "the object"? I assume what it really means is the object read from
> memory by a corresponding load, but that in and of itself is confusing
> because the pointer passed to this routine is typically not the pointer
> loaded. There might be many different loads of different type objects
> all corresponding to the same lifetime intrinsic.

I agree, speaking of the "object" is out of place here. It's just the 
number of bytes whose lifetime starts or ends. Put another way, it's 
equivalent to a series of consecutive one-byte starts/ends.
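As a small illustration of that reading, here is a sketch that tracks one
live/dead flag per byte. The function name and the list-of-bools
representation are assumptions made for the example, and the -1 (variable
size) case is deliberately left out.

```python
def apply_marker(live, kind, offset, size):
    """Return a copy of `live` (one bool per byte) after a start/end marker.

    `kind` is "start" or "end"; the marker covers bytes [offset, offset + size).
    """
    if kind not in ("start", "end"):
        raise ValueError("unknown marker kind: {}".format(kind))
    if size < 0:
        raise ValueError("variable-sized (-1) markers are not modelled here")
    out = list(live)
    for i in range(offset, offset + size):
        out[i] = (kind == "start")   # start makes the byte live, end makes it dead
    return out


# One 4-byte start compared with four consecutive 1-byte starts.
memory = [False] * 8
whole = apply_marker(memory, "start", 2, 4)

piecewise = memory
for byte in range(2, 6):
    piecewise = apply_marker(piecewise, "start", byte, 1)

print(whole == piecewise)
```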

> The best way I have of interpreting it is in terms of the 'end'
> intrinsic: its result is equivalent to that of a notional store
> of '[N x i8] undef' to the provided pointer where N is the size.
> However, this becomes truly muddy in the presence of '-1' size which
> just means a "variable" size. But how much is variable? Where does the
> undef stop?
>
> I think the whole thing would be somewhat clearer as an intrinsic with
> an arbitrary pointer type and a boolean flag for 'is_variable_length'.
> If the flag is false (common), the pointee type's store size is the size
> of the region whose lifetime is marked. If the flag is true, the pointer
> must be an alloca instruction itself with a runtime size, and the entire
> alloca's lifetime is marked. Among other benefits, this would make
> mem2reg and other analyses easier and faster by decreasing the bitcast
> or gep instructions that result from common frontend lowering patterns.
> It also matches more closely the behavior of load and store.

I think the -1 case is intended to mean "the whole thing" for users who 
don't want to lower to a specific number of bytes.

Nick