[PATCH] D93376: [LangRef] Clarify the semantics of lifetime intrinsics

Sat Dec 26 04:19:32 PST 2020

RalfJung added a comment.

> I'd say that as a matter of policy the LangRef must be a document that people can rely on, else what's the point of having it? Since it is written as-if lifetime intrinsics could be applied rather freely

I think it is rather clear that in this case, LangRef is ambiguous and incomplete (and not entirely coherent with "what happens in the code"), so in my opinion clinging to what exactly it says is less useful than trying to figure out the intent of these intrinsics by taking into account both LangRef and "what happens in the code".

> It's fair to say that this patch is a change of semantics by forbidding the use of the lifetime intrinsics in certain places where they were previously allowed.

They were allowed by the LangRef but not supported in the code, and the discussion around their introduction not even considered this case. Looks like a LangRef bug to me. The original authors presumably never considered the possibility of using the intrinsics for anything else, so they did not deem this case even worth mentioning.

Neither LangRef nor the code let us say anything precise and meaningful about what happens when the intrinsics are applied to anything other than alloca. As this discussion shows, even saying what they do for alloca is very non-trivial. Figuring out what exactly the intrinsics do in general is a significant addition to the LangRef -- do you agree with this part? So maybe we can add some wording that "if the intrinsics are used in this specific syntactically constrained way, then their behavior is X; otherwise their behavior is currently unclear and needs to be more precisely determined". Then we can later separately discuss removing the "otherwise" clause and saying that other uses are simply forbidden (or someone finds a good way to precisely document the more general case).

> I agree that this is a bit iffy, though I'm not sure a literally operational specification is always required.

I don't know any other way to make a precise specification that is internally consistent.

I am considering the patch to still be "operational enough" because the non-operational part is given by fairly simple purely syntactic rules that only check the body of the current function. So taking into account "is there, syntactically, an intrinsic call on the result of this alloca" is okay -- it's not great, but at least it is fairly clear how to make it precise. I am only worried that there might be code moving/removing optimizations that change this syntactic information in problematic ways, but we first need a more precise LangRef to even say which changes are 'problematic'. If that's what you mean by not using a "literally operational specification", we are in agreement.

What I object to is specifications that "look into the future" by asking questions like "will the result of this alloca be passed to lifetime.start some time during this program execution". First of all, the data-flow implicit in this statement is rather subtle, and secondly, as my example shows, "looking into the future" to alter past behaviors can easily lead to contradictions reminiscent of "causal loops".

> With this patch, the program isn't just UB, it actually becomes ill-formed IR because the lifetime.start in f(p) doesn't syntactically reference an alloca. So the example does not exist.

Indeed, and IMO that's good. We have no idea how to properly define the semantics of the program, and there is no good reason to support this program in the first place, so punting the question by excluding it from the domain of programs worth considering seems like a good strategy to me.

> The one change I would make in that proposal is to say that "if there is a lifetime marker use, an alloca is constrained". As long as the markers must be in the same IR function as the alloca itself, and phi/select are syntactically excluded, this determination can be made reliably (if not easily, due to the bitcast mess).

So the "is" here is a purely syntactic check? That sounds pretty similar to what the patch says here, right (modulo the details of the meaning of "constrained", and of course saying what exactly constitutes a "use")?

The purely syntactic check fails, however, for allocas that are not following the syntactic restriction proposed in the patch (or a slight relaxation of this restriction to support a few more operations on the "use path" from the alloca to the intrinsic).

> Ideally, there would be a flag on the alloca instruction itself to distinguish between constrained and unconstrained. As it is, the flag is implicitly given as "has a (indirect) lifetime.start use". That's not great, and perhaps you could make a follow-up patch that introduces such a flag explicitly, but I think it's workable either way.

Note that it's not just a boolean flag. For constrained allocas, it also matters with which other constrained allocas their lifetime can potentially overlap. This determines which allocations might overlap in memory, and which may not. So even an explicit boolean flag would be insufficient to avoid "magic syntactical rules".

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93376/new/

https://reviews.llvm.org/D93376