[LLVMdev] RFC: New EH representation for MSVC compatibility

Thu Jun 4 11:37:15 PDT 2015

> On May 15, 2015, at 3:37 PM, Reid Kleckner <rnk at google.com> wrote:
> After a long tale of sorrow and woe, my colleagues and I stand here before you defeated. The Itanium EH representation is not amenable to implementing MSVC-compatible exceptions. We need a new representation that preserves information about how try-catch blocks are nested.

A couple quick apologies: this response is pretty late, and for the same reasons I’ve only been able to skim the rest of the thread.

I think the basic ideas in this proposal seem reasonable, although it seems they may have evolved a bit over the course of the thread.  A few points do stand out to me:

> Instead, all MSVC EH personality functions (x86, x64, ARM) cross (C++, SEH) are implemented with interval tables that express the nesting levels of various source constructs like destructors, try ranges, catch ranges, etc. When you rinse your program through LLVM IR today, this structure is what gets lost.

Yes.  It seems to me that the main additional thing that your proposed IR preserves is the ability to very easily reconstruct the tree of control flow.  Correct?

> New information
> -------------------------
> 
> Recently, we have discovered that the tables for __CxxFrameHandler3 have the additional constraint that the EH states assigned to a catch body must immediately follow the state numbers assigned to the try body. The natural scoping rules of C++ make it so that doing this numbering at the source level is trivial, but once we go to LLVM IR CFG soup, scopes are gone. If you want to know exactly what corner cases break down, search the bug database and mailing lists. The explanations are too long for this RFC.

I don’t quite understand this constraint: aren’t the EH states within the catch body in a different function, and therefore numbered separately? I also don’t really understand why it makes anything about EH inherently harder, as opposed to simply being something that you didn’t design your current implementation around; aren’t you still going to have exactly the same numbering problems with cleanups/catches being shared between invoke sites? But it doesn’t really matter.  If the new representation is better, it’s better.
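
To make the quoted constraint concrete, here is a minimal C++ sketch of numbering done on a source-level scope tree, where the states of a catch body start right after the states of its try body. `Scope`, `number_states`, and `number_example` are hypothetical names for illustration, and the scheme is a simplified assumption, not the actual __CxxFrameHandler3 table layout.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of source-level scopes; not an LLVM or MSVC data structure.
struct Scope {
    std::vector<Scope> body;      // scopes nested inside this one
    std::vector<Scope> handlers;  // catch bodies; non-empty only for a try
    int low = -1;                 // state assigned on entry to this scope
    int high = -1;                // last state used by this scope's body
};

// Assign states in source order and return the next unused state.
int number_states(Scope &s, int next) {
    s.low = next++;
    for (Scope &child : s.body)
        next = number_states(child, next);
    s.high = next - 1;
    // Assumed constraint: handler states begin immediately after the try's.
    for (Scope &handler : s.handlers)
        next = number_states(handler, next);
    return next;
}

// Models: try { Cleanup a; ... } catch (...) { Cleanup b; ... }
Scope number_example() {
    Scope try_scope;
    try_scope.body.push_back(Scope{});
    Scope catch_scope;
    catch_scope.body.push_back(Scope{});
    try_scope.handlers.push_back(catch_scope);
    int next = number_states(try_scope, 0);
    assert(next == 4);
    assert(try_scope.handlers[0].low == try_scope.high + 1);
    return try_scope;
}
```

With the scope tree in hand the numbering is a single pre-order walk; the difficulty described in the RFC is that the tree is no longer available once the code is a flat CFG.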

> New representation
> ------------------------------
> 
> I propose adding the following new instructions, all of which (except for resume) are glued to the top of their basic blocks, just like landingpads.

The fact that most of these are terminators makes pinning them to the absolute beginning of the block really problematic.  An edge that can’t support arbitrary code is one thing (although usually they’re at least splittable!), but at the very least, we need to be able to drop phis and debug instructions in basic blocks, or you’re going to completely wreck basic optimizability.

You should consider giving these a consistent prefix, like “eh_” or “eh.”, just to clearly distinguish them.  You can rename “resume” and “landingpad”, too.

> They all have an optional ‘unwind’ label operand, which provides the IR with a tree-like structure of what EH action to take after this EH action completes. The unwind label only participates in the formation of the CFG when used in a catch block, and in other blocks it is considered opaque, personality-specific information. If the unwind label is missing, then control leaves the function after the EH action is completed. If a function is inlined, EH blocks with missing unwind labels are wired up to the unwind label used by the inlined call site.

For the terminators, this unwind label makes sense.  For the non-terminators (just cleanupblock, I think), you’re going to need to define what it means, its strength of reference, etc.  For example, if I have a cleanup that never completes normally (imagine a destructor that calls abort()), the IR will contain no CFG links from the basic block with the cleanupblock to the basic block with the resume.  The basic block with the resume, and all the downstream EH blocks, may even be unreachable; and so the natural tendency would be to remove them.  Does a reference from a cleanupblock keep its target alive?
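
A source-level sketch of that situation, for reference. `AbortingGuard` and `guarded` are hypothetical names; the point is only that the cleanup run during unwinding never returns, so after inlining nothing in the cleanup would flow to a resume.

```cpp
#include <cstdlib>
#include <stdexcept>

// Hypothetical type whose destructor aborts unless it was disarmed first.
struct AbortingGuard {
    bool armed = true;
    ~AbortingGuard() {
        if (armed)
            std::abort();  // never returns, so the cleanup never finishes
    }
};

int guarded(bool do_throw) {
    AbortingGuard guard;
    if (do_throw)
        throw std::runtime_error("boom");  // the EH cleanup runs the destructor while armed
    guard.armed = false;                   // the normal path disarms the guard
    return 1;
}
```

Calling `guarded(false)` returns 1; calling `guarded(true)` inside a try block aborts the process during unwinding, before any outer handler or cleanup can run.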

> The new representation is designed to be able to represent Itanium EH in case we want to converge on a single EH representation in LLVM and Clang. An IR pass can convert these actions to landingpads, typeid selector comparisons, and branches, which means we can phase this representation in on Windows at first and experiment with it slowly on other platforms. Over time, we can move the landingpad conversion lower and lower in the stack until it’s moved into DwarfEHPrepare. We’ll need to support landingpads at least until LLVM 4.0, but we may want to keep them because they are the natural representation for Itanium-style EH, and have a relatively low support burden.

I agree that we could migrate Itanium to this pattern fairly successfully, as long as we’re agreed that we’re not setting a goal of eventually emitting identical IR for different personalities.

> resume
> -------------
> 
> ; Old form still works, still means control is leaving the function.
> resume <valty> %val
> ; New form overloaded for intra-frame unwinding or resuming normal execution
> resume <valty> %val, label %nextaction
> ; New form for EH personalities that produce no value
> resume void
> 
> Now resume takes an optional label operand which is the next EH action to run. The label must point to a block starting with an EH action. The various EH action blocks impose personality-specific rules about what the targets of the resume can be.

I agree with the feedback elsewhere that you should separate these instructions.

> catchendblock
> ----------------
> 
> catchend unwind label %nextaction
> 
> The catchend is a terminator that unconditionally unwinds to the next action. It is merely a placeholder to help reconstruct which invokes were part of the catch blocks of a try. Invokes that are reached after a catchblock without following any unwind edges must transitively unwind to the first catchend block that the catchblock unwinds to. Executing such an invoke that does not transitively unwind to the correct catchend block has undefined behavior.

I think the rule you’re looking for here is that it’s undefined behavior if control flow from a catchblock doesn’t eventually reach the corresponding catchendblock (or reaches the catchblock again before that point).  It’s not unwind-specific.

> cleanupblock
> --------------------
> 
> %val = cleanupblock <valty> unwind label %nextaction
> 
> This is not a terminator, and control is expected to flow into a resume instruction which indicates which EH block runs next. If the resume instruction and the unwind label disagree, behavior is undefined.

What’s the expectation here?  Is each cleanupblock conceptually an independent cleanup, or is it a legal transformation to combine successive cleanupblocks?  Is it okay for the code within a cleanupblock to be reachable from multiple cleanupblock instructions?  Is there an expectation about how many resumes are reachable?  Does the cleanupblock instruction itself prevent reordering in any way?

> terminateblock
> ----------------------
> 
> ; for noexcept
> terminateblock [void ()* @std.terminate] unwind label %nextaction
> ; for exception specifications, throw(int)
> terminateblock [void ()* @__cxa_unexpected, @typeid.int, ...] unwind label %nextaction
> 
> This is a terminator, and the unwind label is where execution will continue if the program continues execution. It also has an opaque, personality-specific list of constant operands interpreted by the backend of LLVM. The convention is that the first operand is the function to call to end the program, and the rest determine if the program should end.

It looks like you’re expecting this to be useful for things like EH filters; that's cool, but it suggests the name might not be appropriate.  Maybe the instruction should be called “eh.filter” and it should take a string label to tell the personality the kind of filtering to do?  And then the function pointer is just another thing in the list of constant parameters.

Also, std::unexpected doesn’t necessarily terminate the program; it gets a chance to remap the exception.  I don’t know that that changes anything — it just means that control continues to the unwind label — but you should keep that in mind in your examples.
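
Dynamic exception specifications were removed in C++17, so the remapping behavior is easiest to show as a simulation in portable C++. This is a rough sketch: `call_filtered` and `demo_remap` are hypothetical names, and the std::bad_exception substitution rule is deliberately left out.

```cpp
#include <exception>
#include <functional>

// Simulates a throw(Allowed) specification around body().
template <typename Allowed>
int call_filtered(const std::function<int()> &body,
                  const std::function<void()> &on_unexpected) {
    try {
        return body();
    } catch (const Allowed &) {
        throw;  // the exception passes the filter: keep unwinding
    } catch (...) {
        try {
            on_unexpected();  // stands in for the unexpected handler
        } catch (const Allowed &) {
            throw;            // remapped to an allowed type: unwinding continues
        } catch (...) {
            std::terminate(); // the replacement also violates the filter
        }
        std::terminate();     // the handler is not allowed to return
    }
}

// body throws a double, the handler remaps it to an int, the caller catches the int.
int demo_remap() {
    try {
        return call_filtered<int>([]() -> int { throw 2.5; },
                                  [] { throw 42; });
    } catch (int v) {
        return v;
    }
}
```

Here `demo_remap()` returns 42: the filter did not end the program, and control continued to the enclosing handler, which is the case where the unwind label matters.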

> sehfilterblock?
> ------------------
> 
> One big hole in the new representation is SEH filter expressions. They present a major complication because they do not follow a stack discipline. Any EH action is reachable after an SEH filter runs. Because the CFG is so useless for optimization purposes, it’s better to outline the filter in the frontend and assume the filter can run during any potentially throwing function call.

It’s not that it doesn’t follow a stack discipline, it’s that it’s not quite part of the same stack as other EH actions.  It’s similar in spirit to a catch block in Smalltalk, which runs before the stack is unwound because it has the ability to resume control at the throw point.  One way to model this would be to have a list of such blocks hanging off the invoke (or a landingpad-like instruction that’s easily found from the invoke), where inlining would pull them from outer invokes to inner calls.  But the control flow is inherently strange because it can essentially loop within the invoke, and that’s really hard to model.
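
The ordering can be shown with a small simulation in portable C++ (real SEH uses the MSVC __try/__except extension, which is not used here). Everything below is a hypothetical model: `raise_event` consults the filters at the raise point, before any destructor runs, and only then either resumes or unwinds. As a simplification, the unwind is caught by the nearest try rather than by the frame whose filter accepted.

```cpp
#include <exception>
#include <functional>
#include <string>
#include <vector>

enum class Decision { ContinueSearch, ExecuteHandler, ContinueExecution };

std::vector<std::string> trace;                     // order of observed events
std::vector<std::function<Decision(int)>> filters;  // innermost filter is last

struct Unwind { int code; };  // phase 2 is modeled with an ordinary C++ throw

struct Local {
    ~Local() { trace.push_back("dtor"); }
};

// Phase 1: ask each filter, innermost first, while the stack is still intact.
void raise_event(int code) {
    for (auto it = filters.rbegin(); it != filters.rend(); ++it) {
        Decision d = (*it)(code);
        if (d == Decision::ContinueExecution)
            return;              // resume at the raise point
        if (d == Decision::ExecuteHandler)
            throw Unwind{code};  // phase 2: unwind to the handler
    }
    std::terminate();            // no filter accepted the event
}

std::vector<std::string> run(Decision answer) {
    trace.clear();
    filters.push_back([answer](int) {
        trace.push_back("filter");
        return answer;
    });
    try {
        Local local;
        raise_event(1);
        trace.push_back("resumed");
    } catch (const Unwind &) {
        trace.push_back("handler");
    }
    filters.pop_back();
    return trace;
}
```

With `Decision::ExecuteHandler` the trace is filter, dtor, handler; with `Decision::ContinueExecution` it is filter, resumed, dtor. In both cases the filter runs while `local` is still alive, which is why it does not fit on the same stack as the other EH actions.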

John.