[PATCH] D12353: [WinEH] Update coloring to handle nested cases cleanly

Thu Aug 27 17:53:41 PDT 2015

JosephTremoulet added inline comments.

================
Comment at: lib/CodeGen/WinEHPrepare.cpp:3101-3103
@@ +3100,5 @@
+      // Queue exits with the parent color.
+      for (User *Exit : VisitingHead->users())
+        for (BasicBlock *Succ :
+             successors(cast<Instruction>(Exit)->getParent()))
+          if (BlockColors[Succ].insert(Color).second) {
----------------
majnemer wrote:
> I think we usually have braces if an inner block uses them.
Ah, good to know.  Will fix, thanks.

================
Comment at: test/CodeGen/WinEH/wineh-cloning.ll:269-270
@@ +268,4 @@
+; %inner is a cleanup which appears both as a child of
+; %left and as a child of %right.  Since statically we
+; need each funclet to have a single parent, we need to
+; clone the entire %inner funclet so we can have one
----------------
majnemer wrote:
> JosephTremoulet wrote:
> > majnemer wrote:
> > > JosephTremoulet wrote:
> > > > JosephTremoulet wrote:
> > > > > JosephTremoulet wrote:
> > > > > > majnemer wrote:
> > > > > > > When you say each funclet needs to have a single parent, do you mean that it must have a single non-invoke predecessor?  It will be quite common for a set of invokes in a try-block to have the same unwind destination.
> > > > > > I mean that all of its invoke predecessors, after cloning, must be in the same funclet.
> > > > > This might be a more illustrative example:
> > > > > ```
> > > > > define void @foo() personality etc {
> > > > > entry:
> > > > >   invoke void @f()
> > > > >     to label %exit unwind label %funcletA
> > > > > funcletA:
> > > > >   %A = catchpad []
> > > > >          to label %bodyA unwind label %endpad
> > > > > bodyA:
> > > > >   invoke void @g()
> > > > >     to label %invoke.cont unwind label %endpad
> > > > > invoke.cont:
> > > > >   invoke void @h()
> > > > >     to label %retA unwind label %funcletB
> > > > > retA:
> > > > >   catchret %A to label %exit
> > > > > funcletB:
> > > > >   %B = cleanuppad []
> > > > >   call void @i()
> > > > >   cleanupret %B unwind to caller
> > > > > exit:
> > > > >   ret void
> > > > > }
> > > > > ```
> > > > > 
> > > > > Say we enter `@foo()` and generate a stack frame for it, then the call to `@f()` raises an exception that is handled by `funcletA`, so the runtime calls `funcletA` (and we generate a stack frame for it).  Now there are two `invoke`s in `funcletA`, both of which are handled by `funcletB`, but if the call to `@g()` faults the `endpad` indicates that the runtime needs to unwind out of `funcletA` before calling `funcletB`, whereas if the call to `@h()` faults then the runtime is supposed to invoke `funcletB` while `funcletA`'s frame is still on the stack.  I don't think we can expect WinEH targets to support encoding and executing that arrangement without making two copies of `funcletB`.
> > > > > (in the case where the call to `@h()` faults, the `cleanupret` that unwinds to caller instead of unwinding to `%retA` is UB, so we'd want to replace it with `unreachable` in that copy of `funcletB`, but I think we still want a copy just in case `@h()` does not return dynamically and so the program never executes UB; similarly, if the input already had `unreachable` there instead of a `cleanupret`, we wouldn't know statically which reporting is correct for `funcletB` and so I'd think would want two copies of it)
> > > > sigh... insert
> > > > ```
> > > > endpad:
> > > >   catchendpad unwind label %funcletB
> > > > ```
> > > > somewhere in `@foo` in the previous example.
> > > Our langref describes `catchendpad` using the following language:
> > > >  The unwind target of invokes between a catchpad and a corresponding catchret must be its catchendpad or an inner EH pad.
> > > 
> > > 
> > > It was my understanding that all invokes in a `catchpad` funclet must transitively unwind to the `catchendpad`.
> > > 
> > > Your example would violate this because the `catchret` in `funcletB` uses `unwinds to caller`.
> > > The unwind target of invokes between a catchpad and a corresponding catchret must be its catchendpad or an inner EH pad.
> > 
> > I think that "inner" in that sentence is ill-defined.  I also think this would be a difficult invariant for transformations to determine whether they're violating it or not.  I think the parts of the langref that describe UB for executing a mismatched ret/catchret/cleanupret/catchendpad get at the same issue and are better defined (or will be once we have cleanupendpad; currently they refer to an ill-defined notion of "unwinding out of a cleanuppad") and more manageable for transformations.
> > 
> > So, for example, I don't know how a (hypothetical) transformation like tail-merge (extended to treat `unreachable` like some sort of wildcard join with any program point) is supposed to know that it is illegal to transform this:
> > ```
> > define void @foo() personality etc {
> > entry:
> >   invoke void @f()
> >     to label %exit unwind label %funcletA
> > funcletA:
> >   %A = catchpad []
> >          to label %bodyA unwind label %endpad
> > bodyA:
> >   invoke void @g()
> >     to label %invoke.cont unwind label %endpad
> > invoke.cont:
> >   invoke void @h()
> >     to label %retA unwind label %funcletB1
> > retA:
> >   catchret %A to label %exit
> > endpad:
> >   catchendpad unwind label %funcletB2
> > funcletB1:
> >   %B1 = cleanuppad []
> >   call void @i()
> >   unreachable
> > funcletB2:
> >   %B2 = cleanuppad []
> >   call void @i()
> >   cleanupret %B2 unwind to caller
> > exit:
> >   ret void
> > }
> > ```
> > into the previous example.
> > 
> > 
> > Or if that's too far-fetched, what do you think of this example:
> > ```
> > define void @foo() personality etc {
> > entry:
> >   invoke void @f()
> >     to label %exit unwind label %funcletA
> > funcletA:
> >   %A = catchpad []
> >          to label %bodyA unwind label %endpad
> > bodyA:
> >   invoke void @g()
> >     to label %invoke.cont unwind label %endpad
> > invoke.cont:
> >   invoke void @h()
> >     to label %retA unwind label %funcletB
> > retA:
> >   catchret %A to label %exit
> > endpad:
> >   catchendpad unwind label %funcletB
> > funcletB:
> >   %B = cleanuppad []
> >   call void @i()
> >   unreachable
> > exit:
> >   ret void
> > }
> > ```
> > ?
> I agree that in the face of `unreachable`, things get problematic for the preparation machinery.
> 
> Do you think it would be heroic for WinEHPrepare to eventually handle this?
> Do you think it would be heroic for WinEHPrepare to eventually handle this?

No, I don't think it'll be too bad.  One by-product of the coloring algorithm is a directed graph with one node per funclet (plus a root representing the main function), with an edge X -> Y wherever an invoke/catchendpad/cleanupret in X has an unwind edge to the start of Y (i.e. where you'd expect Y to be a an inner funclet nested in X).  I think a straightforward walk could construct a tree by "exploding out" joins and exploring all acyclic paths from the root, resulting in a tree where each node has the color of some funclet.  Then any funclet whose color appears more than once in the tree could be duplicated, and any affected unwind/catchret edges could be fixed up, so that after cloning we have one funclet for every node in the tree and they could nest like you'd want because it's a tree.  I've been debating whether it makes more sense to do exactly that as another pass after the current cloning (which has the appeal that you could skip it in the ~100% of the cases where the original graph is a tree), or whether it would be natural to fold all of the cloning into the graph walk.

http://reviews.llvm.org/D12353