[PATCH] [WinEH] Update C++ exception state numbering code

Thu May 7 12:52:58 PDT 2015

REPOSITORY
  rL LLVM

================
Comment at: lib/CodeGen/AsmPrinter/Win64Exception.cpp:338
@@ -327,1 +337,3 @@
+      // FIXME: Should this be using FuncInfo.HandlerBaseState?
+      if (SawPotentiallyThrowing && LastEHState != -1) {
         FuncInfo.IPToStateList.push_back(std::make_pair(LastLabel, -1));
----------------
rnk wrote:
> andrew.w.kaylor wrote:
> > It isn't clear to me what this is doing.  I strongly suspect that the hard-coded -1 here is incorrect.
> > 
> > I added code which always adds an ip2state entry for the beginning of a function.  It seems that this entry is meant to handle the case where this state is needed in the parent function.  If this case is never hit, having the -1 state for the parent function may not be strictly necessary.  I added it to mimic the behavior of MSVC.
> > 
> > I suspect that this code will misbehave if we are in a catch handler.
> The -1 state always means "take no EH actions if this PC throws", so I think -1 is right. This code is basically searching for this pattern:
>   invoke void @f() ; landingpad has EHState 0
>   ...
>   call void @g() ; has *no* actions, so EHState -1
>   ...
>   invoke void @h() ; landingpad has EHState 1
> 
> If we get to @h, and we saw a potentially throwing call instruction (@g) between the last invoke and the current invoke, then we should put a -1 entry in the table to prevent running actions from either @f or @h when @g throws.
> 
> It doesn't matter if we're in a catch handler or the parent function. In practice, all calls in a deeply nested catch handler will have actions on the stack, so we'll never see a plain call that potentially throws.
I see what you're saying.  I just put together a test case that hits this code in a catch handler, and I agree that using -1 here would produce the correct results.

My only objection, and this is more principle than practical, is that the MS runtime seems to assume that catch handlers will have a range of states from TBME.TryHigh+1 to TBME.CatchHigh.  I'm not sure there's anything that absolutely depends on this, but that seems to be the general expectation.  My concern is that even if we can get away with not doing things that way now we may fail with the next version of the runtime.

================
Comment at: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp:433
@@ -432,3 +453,2 @@
     createUnwindMapEntry(currentEHNumber(), nullptr);
-    ++NextState;
   }
----------------
rnk wrote:
> andrew.w.kaylor wrote:
> > I removed this line of code because we weren't actually using a new state value here.  The state number returned by currentEHNumber was previously claimed when the action at the top of the handler stack was pushed and the NextState value is incremented there.  Incrementing it here results in an EH state value that doesn't appear anywhere in our tables.
> > 
> > Honestly, I'm not entirely clear as to why an unwind entry is being created here.
> > 
> > Near as I can tell we can only get here when a cleanup handler has been popped from the handler stack, as might happen at the first call site after a non-try scope ended in the middle of a try block.  But wouldn't we have already created an unwind entry for that cleanup handler when it was pushed on the handler stack?
> First, NextState should really be synced with UnwindMapEntries. States are really indices into the unwind map table, which gives you a cleanup action and a state transition that you follow. We should probably have createUnwindMapEntry return the next state instead of maintaining the number separately.
> 
> I think the intention of this code was to recover from merging multiple catch action pushes into one, which optimistically assumes that two catch actions pushed at once must be part of a try / catch / catch sequence. In the code sample below, we see the first invoke, and only assign one new state number. We don't know until later that one catch is popped by itself, and that we'll need a new state number for the outer catch:
>   try {
>     try {
>       may_throw(); // state 0
>     } catch (int) { }
>     may_throw(); // state 1
>   } catch (float) { }
> 
> Until we start treating try / catch / catch specially, this code can probably be deleted.
I'm not sure I understand your example.

If we introduced code similar to what I had originally proposed where both catches in the following case would be assigned the same state
```
try {
  may_throw();
} catch (int) {
} catch (float) { }
```
that code would also assign the same state to both catches in this case if nothing in the first catch handler used a landing pad:
```
try {
  try {
    may_throw();
  } catch (int) { }
} catch (float) { }
```
But that would be OK because nothing needs the other state.

However, in your example:
```
try {
  try {
    may_throw(); // state 0
  } catch (int) { }
  may_throw(); // state 1
} catch (float) { }
```
The code that pushes those handlers on the handler stack would give them different states.

Am I missing something?

In any event, I did delete this code in the new revision I uploaded last night.

================
Comment at: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp:481
@@ -450,1 +480,3 @@
+      FuncInfo.HandlerBaseState[F] = NextState;
+      ++NextState;
       calculateStateNumbers(*F);
----------------
rnk wrote:
> andrew.w.kaylor wrote:
> > Here I'm claiming a new EH state but not creating an unwind entry.  That may be the cause of the problem I saw with the MaxState value in the xdata table being larger than the size of the unwind map.
> > 
> > Do I need to create an unwind entry here?
> Yes. States are indices into the UnwindMap.
> 
> I don't think your base state assignment approach actually works in the face of basic block reordering. It's possible to push and pop a handler twice after CFG reordering. Suppose we have code like this:
> ```
> void foo() {
>   try {
>     f();
>     if (unlikely_cond)
>       g();
>   } catch (int) {
>     int_handler();
>   }
>   h();
> }
> ```
> For whatever reason, IR passes have reordered the CFG like this:
> ```
>   invoke @f() ; catch int
>   br i1 unlikely_cond %later, %cont
> cont:
>   call @h()
>   ret void
> later:
>   invoke @g() ; catch int
>   br label %cont
> ```
> WinEHPrepare will outline exactly one catch handler, but WinEHNumbering will push and pop that one catch handler twice, and we'll assign the base state twice, when we only need to do it once.
> 
> Also, I don't understand why catch handlers need their own new state, but maybe that's just because I don't understand what the runtime is really doing. It would seem like a catch handler is actually in the EH state of the code before the try, which is just -1 for code at the top level of a function. Maybe the runtime just needs a new state here. Anyway, inventing a new state is certainly a valid way to ensure that CatchHigh is greater than TryHigh. :)
Yes, my update last night add a call to create an unwind entry here.

As for the catch handler getting a base state, partly this change is part of my efforts to get our numbering to behave more like I perceive the MS runtime to be expecting.  I believe this also fixes some issues.  In particular, if a catch handler contains no landing pads our current implementation will record an incorrect value for TBME.CatchHigh (which is based on NextState at the end of the handler processing).  I've seen actual test cases where this caused the runtime to claim not to be able to find a handler for a state that had a handler.

That said, there are serious problems with our current numbering.  As I mentioned before, this patch was just meant to get some things I had in progress committed while I continued to work on other problems.

The block re-ordering problem you describe is interesting.  I have been working on a couple of other cases that break our current model even without unusual block ordering.  This first one looks more or less like this:
```
void test() {
  try {
    f();
    try {
      g();
    } catch (...) {}
    h();
  } catch (int) {}
}
```
In this case, the "catch (int)" handler disappears from the action list during the call to g() but then reappears at h().

The second case looks something like this:
```
void test() {
  try {
    try {
      try {
        a();
      } catch (int x) {}
      b();
    }
    catch (char c) {}
    e();
  } catch(int x) {
  } catch(...) {}
}
```
There seems to be some variation in the code that clang generates for this and I haven't been able to isolate the conditions that determine it, but I have a test case like this where at a() the action list is:
```
catch (int)  // The first "catch (int)" handler
catch (char)
catch (...)
```
Then at b() the action list is 
```
catch (char)
catch (int) // The second "catch (int)" handler
catch (...)
```
This leads our current code to pop the "catch (char)" handler from the stack (and perform all the associated actions) and immediately re-push it because its position in the list changes.

I have an idea for how I can fix these problems, but it isn't terribly elegant.  If it works, I may want to put it in place anyway until we can design something better.

http://reviews.llvm.org/D9512

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/