[llvm-dev] [RFC] [Windows SEH] Local_Unwind (Jumping out of a _finally) and -EHa (Hardware Exception Handling)

Ten Tzen via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 31 21:12:30 PDT 2020


Hi, all,

The intent of this thread is to complete the support for Windows SEH.
Currently there are two major missing features: jumping out of a _finally and hardware exception handling.

The document below is my proposed design and implementation to fully support SEH on LLVM.
I have completely implemented this design on a branch in repo:  https://github.com/tentzen/llvm-project.
It now passes MSVC's in-house SEH suite.

Sorry for this long write-up.  For better readability, please read it on https://github.com/tentzen/llvm-project/wiki

Special thanks to Joseph Tremoulet for his earlier comments and suggestions.

Note: I just subscribed to llvm-dev and am probably not on the list yet, so please include my email address (tentzen at microsoft.com) explicitly in the To-list when replying.
Thanks,

--Ten
Windows SEH Support in LLVM
INTRODUCTION

An exception is an event that occurs during the execution of a program. It requires the execution of code outside the normal flow of control. There are two kinds of exceptions: hardware exceptions and software exceptions. Hardware exceptions are initiated by the CPU, such as division by zero or an attempt to access an invalid memory address. Software exceptions are initiated explicitly by applications or the operating system. Windows SEH (Structured Exception Handling) is a mechanism for handling both hardware and software exceptions. Windows C++ Exception Handling is almost fully supported in LLVM. Detailed design and new FuncletPad IR can be seen in https://llvm.org/docs/ExceptionHandling.html#exception-handling-using-the-windows-runtime.
However, for SEH, LLVM today is missing two major features. This project intends to extend the current model to provide them:

  1.  Local Unwind (AKA: Jumping out of _finally)
  2.  Hardware Exception Handling (AKA: MSVC++ option -EHa)

LOCAL UNWIND

In Windows SEH, when a goto statement (or any other statement that changes control flow, such as break/continue/leave/return) in a _finally targets a label outside of the _finally clause, a "local unwind" must be triggered to properly invoke the _finally clauses along the path from the goto statement to the target label. Since a _finally clause can be executed on both the "normal execution path" and the "exception path", the _local_unwind can take place on both paths too.
Let's demonstrate all possible paths in the following example.

try {
  try {
    try {
      /* set counter = 1 */
      Counter += 1;
      if (ex)
        RtlRaiseException(&ExceptionRecord);
    }  finally {
      Counter += 1;
      if (abnormal_termination()) {
        printf(" inner finally: exception path \n\r");
      }
      else {
        printf(" inner finally: normal path \n\r");
      }
      if (lu) {
        printf(" inner finally: local unwind \n\r");
        goto t10;
      }
      printf(" inner finally: normal return \n\r");
    }
  } finally {
    Counter += 1;
    printf(" outer finally: \n\r");
  }
}
except(Counter) {
  /* set counter = 3 */
  printf(" except handler: \n\r");
  Counter += 1;
}
printf(" after outer try_except: \n\r");
t10:;

  *   Normal execution (ex is false), normal return (lu is false): both _finallys are executed normally, but _except_handler should not be executed. Output is:

inner finally: normal path:
inner finally: normal return:
outer finally:
after outer try_except:

  *   Normal execution (ex is false), local-unwind (lu is true): both _finallys are executed. Because of the local unwind, control jumps to $t10 and "after outer try_except" is not printed.

inner finally: normal path:
inner finally: local unwind:
outer finally:

  *   Exception execution (ex is true), normal return (lu is false): the Windows runtime finds the handler. It invokes the inner _finally and the outer _finally, then the except handler, and jumps to the continuation address at the end of the outer try.

inner finally: exception path:
inner finally: normal return:
outer finally:
except handler:
after outer try_except:

  *   Exception execution (ex is true), local-unwind (lu is true): the Windows runtime finds the handler. It invokes the inner _finally, where _local_unwind is kicked off. The local unwind runs the outer _finally and then jumps to the target label, $t10. Again, "after outer try_except" is not printed.

inner finally: exception path:
inner finally: local unwind:
outer finally:

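The four cases above can be reproduced with a small portable C model. This is only a sketch of the expected ordering: it uses plain control flow instead of real SEH, it ignores the first-pass filter evaluation, and the names emit(), inner_finally() and trace_seh_example() are hypothetical helpers invented for this illustration. The trace strings are the printf texts from the example with surrounding spaces trimmed.

```c
#include <stddef.h>
#include <string.h>

/* Append one trace line to out (capacity cap); drop it if there is no room. */
static void emit(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    if (used + 1 < cap)
        strncat(out, line, cap - used - 1);
}

/* Body of the inner _finally. Returns 1 when it requests a local unwind. */
static int inner_finally(int abnormal, int lu, char *out, size_t cap) {
    emit(out, cap, abnormal ? "inner finally: exception path\n"
                            : "inner finally: normal path\n");
    if (lu) {
        emit(out, cap, "inner finally: local unwind\n");
        return 1;                         /* models "goto t10" */
    }
    emit(out, cap, "inner finally: normal return\n");
    return 0;
}

/* Hypothetical model: writes the expected trace for one (ex, lu) combination. */
void trace_seh_example(int ex, int lu, char *out, size_t cap) {
    if (cap == 0)
        return;
    out[0] = '\0';
    int to_t10 = inner_finally(ex, lu, out, cap);
    emit(out, cap, "outer finally:\n");   /* the outer _finally runs on every path */
    if (to_t10)
        return;                           /* the local unwind lands on t10 */
    if (ex)
        emit(out, cap, "except handler:\n");
    emit(out, cap, "after outer try_except:\n");
}
```

Calling trace_seh_example(1, 1, buf, sizeof buf) fills buf with the three lines of the last case: the exception is abandoned once the local unwind starts, so neither the except handler nor the code after the outer try runs.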
To perform local unwind, Windows provides a _local_unwind() runtime function that requires two input parameters: the target label address and the stack frame. Note that the second parameter is the establisher's stack pointer, not a frame pointer/base pointer. With that, all we need is to turn a goto statement into a _local_unwind() invoke. Since the target label is beyond the function (_funclet) boundary, the target label must also be declared as a static global label (an MCSymbol in LLVM) that needs to be fixed up by the linker.
IR modeling for Optimizer:
While transforming a goto statement into a runtime function call/invoke is straightforward, a more complicated issue is how to model _local_unwind in IR so that the optimizer can see its control flow. In case #2 of the above example, the control flow that starts in the normal-execution inner _finally, passes through the outer _finally, and lands at $t10 cannot be represented in LLVM IR today. Similarly, in case #4, the control flow that starts at RtlRaiseException(), passes through both _finally funclets, and lands at $t10 is not visible.
To precisely represent _local_unwind flow, our proposed solution is:
  *   Add one more catchpad/catchret pair that forwards control to the local_unwind target. I.e., this extra catchpad is the re-entry point for the _local_unwind() runtime.
  *   The address of this catchpad is passed to the _local_unwind() runtime, instead of the original goto target address.
  *   The local_unwind catchpad is handled the same way as an _except handler: it does not become a funclet; instead it is demoted to a normal label in the parent function.
  *   During LLVM BE code-gen and code layout, the catchpad (local_unwind dispatching) block must be assigned the same EH state as the original goto target so that the local unwind lands in the right EH scope.
For example, the IR of above example today is briefly listed below.
________________________________
define dso_local i32 @main() #0 personality i8* bitcast (i32 (...)* @__C_specific_handler
..
  %28 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to
          to label %29 unwind label %35
; <label>:29: ; preds = %27
  br label %30
; <label>:30: ; preds = %29, %15
  %31 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 0, i8* %31) #7
          to label %32 unwind label %39
; <label>:32: ; preds = %30
  %33 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 0, i8* %33) #7
          to label %34 unwind label %43
; <label>:34: ; preds = %32
  br label %53
; <label>:35: ; preds = %27
  %36 = cleanuppad within none []
  %37 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 1, i8* %37) #7 [ "funclet"(token %36) ]
          to label %38 unwind label %39
; <label>:38: ; preds = %35
  cleanupret from %36 unwind label %39
; <label>:39: ; preds = %38, %35, %30
  %40 = cleanuppad within none []
  %41 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 1, i8* %41) #7 [ "funclet"(token %40) ]
          to label %42 unwind label %43
; <label>:42: ; preds = %39
  cleanupret from %40 unwind label %43
; <label>:43: ; preds = %42, %39, %32
  %44 = catchswitch within none [label %45] unwind to caller
; <label>:45: ; preds = %43
  %46 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
  catchret from %46 to label %47
; <label>:47: ; preds = %45
  // except handler block
  ..
  br label %53
; <label>:53: ; preds = %47, %34
  // after outer _try block
  ..
  br label %56
; <label>:t10:
  .. ..
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
  %2 = blockaddress($main, $t10)
  call void @"?local_unwind@@"(i8* %1, i8* %2)
________________________________
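The new IR is illustrated below. The changes are the extra "label %60" in the catchswitch, the new catchpad block %60, and the blockaddress operand passed to local_unwind: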
________________________________
define dso_local i32 @main() #0 personality i8* bitcast (i32 (...)* @__C_specific_handler
..
  %28 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to
          to label %29 unwind label %35
; <label>:29: ; preds = %27
  br label %30
; <label>:30: ; preds = %29, %15
  %31 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 0, i8* %31) #7
          to label %32 unwind label %39
; <label>:32: ; preds = %30
  %33 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 0, i8* %33) #7
          to label %34 unwind label %43
; <label>:34: ; preds = %32
  br label %53
; <label>:35: ; preds = %27
  %36 = cleanuppad within none []
  %37 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 1, i8* %37) #7 [ "funclet"(token %36) ]
          to label %38 unwind label %39
; <label>:38: ; preds = %35
  cleanupret from %36 unwind label %39
; <label>:39: ; preds = %38, %35, %30
  %40 = cleanuppad within none []
  %41 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 1, i8* %41) #7 [ "funclet"(token %40) ]
          to label %42 unwind label %43
; <label>:42: ; preds = %39
  cleanupret from %40 unwind label %43
; <label>:43: ; preds = %42, %39, %32
  %44 = catchswitch within none [label %45, label %60] unwind to caller
; <label>:45: ; preds = %43
  %46 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
  catchret from %46 to label %47
; <label>:60: ; preds = %43
  %61 = catchpad within %44 [i8* bitcast (i32 (i8*, i8*)* @"?IsLocalUnwind@0@main@@" to i8*)]
  catchret from %61 to label %t10
; <label>:47: ; preds = %45
  // except handler block
  ..
  br label %53
; <label>:53: ; preds = %47, %34
  // after outer _try block
  ..
  br label %56
; <label>:t10:
  .. ..
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
  %2 = blockaddress($main, %60)
  call void @"?local_unwind@@"(i8* %1, i8* %2)
________________________________
Note that @"?IsLocalUnwind@0@main@@" is a funclet, similar to @"?filt$0@0@main@@" of the _except handler. The difference is that "?IsLocalUnwind@0@main@@" is a dummy that is never called or checked by any runtime. It is there to make the IR more readable and consistent with the existing model. However, unlike "?filt$0@0@main@@", which is referenced by the EH table (for the first pass, virtual unwind), "?IsLocalUnwind@0@main@@" is discarded by the BE. In the end, no funclet is generated for it in the output object file.
Dispatch on Try-Finally
When the outermost _try is a _finally, not an _except construct, a pseudo _try/_except is added to dispatch _local_unwind. This try-except has one constant filter, EXCEPTION_CONTINUE_SEARCH, so from a functional perspective it is virtually a NOP _try. Its only purpose is to model the _local_unwind() exception path.
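To see why a constant EXCEPTION_CONTINUE_SEARCH filter makes the pseudo _try a functional NOP, here is a simplified C model of the first-pass handler search. It is a sketch only: seh_filter_fn, the two sample filters and first_pass_select() are hypothetical names for this illustration, the real search is done by the Windows runtime through the scope table, and EXCEPTION_CONTINUE_EXECUTION is not modeled. The three filter constants use the standard Windows values.

```c
#include <stddef.h>

/* Standard SEH filter results (values as defined by the Windows headers). */
#ifndef EXCEPTION_EXECUTE_HANDLER
#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)
#endif

typedef int (*seh_filter_fn)(void);

/* Constant filter of the pseudo _try/_except: it never accepts an exception. */
int filter_continue_search(void) { return EXCEPTION_CONTINUE_SEARCH; }

/* A filter such as except(1): it always accepts the exception. */
int filter_execute_handler(void) { return EXCEPTION_EXECUTE_HANDLER; }

/* Filters are listed from the innermost scope outward. Returns the index of
   the first scope whose filter accepts the exception, or -1 if none does
   (the search would then continue in the caller). */
int first_pass_select(const seh_filter_fn *filters, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (filters[i]() == EXCEPTION_EXECUTE_HANDLER)
            return (int)i;
    }
    return -1;
}
```

In this model a scope guarded only by filter_continue_search is always skipped, so adding it cannot change which handler a real exception reaches.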
Multiple Local-Unwinds
If there are two or more local_unwind targets, one catchpad/catchret pair is injected for each target. The catchpad/catchret must be added at the same _try scope as its corresponding target label. For example,
try {
  try {
    try { /* inner try */
      if (ex)
        RtlRaiseException(&ExceptionRecord);
    } finally {
      if (lu)
        goto t10;
      else if (lu2)
        goto t20;
      else if (lu3)
        goto t30;
      printf(" inner finally: normal return \n\r");
    }
  } except(Counter) {
    /* inner handler */
  }
  // after inner handler
  t10:
  ...
  t20:
  ..
} except(1) {
  /* outer handler */
}
// after outer try
t30:
// after t30
The corresponding IR is listed below. It must be the second _try that dispatches the local unwind to t10 and t20.
________________________________
  %12 = invoke i32 bitcast (i32 (...)* @RtlRaiseException to i32
          to label %13 unwind label %16
; <label>:13: ; preds = %0
  %14 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 0, i8* %14) #7
          to label %15 unwind label %20
; <label>:15: ; preds = %13
  br label %31
; <label>:16: ; preds = %0
  %17 = cleanuppad within none []
  %18 = call i8* @llvm.localaddress()
  invoke void @"?fin@0@main@@"(i8 1, i8* %18) #7 [ "funclet"(token %17) ]
          to label %19 unwind label %20
; <label>:19: ; preds = %16
  cleanupret from %17 unwind label %20
; <label>:20: ; preds = %19, %16, %13
  %21 = catchswitch within none [label %22, label %110, label %120] unwind label %34
; <label>:22: ; preds = %20
  %23 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?filt@0@main@@" to i8*)]
  catchret from %23 to label %24
; <label>:110: ; preds = %20
  %111 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt10@0@main@@" to i8*)]
  catchret from %111 to label %t10
; <label>:120: ; preds = %20
  %121 = catchpad within %21 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt20@0@main@@" to i8*)]
  catchret from %121 to label %t20
; <label>:24: ; preds = %22
  // inner handler
  %27 = invoke i32 (i8*, ...) @printf(i8* getelementptr inbounds ([31 x i8], [31 x i8]*
          to label %28 unwind label %34
; <label>:28: ; preds = %24
  .. ..
  br label %31
; <label>:31: ; preds = %28, %15
  // after inner handler
  %33 = invoke i32 (i8*, ...) @printf(i8* ..
          to label %49 unwind label %34
; <label>:34: ; preds = %31, %24, %20
  %35 = catchswitch within none [label %36, label %130] unwind to caller
; <label>:36: ; preds = %34
  %37 = catchpad within %35 [i8* null]
  catchret from %37 to label %38
; <label>:130: ; preds = %34
  %131 = catchpad within %35 [i8* bitcast (i32 (i8*, i8*)* @"?IslocalUnwindt30@0@main@@" to]
  catchret from %131 to label %t30
; <label>:38: ; preds = %36
  // outer handler
  %41 = call i32 (i8*, ...) @printf(i8*..
  br label %42
; <label>:42: ; preds = %38, %t20
  // after outer try
  %44 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([22 x i8], [22 x i8]*
  br label %t30
; <label>:t30: ; preds = %42
  // after t30
  ret
; <label>:49: ; preds = %31
  br label %t10
; <label>:t10: ; preds = %49
  %51 = load i32, i32* %3, align 4
  %52 = add nsw i32 %51, 10
  store i32 %52, i32* %3, align 4
  br label %t20
; <label>:t20: ; preds = %t10
  %54 = load i32, i32* %3, align 4
  %55 = add nsw i32 %54, 20
  store i32 %55, i32* %3, align 4
  br label %42
}
define internal void @"?fin@0@main@@"(i8, i8* %1) #2 {
..
  %2 = blockaddress($main, %110)
  call void @"?local_unwind@@"(i8* %1, i8* %2)
..
  %2 = blockaddress($main, %120)
  call void @"?local_unwind@@"(i8* %1, i8* %2)
..
  %2 = blockaddress($main, %130)
  call void @"?local_unwind@@"(i8* %1, i8* %2)
..
________________________________
Implementation:
The first primary task of the design above is to determine the right place to add this pseudo 'catchswitch' construct in order to dispatch a local unwind to its target. One straightforward way is to add this new level immediately on top of the outermost _try that encloses the local-unwind statement and is located in the same EH scope as the unwind target.
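As a rough illustration of this placement rule, the following C sketch walks outward from the _try that contains the jump until it reaches the _try whose enclosing scope is the scope of the target label. The scope numbering, the parent_of array, FUNCTION_SCOPE and find_dispatch_try() are hypothetical names made up for this sketch; they are not the data structures used in the Clang change.

```c
/* Marks "no enclosing _try": the scope is the function body itself. */
#define FUNCTION_SCOPE (-1)

/* parent_of[i] is the _try that directly encloses _try i, or FUNCTION_SCOPE.
   jump_try is the innermost _try whose _finally contains the jump.
   target_scope is the _try (or FUNCTION_SCOPE) in which the label lives.
   Returns the _try that should carry the pseudo catchpad for this target,
   or FUNCTION_SCOPE if the target is not outside the jump's _try nest. */
int find_dispatch_try(const int *parent_of, int jump_try, int target_scope) {
    int scope = jump_try;
    while (scope != FUNCTION_SCOPE) {
        if (parent_of[scope] == target_scope)
            return scope;          /* outermost _try still inside the target's scope */
        scope = parent_of[scope];  /* keep walking outward */
    }
    return FUNCTION_SCOPE;
}
```

With the three nested _trys of the multiple-target example numbered 0 (outer), 1 (middle) and 2 (inner), parent_of is {FUNCTION_SCOPE, 0, 1}. The sketch then picks _try 1 for t10 and t20 (which live in scope 0) and _try 0 for t30 (which lives at function level), matching the catchswitch placement in the IR listing.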
Since semantic analysis and scope information are well constructed and performed in Clang's Parser/Semantic-analyzer, the implementation just needs to slightly extend existing code to identify local unwind statements and record LU targets in the outermost SEHTryStmt during the Parser/Semantic phase.
For Break/Continue/Leave/Return local unwind, please see Sema::ActOnBreakStmt() and Sema::ActOnContinueStmt() and Parser::ParseSEHTryBlock(). For Goto local unwind, it's more complicated as it could be a forward reference. Our code utilizes JumpDiagnostics.cpp where Goto out of _finally is detected and reported. Please see the change in JumpScopeChecker::CheckJump().
The second task is in FE CodeGen. Before entering the Try, an extra EHCatchScope level is pushed into EHStack. Based on LU information recorded on SEHTryStmt by earlier Parser & Semantic phases, a handler (Catchpad) is created to dispatch local-unwind for each target associated with this Try statement. This handler block will be used as the target-address for MSVC's _local_unwind() runtime. See CodeGenFunction::pushSEHLocalUnwind() and CodeGenFunction::popSEHLocalUnwind().
Finally, in LLVM calculateSEHStateNumbers() (see the change in WinEHPrepare.cpp), all _IsLocalUnwind**() filters in pseudo CatchSwitches are discarded and each LU dispatch handler is assigned its parent scope's EH state.
HARDWARE EXCEPTION HANDLING (-EHa)
The rules for C code:
For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules. First, no exception can move into or out of a _try region, i.e., no potentially faulting instruction can be moved across a _try boundary. Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to those in callees). Finally, global state (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in a register) before the subsequent exception occurs.
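The third rule has a loose analogue in portable C that may help build intuition. In the sketch below, longjmp() stands in for a hardware fault and the code after setjmp() returns non-zero stands in for the handler. Standard C only guarantees the value of a local that was modified between setjmp() and longjmp() if it is volatile, which is the same "must be in memory, not only in a register" requirement. This is an analogy, not SEH: simulated_fault() and progress_seen_by_handler() are hypothetical names for this illustration.

```c
#include <setjmp.h>

static jmp_buf recover_point;

/* Stand-in for a faulting instruction: control jumps to the "handler". */
static void simulated_fault(void) {
    longjmp(recover_point, 1);
}

/* Runs up to five steps and "faults" before step fault_after_steps.
   Returns the progress value that is visible after recovery. */
int progress_seen_by_handler(int fault_after_steps) {
    /* volatile: every update is stored to memory before the next step,
       so the value is well defined after the longjmp. */
    volatile int progress = 0;

    if (setjmp(recover_point) == 0) {
        for (int step = 0; step < 5; ++step) {
            if (step == fault_after_steps)
                simulated_fault();
            progress = progress + 1;
        }
    }
    /* Reached both on normal completion and after the simulated fault. */
    return progress;
}
```

Without the volatile qualifier the value read after the jump would be indeterminate in standard C, which mirrors why the proposal treats memory accesses directly under a _try as volatile.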
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on the C++ side. When a C++ function (in the same compilation unit with option -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try handler or a C++ catch(...). As such, when that happens in the middle of an object's lifetime, the dtor must be invoked during unwinding the same way as for a C++ synchronous exception.
Design and Implementation:
A natural way to achieve the rules above in LLVM today is to allow an EH edge on every memory/computation instruction (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph. However, tracking every single memory instruction and potentially faulting instruction can create many invokes, complicate the flow graph, and possibly hurt downstream optimization and code generation. Making all optimizations aware of the new semantics is also a substantial effort.
This design does not intend to model the exception path at instruction level. Instead, the proposed design tracks and reports EH state at BLOCK level to reduce the complexity of the flow graph and minimize the performance impact on C++ code under the -EHa option. The detailed implementation is described below.
-- Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and EHStack is pushed, so it must be an invoke, not a call. With that, it is also guaranteed that an EH cleanup pad is created regardless of whether there is a call in this scope. _scope_end() is added before dtor(). These two intrinsics make the computation of block state possible in the downstream code-gen pass, even in the presence of ctor/dtor inlining.
-- Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark the _try boundary and to prevent exceptions from being moved across the _try boundary.
-- All memory instructions inside a _try are considered 'volatile' to ensure the second and third rules for C code above. This is slightly suboptimal, but it is acceptable as the amount of code directly under a _try is very small.
-- For both C++ and C code, the state of each block is computed at the same place in the BE (WinEHPrepare pass) where all other EH tables/maps are calculated. In addition to _scope_begin and _scope_end, the computation of block state also relies on the existing state-tracking code (UnwindMap and InvokeStateMap).
-- For both C++ and C code, the state of each block with a potentially trapping instruction is marked and reported in the DAG Instruction Selection pass, the same place where the state for -EHsc (synchronous exceptions) is done.
-- If the first instruction in a reported block scope can trap, a nop is injected before this instruction. This nop is needed to accommodate the LLVM Windows EH implementation, in which the address in the IPToState table is offset by +1. (The purpose of that offset is to ensure the return address of a call is in the same scope as the call address.)
-- The handler for catch(...) under -EHa must handle HW exceptions, so its 'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches C++ exceptions).
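
The nop rule in the list above can be illustrated with a simplified C model of an IP-to-state lookup. This is a sketch under stated assumptions: IpStateEntry and lookup_state() are hypothetical, the table layout is not the one LLVM emits, and the addresses are made up. Each entry records the start address of a region plus one, as the +1 offset describes.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the model table: the state that applies from address ip onward. */
typedef struct {
    uint32_t ip;     /* region start address + 1 */
    int      state;  /* EH state of that region */
} IpStateEntry;

/* The table is sorted by ip. Returns the state of the last entry whose ip
   is <= the queried address, or -1 when the address precedes every entry. */
int lookup_state(const IpStateEntry *table, size_t count, uint32_t ip) {
    int state = -1;
    for (size_t i = 0; i < count; ++i) {
        if (table[i].ip > ip)
            break;
        state = table[i].state;
    }
    return state;
}
```

Take region A (state 0) starting at 0x0F00 and region B (state 1) starting at 0x1000, so the entries are {0x0F01, 0} and {0x1001, 1}. A call that ends region A has return address 0x1000 and correctly maps to state 0. A trapping instruction placed exactly at 0x1000, the first byte of region B, would also map to state 0, which is wrong. With a one-byte nop in front of it the trap sits at 0x1001 and maps to state 1.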
