[LLVMdev] RFC: New Exception Handling Proposal
Bill Wendling
wendling at apple.com
Wed Nov 18 14:22:40 PST 2009
I've been looking into a new way to implement exception handling in
LLVM. The current model has many disadvantages, in my opinion. I try
to address them with this proposal. I also try to make exception
handling much more understandable to the normal human reader. :-) Any
new proposal will need to address all present and future languages'
exception handling methodologies. I believe that I created something
which is generic enough.
Please read and let me know your opinions!
N.B. I'm not wedded to the name I chose here, nor to the implementation
of some of the intrinsics - some may be better placed as an attribute
of the function.
Also, this does *not* address the general issue of how we handle all
exceptional situations, e.g. floating point exceptions and the like.
NEW EXCEPTION HANDLING
======================
Current Exception Handling
--------------------------
Zero-cost exception handling is done by generating metadata for the unwinding
library. That library, along with a "personality" function, determines where to
land after an exception is thrown. Given enough information, the metadata can
and should be generated only by the code generation section of the compiler.

The current exception handling mechanism encodes the exception handling metadata
by using intrinsics and the CFG itself. Some of the code it generates isn't
executable code, but meant purely to specify information for generating the
exception handling table. For instance, if you have this simple code snippet:
  #include <cstdio>

  void bar();

  void foo() {
    try {
      bar();
    } catch (int i) {
      printf("i == %d\n", i);
    } catch (const char *s) {
      printf("s == %s\n", s);
    } catch (...) {
      printf("catch-all\n");
    }
  }
The llvm IR for foo looks similar to this (simplified for readability):
  define void @_Z3foov() {
  entry:
    invoke void @_Z3barv()
        to label %return unwind label %lpad

  return:
    ret void

  lpad:
    %eh_ptr = tail call i8* @llvm.eh.exception()
    %eh_select27 = tail call i32 (i8*, i8*, ...)*
        @llvm.eh.selector(i8* %eh_ptr,
                          i8* @__gxx_personality_v0,
                          i8* @_ZTIi,
                          i8* @_ZTIPKc,
                          i8* null)
    %eh_typeid = tail call i32 @llvm.eh.typeid.for( @_ZTIi )
    %6 = icmp eq i32 %eh_select27, %eh_typeid
    br i1 %6, label %bb1, label %ppad

  ppad:
    %eh_typeid55 = tail call i32 @llvm.eh.typeid.for( @_ZTIPKc )
    %7 = icmp eq i32 %eh_select27, %eh_typeid55
    %8 = tail call i8* @__cxa_begin_catch(i8* %eh_ptr) nounwind
    br i1 %7, label %bb2, label %bb3

  bb1:
    ;; printf("i == %d\n", i)
    ret void

  bb2:
    ;; printf("s == %s\n", s)
    ret void

  bb3:
    ;; printf("catch-all\n")
    ret void
  }
Note that the `llvm.eh.selector' call indicates:

  1. that the basic block it's in is a landing pad,
  2. the personality function,
  3. the types that can be caught,
  4. the types that can be thrown by the calling function (though not with this
     example), and
  5. the presence of a cleanup or catch-all block.

The "post pad" (ppad) checks the type of the thrown exception to determine if
it's caught.
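
As a rough model of the comparison chain that lpad and ppad perform, here is a
small C++ sketch. The names `typeIdFor', `selectorFor', and `dispatch' are made
up for illustration and are not LLVM or unwinder APIs. The numbering of type ids
(1-based list position, with the catch-all one past the end) is an assumption of
this model only; in reality the personality function decides the selector value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm.eh.typeid.for: in this model a catch type's
// id is its 1-based position in the selector's typeinfo list (0 if absent).
int typeIdFor(const std::vector<std::string> &catchTypes,
              const std::string &type) {
  for (std::size_t i = 0; i < catchTypes.size(); ++i)
    if (catchTypes[i] == type)
      return static_cast<int>(i) + 1;
  return 0;
}

// Hypothetical stand-in for the value of llvm.eh.selector: the id of the
// matching catch type, or (an assumption of this model) one past the last id
// when only the catch-all applies.
int selectorFor(const std::vector<std::string> &catchTypes,
                const std::string &thrown) {
  const int id = typeIdFor(catchTypes, thrown);
  return id != 0 ? id : static_cast<int>(catchTypes.size()) + 1;
}

// The comparison chain performed by lpad and ppad for foo().
std::string dispatch(const std::string &thrown) {
  const std::vector<std::string> catchTypes = {"_ZTIi", "_ZTIPKc"};
  const int sel = selectorFor(catchTypes, thrown);
  if (sel == typeIdFor(catchTypes, "_ZTIi"))
    return "bb1"; // catch (int i)
  if (sel == typeIdFor(catchTypes, "_ZTIPKc"))
    return "bb2"; // catch (const char *s)
  return "bb3";   // catch (...)
}

// Usage: an int lands in bb1, an unlisted type falls through to bb3.
void checkDispatch() {
  assert(dispatch("_ZTIi") == "bb1");
  assert(dispatch("_ZTId") == "bb3");
}
```

The point of the sketch is only that the catch decision is an ordinary chain of
integer compares sitting in blocks after the landing pad, far from the invoke.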
Notice that all of this information is separated from the place where it's most
useful - the invoke instruction. None of the transformation passes know about or
can reason about these intrinsics (i.e., they can't be optimized). In
particular, there's no concept of a "landing pad" for the rest of the compiler,
which may lead to transformations generating code that violates an assumed
invariant, e.g., a landing pad which has a branch to it instead of being the
target of the "unwind" branch of an invoke instruction. This is an assumed
invariant because a landing pad isn't code that's generated by the user, but by
the compiler to convey metadata information, and thus cannot be branched to
through normal code paths.
Moreover, the "filter types" - those types that the caller function can throw -
are only encoded in the `llvm.eh.selector' intrinsic. They aren't part of the
callee function, which makes their presence in this intrinsic confusing,
counter-intuitive, and hard to get at.
Worse yet, the personality function is only encoded on these invoke statements.
However, it applies to the function as a whole. The optimizers shouldn't inline
functions with a different personality into a function. This is especially a
problem for LTO.
Exception Handling Proposal
---------------------------
My goals with this new exception handling proposal are:

  1. create a robust exception handling model that's intuitive and represented
     in the llvm IR,
  2. be able to follow the exception handling ABI (outlined in the "Itanium C++
     ABI: Exception Handling" document) without being tied to any specific
     exception format (EH tables, DWARF, etc.),
  3. defer the generation of metadata until as late as possible during code
     generation, and
  4. use the `unwind' instruction to throw or rethrow instead of a call to
     `_Unwind_Resume_or_Rethrow'.

To achieve these goals, we'll need these new llvm instructions and intrinsics.
New Intrinsics
--------------
llvm.eh.filter:
First the intrinsics. The first one is similar to Duncan's idea of a filter
intrinsic. In fact, it's named the same. ;-)

  void llvm.eh.filter(i8*, ...)

If present in the entry block, it enumerates all of the types that the function
may throw. If it is called as `llvm.eh.filter(i8* null)', then the function
cannot throw, but must still have an exception handling table generated for it.
It generates no executable code.
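
To make the intended filter semantics concrete, here is a minimal C++ sketch of
how a pass might summarize an `llvm.eh.filter' call. `FilterInfo', `mayThrow',
and `filterRequiresTable' are hypothetical names, not part of the proposal.
Treating a function without any filter as "may throw anything" is an assumption;
the text above only defines the cases where the intrinsic is present.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-function summary of an llvm.eh.filter call.
struct FilterInfo {
  bool present;                   // was llvm.eh.filter seen in the entry block?
  std::vector<std::string> types; // listed typeinfos; empty for (i8* null)
};

// With no filter, this model assumes any type may escape the function.
bool mayThrow(const FilterInfo &f, const std::string &type) {
  if (!f.present)
    return true;
  for (const std::string &t : f.types)
    if (t == type)
      return true;
  return false;
}

// A filter, even the empty (i8* null) one, means an EH table is still needed.
bool filterRequiresTable(const FilterInfo &f) { return f.present; }

// Usage: one function that may only throw int, one that cannot throw at all.
void checkFilter() {
  const FilterInfo onlyInt = {true, {"_ZTIi"}};
  const FilterInfo noThrow = {true, {}};
  assert(mayThrow(onlyInt, "_ZTIi") && !mayThrow(onlyInt, "_ZTIPKc"));
  assert(!mayThrow(noThrow, "_ZTIi") && filterRequiresTable(noThrow));
}
```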
An alternative is to add this information to the function's definition:

  define void @foo() filters[i8* @_ZTIi, i8* @_ZTIPKc] {
    ;; ...
  }

or similar. This could allow optimizations based on knowing that a function
cannot throw a particular type. However, it's not a particularly attractive
solution.
llvm.eh.personality:
The next intrinsic is for the "personality function". The reason to separate
this from the `convoke' instruction is that we want to prevent inlining of a
function with a different personality function.

  void llvm.eh.personality(i8*)

This also lives in the entry block for ease of finding. As with the filters,
it may be beneficial to add this to the function's definition.
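
Once the personality is a per-function property, the inlining restriction
becomes a simple comparison. The following C++ sketch shows the kind of check an
inliner could make. `FunctionInfo' and `canInline' are hypothetical names, and
treating a function without a personality as compatible with anything is an
assumption, not something the proposal specifies.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a function; only the relevant detail is modelled.
struct FunctionInfo {
  std::string name;
  std::string personality; // from llvm.eh.personality; empty if absent
};

// Refuse to inline when both functions name a personality and they differ.
bool canInline(const FunctionInfo &caller, const FunctionInfo &callee) {
  if (caller.personality.empty() || callee.personality.empty())
    return true; // assumption: no personality imposes no constraint
  return caller.personality == callee.personality;
}

// Usage: C++ into C++ is fine, C++ into Objective-C is rejected.
void checkInlining() {
  const FunctionInfo cxx = {"foo", "__gxx_personality_v0"};
  const FunctionInfo objc = {"baz", "__objc_personality_v0"};
  const FunctionInfo plain = {"leaf", ""};
  assert(canInline(cxx, cxx) && canInline(cxx, plain));
  assert(!canInline(cxx, objc));
}
```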
Convoke: A New Instruction
--------------------------
Syntax:
Now the instruction. I call it `convoke' (the name is subject to change). The
general form of the instruction is:

  convoke void @func()
      to label %normal
      with catches
          [i8* @CatchTy1, label %catch.1],
          [i8* @CatchTy2, label %catch.2],
          ...
          [i8* @CatchTyN, label %catch.n],
          [..., label %CatchAll]

If a catch-all block wasn't specified, then we generate:

  [..., unwind]

indicating that we should unwind out of the function if the type wasn't caught.
An alternative syntax is:

  [i8* null, unwind]

It is an error to have two or more catch clauses with the same type.
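
The two rules above (no duplicate catch types, and an implicit unwind clause
when no catch-all is written) could be checked along these lines. This is a
sketch only: `CatchClause', `hasUniqueTypes', and `withDefaultUnwind' are
hypothetical names, and representing the `...' clause by an empty type string is
an assumption made to keep the example short.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one entry in a convoke's catch list.
struct CatchClause {
  std::string type;   // typeinfo symbol; empty stands for the "..." clause
  std::string target; // label name, or "unwind"
};

// Returns false if two clauses name the same type, which is an error.
bool hasUniqueTypes(const std::vector<CatchClause> &catches) {
  std::set<std::string> seen;
  for (const CatchClause &c : catches)
    if (!seen.insert(c.type).second)
      return false;
  return true;
}

// Appends the implicit [..., unwind] clause when no catch-all was written.
std::vector<CatchClause> withDefaultUnwind(std::vector<CatchClause> catches) {
  for (const CatchClause &c : catches)
    if (c.type.empty())
      return catches;
  catches.push_back({"", "unwind"});
  return catches;
}

// Usage: two distinct catches gain a trailing unwind clause.
void checkCatches() {
  const std::vector<CatchClause> cs = {{"_ZTIi", "catch.1"},
                                       {"_ZTIPKc", "catch.2"}};
  assert(hasUniqueTypes(cs));
  assert(withDefaultUnwind(cs).back().target == "unwind");
}
```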
Exception Object:
We specify the exception object before the jump to the catch blocks. For
example:

            .---------.
            | convoke |
            `---------'
                 |
                 v
     .-----------------------.
     |                       |
     v                       |
  %normal   .----------------+---------------.
            |                |      ...      |
            v                v               v
      select.1 = 1     select.2 = 2    select.n = n
            |                |               |
            `----------------+---------------'
                             |
                             v
        .----------------------------------------.
        | %sel = phi [%select.1, ..., %select.n] |
        | %eh_ptr = llvm.eh.exception()          |
        | switch on %sel                         |
        `----------------------------------------'
                             |
                             v
                  .--------------------.
                  |          |   ...   |
                  v          v         v
               %catch.1   %catch.2  %catch.n
Handling Cleanups:
Cleanup code needs to be executed before any of the catches. We can accommodate
this easily - thanks to Eric! Basically, the idea is to jump to blocks that
identify which catch they mean to target. Each such block sets a value to be
used in a switch statement generated after the cleanup. We then execute the
cleanup code and switch on that value to reach the actual catch block.
            .---------.
            | convoke |
            `---------'
                 |
                 v
     .-----------------------.
     |                       |
     v                       |
  %normal   .----------------+---------------.
            |                |      ...      |
            v                v               v
      select.1 = 1     select.2 = 2    select.n = n
            |                |               |
            `----------------+---------------'
                             |
                             v
        .----------------------------------------.
        | %sel = phi [%select.1, ..., %select.n] |
        | %eh_ptr = llvm.eh.exception()          |
        | [cleanup code]                         |
        | switch on %sel                         |
        `----------------------------------------'
                             |
                             v
                  .--------------------.
                  |          |   ...   |
                  v          v         v
               %catch.1   %catch.2  %catch.n
(I love ASCII art!)
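
The same control flow can be written out as a small C++ model that records the
order in which the blocks run. `runHandler' is a hypothetical name, and the
three-way switch stands in for however many catch blocks a real function has.
It is only meant to show that the cleanup sits between selecting the catch and
branching to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the order in which the modelled blocks execute for one exception.
std::vector<std::string> runHandler(int matchedCatch) {
  std::vector<std::string> trace;

  // Per-catch block: only records which catch is targeted ("select.k = k").
  const int sel = matchedCatch;
  trace.push_back("select." + std::to_string(sel));

  // Merged block: the cleanup code runs first ...
  trace.push_back("cleanup");

  // ... and only then do we switch on %sel to reach the real catch block.
  switch (sel) {
  case 1:
    trace.push_back("catch.1");
    break;
  case 2:
    trace.push_back("catch.2");
    break;
  default:
    trace.push_back("catch.n");
    break;
  }
  return trace;
}

// Usage: the cleanup always precedes the catch block.
void checkCleanupOrder() {
  const std::vector<std::string> t = runHandler(2);
  assert(t.size() == 3);
  assert(t[1] == "cleanup" && t[2] == "catch.2");
}
```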
Throw: A New Instruction
------------------------
The `unwind' instruction has no semantic meaning outside of an exception
context. I propose removing it as an instruction, and replacing it with a new,
more descriptively named instruction called `throw'. The `throw' instruction
would take the exception object as its only parameter. Its semantics would be to
throw that exception object, essentially rethrowing that exception.

Syntax:

  throw i8* %eh_ptr
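
At the source level, the closest C++ analogue is a bare `throw;' inside a
handler, which rethrows the exception object currently in flight. The sketch
below is an analogy only and makes no claim about how a front end would lower
it to the proposed instruction. `cleanupAndRethrow' is a hypothetical name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Runs a cleanup when an exception passes through, then rethrows the same
// exception object so that an outer handler can catch it.
std::string cleanupAndRethrow(bool &cleanedUp) {
  try {
    try {
      throw std::runtime_error("boom");
    } catch (...) {
      cleanedUp = true; // stands in for the cleanup code
      throw;            // rethrow the in-flight exception object
    }
  } catch (const std::runtime_error &e) {
    return e.what(); // the outer handler sees the original exception
  }
  return "";
}

// Usage: the cleanup ran and the original exception reached the outer catch.
void checkRethrow() {
  bool cleaned = false;
  assert(cleanupAndRethrow(cleaned) == "boom");
  assert(cleaned);
}
```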
Good Things About This Method
-----------------------------
Exception handling metadata is no longer encoded in the CFG. It is much more
sanely specified, and thus easier to understand by normal humans. The optimizers
are free to modify the code as they see fit. In fact, they may be able to do a
better job at it. For instance, they could perform optimizations mixing the
cleanup and catch code. If the "filters" were part of the function instead of an
intrinsic, there is the potential for optimizations based upon knowledge that a
function cannot throw a particular type.

Inlining inside of a cleanup or catch block will no longer result in branching
to a landing pad from a non-invoke instruction, because there are no landing
pads to mess up. The "filter" and "personality" intrinsics maintain information
important to proper EH semantics even if the catch clauses are removed.
There is no longer a need for an explicit call to `_Unwind_Resume_or_Rethrow';
the `unwind' instruction (or its proposed replacement, `throw') is used instead.

And, because the exceptions are explicit, there is no need for an artificial
catch-all to be inserted into the generated code and EH table.
Bad Things About This Method
----------------------------
It's not a small change. It requires new instructions, which requires teaching
every part of the compiler about them. The good news is that they will behave
very similarly to the current `invoke' and `unwind' instructions, so we can
build upon that work by touching similar places in the code. Also, existing
bitcode files won't benefit from the new instructions. The code will have to be
recompiled. This shouldn't be a major problem for most people, because bitcode
files apart from LTO files are meant mostly for compiler developers.
-bw