[LLVMdev] Exception Handling Proposal - Second round

Tue May 17 15:16:49 PDT 2011

Hi all,

Following John's, Duncan's and Bill's proposals about exception
handling, I thought I'd summarise what has been discussed so far.

 ** The problems we're trying to solve are:

 P1. Different languages have different EH concepts and the IR needs to be
as agnostic as possible about that
 P2. Inlining and optimisations (currently) destroy the EH semantics
and produce code that can't unwind
 P3. Clutter in the IR representation of EH leads to unnecessary
complexity when optimising or inlining
 P4. The back-end should have a simple and unified representation on
which to build (different) EH tables

 ** The key facts I've collected after re-reading all emails are:

 F1. There are different families of EH (zero-cost, SjLj, etc.) and they
should have similar IR representations
 F2. Back-ends should know how to implement them or bail out (thus,
representation should be *clear*)
 F3. Optimisations should make sure unwinding and normal flow do not overlap
 F4. Avoid artificial impositions on basic-block behaviour and
dependency to simplify optimisations
 F5. We *must* keep the unwind actions and the order in which they
execute when inlining
 F6. Some instructions (such as divide in Java) can throw exceptions
without an explicit dispatch mechanism

There are two quasi-orthogonal proposals to change the EH mechanism:
 - Duncan Sands', regarding rules on how to protect the dispatch
mechanism (and preserve actions and their orders) when inlining or
optimising code, and
 - Bill Wendling's IR simplification using the "dispatch" mechanism to
better express unwinding flow and ease inlining and optimisations

 ** Proposal 1: Rules on how to protect the unwind flow (P2, F3, F4, F5)

Current LLVM inlining can create unreachable blocks that get
optimised away when they shouldn't be. Some languages demand that certain
clean-up areas must be executed, others that they must not. Some
libstdc++ code apparently relies on this implementation-defined
behaviour. To solve this problem, workarounds were coded to redirect
flow to catch-all regions, which created other problems, and so on.

Instead of running around in circles, the following rules must be
observed when inlining/optimising:
 - When inlining a dispatch area, the inlined block must resume to the
inlinee's dispatch block
 - If using eh.selector, inlining should append actions to inlinee's
selector block
 - Optimisers should not remove unwind actions nor change their
control flow (unless the semantics are preserved)
 - If we allow changes, we need to explicitly describe the semantics
for each case, or settle on a single semantics that covers them all
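To make the ordering requirement concrete, here is a minimal C++ sketch of the kind of source where it matters. All names (Guard, callee, caller, run_trace) are made up for illustration and do not come from any of the proposals. The callee has a clean-up but no handler, so if it is inlined, its clean-up still has to run before the caller's clean-up and handler:

```cpp
#include <string>
#include <vector>

// Hypothetical trace of unwind actions, in the order they execute.
static std::vector<std::string> trace;

// A local with a destructor: destroying it is a clean-up (unwind action).
struct Guard {
    const char *name;
    explicit Guard(const char *n) : name(n) {}
    ~Guard() { trace.push_back(std::string("cleanup ") + name); }
};

// Candidate for inlining: it owns a clean-up but no handler, so after the
// clean-up the exception must keep unwinding into the caller.
inline void callee() {
    Guard g("callee");
    throw 1;
}

// The caller has its own clean-up and a catch. Whether or not callee() is
// inlined here, the observable order of actions must stay the same.
void caller() {
    try {
        Guard g("caller");
        callee();
    } catch (int) {
        trace.push_back("catch int");
    }
}

// Runs the scenario once and returns the recorded order of actions.
std::vector<std::string> run_trace() {
    trace.clear();
    caller();
    return trace;
}
```

Calling run_trace() records the callee's clean-up first, then the caller's clean-up, then the catch. That is the sequence the rules above are meant to keep intact after inlining.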

 ** Proposal 2: Dispatch and basic-block markings (P3, P4, F5)

Replace eh.selector/eh.typeid with a dispatch mechanism that
explicitly lists the possible catch areas, the filters and the
personality. The dispatch belongs to a basic block, which needs a
"landingpad" attribute to help optimisations understand that the block
is special for EH (this might not be strictly necessary).

The general syntax of the dispatch is:

lpad: landingpad
 %eh_ptr = tail call i8* @llvm.eh.exception()
 dispatch region label %lpad resume to label %unwind
   catches [
     %struct.__fundamental_type_info_pseudo* @_ZTIi, label %ch.int.main
   ]
   personality [i32 (...)* @__gxx_personality_v0]

This dispatch instruction is the last instruction in its block. It
explicitly belongs to that block ("region label %lpad") and resumes
unwinding to label %unwind. It catches only INT exceptions (whatever
that means in the source language) and the personality routine that is
going to interpret it at run time is __gxx_personality_v0.
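For reference, this is roughly the C++ shape I'd expect to produce a dispatch with a single int entry in its catch list. It's a sketch, not the output of any front-end: the function names inner and outer are made up (the IR above uses main), and the mapping to the dispatch in the comments is my assumption:

```cpp
// Hypothetical source for a dispatch whose catch list holds only int.
int inner(int kind) {
    try {
        if (kind == 1) throw 42;    // int: matches the typeinfo @_ZTIi
        if (kind == 2) throw 2.5;   // double: not in the catch list
        return 0;                   // normal flow, no unwinding
    } catch (int e) {               // the single entry under "catches"
        return e;
    }
    // Anything else is not handled here and keeps unwinding, which is
    // the role of "resume to label %unwind" in the dispatch.
}

// An outer frame that receives whatever inner() did not catch.
int outer(int kind) {
    try {
        return inner(kind);
    } catch (...) {
        return -1;
    }
}
```

With these definitions outer(0) returns 0, outer(1) returns 42 (caught by the int handler) and outer(2) returns -1 (the double skipped the int handler and was caught one frame up).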

When optimising, passes should see the catch/clean-up blocks that are
dominated by the landing pad and keep their natural flow. When
inlining, they should be moved inside the inlinee and the "resume
label" should be the inlinee's dispatch landing pad, so the sequence
of actions (and the actions themselves) is kept intact.

The dispatch call can also be attached to the invoke instruction,
though there were some problems with clean-ups (Bill) and it may
clutter the IR by repeating the same dispatch for many invokes in one
single try block.

I see that %eh_ptr is not used by the dispatch. How does it know
the type of the exception thrown?

 ** What was not covered

P1/F1/F2: Are these changes EH-style agnostic? Do they at least work
for DWARF AND SjLj using the same IR representation? Do we want that
to happen?

F6: If a div instruction inside a basic block without EH unwind
information throws an exception, how does the IR represent that? Do
we create an invoke to a fake function for every instruction that
could throw? Do we put the unwind information in the basic-block? In
the dispatch instruction (like we do for region label)?
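To illustrate the "invoke to a fake function" option only, here is a C++ sketch in which the throwing divide is wrapped in a helper that the caller can put inside a try region. The names checked_div and safe_ratio are hypothetical, and this says nothing about how a Java front-end actually lowers division:

```cpp
#include <stdexcept>

// Hypothetical stand-in for the "fake function": the potentially throwing
// instruction becomes a call, so it can have an unwind edge of its own.
// Only a zero divisor is treated as throwing; overflow (INT_MIN / -1)
// is not handled in this sketch.
int checked_div(int num, int den) {
    if (den == 0) throw std::domain_error("division by zero");
    return num / den;
}

// Caller with EH information: the divide is now covered by a handler.
int safe_ratio(int num, int den) {
    try {
        return checked_div(num, den);
    } catch (const std::domain_error &) {
        return 0;   // fallback value chosen arbitrarily for the example
    }
}
```

The obvious cost is the one raised in the question: every instruction that could throw turns into a call with its own unwind edge, which is exactly the kind of IR clutter P3 is about.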

 ** Amount of work to do

I reckon that both changes can be done at the same time. Work is
currently under way in the ARM back-end to support EHABI, which should
also be orthogonal to those changes (Anton?).

The inlining changes can be done at any time, with no need to change
the IR, and they can be reused by the second proposal later on.

The problem is that, to change the IR representation, we need to
change all front-ends that deal with exception handling (clang,
llvm-gcc, ada, python etc), and make the back-end iteratively more
robust to accept the new format, but it'd be hard to quickly
deactivate the old format.

I've seen this thread show up and die a few times, and I'm not sure
we're under any pressure to do this at any given time. Are we?

cheers,
--renato