[LLVMdev] RFC: Exception Handling Proposal II

Sun Nov 28 14:14:01 PST 2010

On Nov 28, 2010, at 6:23 AM, Renato Golin wrote:

> There still seems to be some confusion between clean-up areas and catch
> areas. What you both describe are catch areas, for which your arguments
> (AFAICS) are perfectly valid, but the distinction that matters here is
> between catch and clean-up areas.
> 
> You would never print the value of %x in a clean-up area. The sole
> purpose of clean-up areas is to cleanly destroy variables that would
> otherwise not be destroyed, because exception handling skips the normal
> end of their scope. During normal flow, the destruction code doesn't
> need to be in a clean-up area; it can simply sit at the end of the
> scope, and in practice it is normally emitted in both places.
> 
> The destruction code itself can print the value of %x (if it has
> access to it), and the validity of such a value in the destructor code
> is up to the language and the user code. For instance, nothing stops
> you from dereferencing a null pointer in C++ at compile time, but doing
> so is undefined behaviour and faults at run time on most platforms.
> 
> But under no circumstances can a clean-up area access a user
> variable to print it on the screen. It's like calling an intrinsic and
> expecting it to print the value of a random variable inside your code.
> It doesn't even make sense.
> 
> Catch areas, on the other hand, are user code. As with destructors, the
> user can print the value of %x if it has access to it, and if the
> variable was never initialised, it's the user's problem for relying on
> such a condition. Catch areas are NOT unwinding basic blocks; they are
> the first user code blocks where, in case of a match, execution
> returns to normal flow. They can also throw again, sending the flow
> back into unwinding, but per se, they're user code.
> 
> As was pointed out, some optimizations in LLVM can move user code to
> clean-up areas. The compiler may prove it valid and the execution
> might even work, but that's an artefact of how the compiler works and
> how other optimizations work around the same issue (such as inlining).
> 
> Moving code from try to catch areas (and vice-versa) is fine, as both
> are user blocks. But moving user code to clean-up areas can lead to
> undefined behaviour. For example, during the unwinding of several
> functions without a single match to the exception, local variables in
> all intermediate functions have to be cleaned up, and that's done via
> the clean-up areas. The personality routine is controlling this flow,
> so if you move user code that could have side effects to a clean-up
> area (say only on -O3), a perfectly valid unwinding can break
> completely and terminate.
> 
> That breakage is not inside the destructor, nor inside the catch
> areas, but inside a clean-up area, over which the user has neither
> access nor control. This is a compiler bug.
> 
My confusion may be about what Duncan was talking about. If we have a basic block that may throw and whose exception is caught by some landing pad, which variables may be used in that landing pad, both in the cleanup part and in the catch handler? Certainly the cleanup cannot access the user variables directly (but it can indirectly, if a pointer is passed into an object which is then dereferenced by the d'tor). But the catch handlers are dominated by the landing pad, which would need to be dominated by the throwing basic block in order to use any values calculated in that basic block.

One possibility (perhaps this is what he meant) is that if a value is used in the catch handler, then it cannot be defined in the throwing block.

-bw

[Quoting Duncan's original email here]

"I think everyone wants to get rid of invoke, but that is hard.  One problem
is that you want to keep the SSA property "definitions dominate uses".  Now
suppose you have a basic block

  bb: [when throws, branch to XYZ]
     ...
     %x = ... (define %x)
     ...

  XYZ:
     ...use %x...

If you got to XYZ because an instruction threw an exception before %x was
defined, then in XYZ you are using %x which was never defined.  In effect
the definition of %x in bb does not dominate the use in XYZ.  I think the
solution is to say that in XYZ you are not allowed to use any values defined
in bb: in the dominator tree, bb is not considered to dominate XYZ.

These kinds of issues touch fundamental design points of LLVM, so need to be
dealt with carefully.

Ciao,

Duncan."