[LLVMdev] RFC: Exception Handling Proposal II
Duncan Sands
baldrick at free.fr
Thu Nov 25 03:03:43 PST 2010
Hi Renato,
> On 25 November 2010 07:51, Duncan Sands <baldrick at free.fr> wrote:
>> If you got to XYZ because an instruction threw an exception before %x was
>> defined, then in XYZ you are using %x which was never defined. In effect
>> the definition of %x in bb does not dominate the use in XYZ. I think the
>> solution is to say that in XYZ you are not allowed to use any values defined
>> in bb: in the dominator tree, bb is not considered to dominate XYZ.
>
> Hi Duncan,
>
> I don't see how you can have dominance between a normal block and a
> cleanup block. Clean-up landing pads should never use user code (since
> they don't exist in userland).
I don't understand what you are saying. Cleanups (e.g. destructors)
can execute arbitrary user code, access arbitrary local variables etc.
For example, you can pass the address of a local variable to an object
whose destructor reads the value of that variable. Note also that LLVM
is not just used by C++; it is also used by Ada, which makes heavy
(and subtle) use of exception handling, and doesn't always work the same as
C++. For example, throwing an exception in a destructor does not terminate
a program in Ada.
> Catch landing pads, on the other hand, have the same dominance
> relationship that the rest of user code has (and the same problems).
> Since you should never branch to XYZ under normal circumstances, you
> should never rely on its predecessor's values anyway. That's the whole
> point of having @llvm.eh.exception and @llvm.eh.selector, as it's the
> role of the personality routine to pass information between the user
> code and unwinding code.
I don't get what you are talking about here. You can access any variables
you like, whether local or global, in a catch handler. They are not passed
to the handler via llvm.eh.exception or llvm.eh.selector, they are simply
accessed directly (the unwinder restores registers etc., making this possible).
Anyway, I'm not talking about what users should or shouldn't do, I'm talking
about fundamental rules for LLVM IR like "definitions must dominate uses".
What does this rule mean exactly and why does it exist? It is actually
fundamental to SSA form and is what makes the whole thing work. For example,
beginners to LLVM often ask how to get the RHS in "%x = icmp eq i32 %a, %b". Of
course there is no right-hand side because in SSA form the value of %x cannot
change and (this is the important bit for this discussion) %x *is never used
before it is defined*. Thus there is no point in distinguishing between %x
and the RHS, %x *is* the RHS. I'm pointing out that if the invoke instruction
is removed and catch information is attached to entire basic blocks, then,
unless care is taken, it is perfectly possible to use %x before it is defined,
as explained in my previous email, blowing up the entire LLVM system. Clearly
the solution is to forbid this by not allowing values defined in a basic block
to be used in a handler for that block. This in turn means that basic blocks
cannot be considered to dominate their handlers, even if the only way to get
to the handler is via that basic block. That in turn means that all kinds of
transforms that muck around with basic blocks (e.g. SimplifyCFG) need to be
audited to make sure they don't break the new rule. And so on.
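To make the dominance point concrete, here is a small illustrative sketch in
Python. It is not LLVM code: the block names (entry, bb, bb.pre, bb.post, cont,
XYZ) and the helpers dominators and def_dominates_use are hypothetical. It
computes dominators with the standard iterative dataflow equation. If bb is
treated as a single node with an exceptional edge to XYZ, bb appears to
dominate XYZ, so using %x in XYZ looks legal. But since any instruction in bb
may throw, including those before %x is defined, splitting bb at the definition
of %x shows that the part of bb defining %x does not dominate XYZ.

```python
"""Illustrative sketch only: why a block need not dominate its handler.

Block names are hypothetical and model the scenario in this thread;
nothing here is an LLVM API.
"""

# Coarse view: bb is a single node with an exceptional edge to XYZ.
COARSE_CFG = {
    "entry": ["bb"],
    "bb": ["cont", "XYZ"],
    "cont": [],
    "XYZ": [],
}

# Finer view: any instruction in bb may throw, so split bb at the
# definition of %x. bb.pre holds the instructions before "%x = ...",
# bb.post holds "%x = ..." and everything after it. Both halves can
# unwind to the handler XYZ.
FINE_CFG = {
    "entry": ["bb.pre"],
    "bb.pre": ["bb.post", "XYZ"],
    "bb.post": ["cont", "XYZ"],
    "cont": [],
    "XYZ": [],
}


def dominators(cfg, entry):
    """Iterative dataflow: Dom(n) = {n} | intersection of Dom(p) over preds p.

    Assumes every node is reachable from entry.
    """
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def def_dominates_use(cfg, entry, def_block, use_block):
    """True if the block defining a value dominates the block using it."""
    return def_block in dominators(cfg, entry)[use_block]


if __name__ == "__main__":
    # Coarse view: bb appears to dominate XYZ, so using %x in XYZ looks legal.
    print(def_dominates_use(COARSE_CFG, "entry", "bb", "XYZ"))  # True
    # Finer view: XYZ is reachable from bb.pre, before %x exists.
    print(def_dominates_use(FINE_CFG, "entry", "bb.post", "XYZ"))  # False
```

The same reasoning applies if bb is split at every potentially throwing
instruction rather than only at the definition of %x: the handler is then
dominated only by the part of bb that precedes the first instruction that
can throw.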
> In essence, in compiler generated landing pads, you should never
> generate a use of user values. But if XYZ is user code, it's user
> problem. ;)
The compiler crashing is a compiler problem, and that's exactly what is going
to happen if care is not taken about such details as dominance.
>
> cheers,
> --renato
>
> PS: abnormal cases like throwing in a destructor when previously
> thrown inside a constructor lead to termination, so even if you "use"
> the value in the catch area, you won't get there anyway. ;)
In Ada you can throw an exception inside a destructor and it does not lead
to program termination.
Ciao,
Duncan.