[LLVMdev] LLVM Exception Handling

Sat Sep 25 15:46:45 PDT 2010

Hi guys,

I have begun modifying the invoke and unwind instructions. The following
.ll file demonstrates the change.

define i32 @v(i32 %o) {

  %r = icmp eq i32 %o, 0
  br i1 %r, label %raise, label %ok

ok:
  %m = mul i32 %o, 2
  ret i32 %m

raise:
  %ex = inttoptr i32 255 to i8 *

  ; unwind now takes an i8* "exception" pointer
  unwind i8* %ex
}

define i32 @g(i32 %o) {
entry:
  ; invoke produces a different value depending on whether it
  ; branches to the success case or the failure case.
  %s = invoke i32 @v(i32 %o) to label %ok
   unwind %x to label %catch
ok:
  ret i32 %s

catch:
  %v = ptrtoint i8 * %x to i32
  %r = icmp eq i32 %v, 255
  br i1 %r, label %bad, label %worse
bad:
  ret i32 -1

worse:
  ret i32 -2
}

With my current change, the unwind instruction is able to pass a value to
the unwind branch of the invoke instruction. Using the LowerInvoke pass, I
was able to coax LLVM into generating correct, though expensive, code for
this via setjmp/longjmp.

The unwind instruction now takes a single i8* parameter. This value is
propagated to the nearest enclosing invoke on the call stack, that is, the
invoke through which control most recently reached the function containing
the unwind instruction.

The invoke instruction now generates one of two different values depending
on how the call exits. If the call exits via a return instruction, the
invoke instruction generates a return value (%s in the sample code). If the
call exits via an unwind instruction, the invoke generates an exception value
(%x in the sample code). The return value is only valid if the invoke
branches to the return branch; the exception value is only valid if it
branches to the unwind branch.
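The "one of two values, depending on the edge" behaviour can be modelled, loosely, with a tagged result. The C sketch below swaps real unwinding for explicit result passing (a different technique, chosen here only because it makes the validity rule visible); `invoke_result`, `return_value` and `exception_value` are hypothetical names, and the asserts stand in for the rule that each value may only be used on its own edge.

```c
#include <assert.h>
#include <stdint.h>

/* Which edge of the invoke was taken. */
typedef enum { EXIT_RETURN, EXIT_UNWIND } exit_kind;

/* Hypothetical model of what an invoke yields: exactly one member of the
 * union is valid, selected by `kind`, mirroring %s and %x. */
typedef struct {
    exit_kind kind;
    union {
        int32_t ret;   /* valid only on the normal edge (%s) */
        void *ex;      /* valid only on the unwind edge (%x) */
    } u;
} invoke_result;

/* Reading %s is only legal on the normal edge. */
static int32_t return_value(invoke_result r) {
    assert(r.kind == EXIT_RETURN);
    return r.u.ret;
}

/* Reading %x is only legal on the unwind edge. */
static void *exception_value(invoke_result r) {
    assert(r.kind == EXIT_UNWIND);
    return r.u.ex;
}

/* @v, written in explicit result-passing style instead of unwinding. */
invoke_result v(int32_t o) {
    invoke_result r;
    if (o == 0) {
        r.kind = EXIT_UNWIND;
        r.u.ex = (void *)(uintptr_t)255;
    } else {
        r.kind = EXIT_RETURN;
        r.u.ret = o * 2;
    }
    return r;
}

/* @g: branch on the exit kind, then read only the value valid on that edge. */
int32_t g(int32_t o) {
    invoke_result r = v(o);
    if (r.kind == EXIT_RETURN)
        return return_value(r);                  /* block %ok uses %s */
    uintptr_t x = (uintptr_t)exception_value(r); /* block %catch uses %x */
    return x == 255 ? -1 : -2;
}
```

Here g(3) returns 6 and g(0) returns -1; calling exception_value on the result of v(3) would trip the assert, which is the C analogue of using %x on the normal edge.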

For source languages that are not attempting to integrate with a
third-party exception handling mechanism (GCC's, or SEH), this would be
enough to implement exception handling. When integrating with an external
exception handling mechanism, the "exception" value generated by the invoke
instruction would replace the call to the 'eh.exception' intrinsic, with the
benefit of making it much easier for analysis passes to associate this value
with the invoke that generated it. On the unwind side, if all that's needed
is an exception pointer, then an unwind instruction could be used and lowered
to calls into the appropriate runtime library.

To make this work, the fundamental concept that an instruction always
produces a single value needs to change. The invoke instruction already
bends this rule somewhat: if it branches to the unwind block, the return
value is never actually produced. But in its existing form, it looks like it
only generates one value. As far as SSA is concerned, I don't see any problem
with an operation generating multiple values under different circumstances,
since each value still has exactly one definition. As long as the block
being branched to dominates every use of the respective value, I think this
is correct and optimizations should continue to work.

Unfortunately, the fact that a value and the instruction that generates it
are one and the same makes it very difficult to represent a single
instruction that generates more than one value. My current solution (which
feels wrong) is to have the invoke instruction own an additional "exception"
value that represents the value generated when continuing via the unwind
branch. This value is quite different from other values and therefore
inherits directly from llvm::Value. When lowering the invoke instruction, the
LowerInvoke pass replaces uses of this "exception" value with the return
value of the setjmp call, once it has been determined that setjmp returned
via a longjmp. When lowering the unwind instruction, the LowerInvoke pass
passes the unwind instruction's argument as the value parameter of the
longjmp call.
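A minimal C sketch of the lowering just described, assuming a single invoke site and an int-sized payload: the unwind operand becomes longjmp's value argument, and setjmp's nonzero return value stands in for the exception value. `invoke_site` and `lowered_unwind` are hypothetical names. Two C constraints show up here: longjmp(env, 0) makes setjmp return 1, so 0 cannot be used as a payload, and the C standard only permits setjmp in a few expression contexts (assigning its result to a variable is not one of them), so the sketch dispatches on it with a switch. Since longjmp's value is an int, a full i8* would not fit on targets where pointers are wider than int; the 255 from the example does.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical jump buffer for the one invoke site in g. */
static jmp_buf invoke_site;

/* Lowered `unwind`: the operand becomes longjmp's value argument. */
static void lowered_unwind(int ex) {
    /* longjmp(env, 0) would make setjmp return 1, so 0 is not a usable payload. */
    assert(ex != 0);
    longjmp(invoke_site, ex);
}

/* @v, with the i8* payload narrowed to an int. */
int v(int o) {
    if (o == 0)
        lowered_unwind(255);
    return o * 2;
}

/* Lowered invoke: setjmp returns 0 on the way in, or the unwind operand when
 * control comes back via longjmp; that value takes the place of %x. */
int g(int o) {
    switch (setjmp(invoke_site)) {
    case 0:
        return v(o);   /* normal edge (%ok) */
    case 255:
        return -1;     /* unwind edge with %x == 255 (%bad) */
    default:
        return -2;     /* unwind edge with any other %x (%worse) */
    }
}
```

With this sketch, g(3) returns 6 through the normal edge and g(0) returns -1 after v longjmps back into g's switch.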

While lowering this representation seemed natural, parsing it has proven
difficult. This "exception" value must be in the function's symbol table,
but with the current structure of the parser, the name of an instruction's
value is not, and cannot be, set until after the instruction has been added
to the containing basic block. The problem is that at that point the parser
doesn't know that the instruction produces another value, and even if it
did, it has lost the information needed to register the name with the symbol
table. To get past this, I put a nasty hack in place: I gave LLParser
permission to see the internals of Instruction so that I could temporarily
assign the invoke instruction's parent pointer ahead of time, allowing the
call to setName on the "exception" value to succeed. Once this value is in
the symbol table, there is currently no way to get it out: the code that
removes an Instruction's entry from the symbol table is unaware of the
additional value that needs to be removed. This causes a seemingly benign
assertion at shutdown about the symbol table not being empty.

Bitcode I/O is another problem; in my current build it is broken. There is
currently no way to bind to the "exception" value of the invoke instruction
in bitcode. I have not looked into this yet, as it was not needed to get my
sample code through to the code generator.

In closing, I am looking for feedback as to whether this approach makes
sense. I would also like to know if anyone has suggestions on how to deal
with some of these issues. I have included a patch with the changes I have
made so far. It is still very rough, but I thought it might be useful.

-Nathan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exception.patch
Type: application/octet-stream
Size: 24705 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100925/8c221640/attachment.obj>