[LLVMdev] Two labels around one instruction in Codegen

Wed Nov 7 01:08:11 PST 2007

Hi Nicolas,

> >> In order to have exceptions for non-call instructions (such as sdiv,
> >> load or stores), I'm modifying codegen so that it generates a BeginLabel
> >> and an EndLabel between the "may throwing" instruction. This is what the
> >> codegen of an InvokeInst does.
> >>     
> >
> > the rule is that all instructions between eh begin labelN and eh end labelN
> > must unwind to the same landing pad.  This is why invokes are bracketed by
> > such labels.  There are also two other cases to consider: (1) potentially
> > throwing instructions which are not allowed to throw (nounwind), 
> 
> What do you mean "not allowed"? Is this decided by the front-end?

Yes, it is decided by the front-end.  In C++ some constructs are not allowed
to throw (eg: a destructor that is run while unwinding some other exception)
and must result in a call to terminate.  Sometimes this can be implemented
by simply wrapping the construct in a try-catch block that calls terminate
if any exception is thrown.  But there are obscure situations in which this
can't be done (I forget why - I can look into it again if you like), in
which case the C++ runtime unwinder takes care of it.  However for the
unwinder to do it properly, there need to be special markings in the unwind
tables, saying "this call is not allowed to throw".
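
As a rough illustration of the try-catch approach mentioned above (a sketch
only, not code from the front-end; the helper name call_or_terminate is made
up for this example):

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Hypothetical helper: run f, and call std::terminate if any exception
// tries to propagate out of it.
template <typename F>
auto call_or_terminate(F &&f) -> decltype(std::forward<F>(f)()) {
  try {
    return std::forward<F>(f)();
  } catch (...) {
    // The construct was not allowed to throw, so give up here.
    std::terminate();
  }
}
```

When the wrapped construct does not throw, the wrapper simply forwards its
result; the catch-all only matters on the exceptional path.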

> Or by 
> an optimization pass (div may throw, but if we have a = b / 5 we know it
> won't throw).

No, that's a different issue.

> > (2) throwing
> > instructions for which any thrown exception will not be processed in this
> > function. 
> 
> I'm not sure I understand here.

For example:
int f(int n) { return 1/n; }
Here the instruction may throw an exception.  But there is no handler for it
in function f.  However there may be a handler further up the call stack.
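
A small sketch of that situation in plain C++.  Note the assumption: ordinary
C++ integer division does not throw (dividing by zero is undefined behaviour),
so the throw is written out by hand in a stand-in called throwing_div, and
caller is likewise a made-up name for "some function further up the stack":

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for a division that may throw; the check is explicit because
// plain C++ division by zero does not raise a C++ exception.
int throwing_div(int a, int b) {
  if (b == 0)
    throw std::domain_error("division by zero");
  return a / b;
}

// No handler here: an exception from the division passes straight through f.
int f(int n) { return throwing_div(1, n); }

// A handler further up the call stack.
int caller(int n) {
  try {
    return f(n);
  } catch (const std::domain_error &) {
    return -1;  // the exception thrown inside f is processed here
  }
}
```

For the unwinder to get from the division to the handler in caller, it has to
be able to pass through f, which is why f's instruction still needs an entry
in the exception table.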

> >  In case (1) the instruction should have no entry in the final
> > dwarf exception table, while in case (2) it should have an entry.  We don't
> > handle (1) right now, however the plan is that nounwind calls will also be
> > bracketed by labels but will have no associated landing pad. 
> 
> Why would they be bracketed by labels if codegen knows they don't throw?

No, this is when they may throw but they're not allowed to according to the
language semantics, i.e. if they do throw the language runtime wants to know
about it and take special action.  The labels and special entry in the exception
table exist to tell the runtime that special action should be taken if an
exception is propagated by an instruction between the labels.

> >  As for (2),
> > the dwarf writer scans all instructions in the function and if it sees a
> > call that is not bracketed by labels then it generates an appropriate entry
> > in the exception table 
> 
> Do you mean "that _is_ bracketed by labels" ?

No, that is *not* bracketed by labels.  It is a strange feature of C++ exception
handling that any call that has no entry in the exception table is considered to
not be allowed to throw exceptions (see (1) above), and if it does throw an
exception then the runtime will call terminate.  That means that all ordinary
calls need to have an entry in the exception table.  The result is (since we don't
handle (1) yet) that *all* calls end up with entries in the exception table.
However, in order to avoid putting gazillions of labels everywhere (i.e. around
every call), we only put labels around invokes.  If the dwarf writer sees a call
bracketed by labels it understands that this is an invoke; if it sees a call that
is not bracketed by labels it understands that this is an ordinary call, and
generates an appropriate entry in the exception table.
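
The scan described above can be sketched with a toy model.  None of this is
LLVM's actual dwarf writer code: the Inst and CallSite types, the function
buildCallSites, and the use of -1 for "no landing pad in this function" are
all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction model, not LLVM's real data structures.
enum class Kind { Label, Call, Other };

struct Inst {
  Kind kind;
  int landingPad;  // only meaningful for a call bracketed by labels
};

struct CallSite {
  std::size_t index;  // position of the call in the instruction list
  int landingPad;     // -1: no handler in this function, keep unwinding
};

// Every call gets an entry.  A call bracketed by labels is treated as an
// invoke and keeps its landing pad; an unbracketed call is an ordinary
// call and gets an entry with no landing pad.
std::vector<CallSite> buildCallSites(const std::vector<Inst> &insts) {
  std::vector<CallSite> table;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (insts[i].kind != Kind::Call)
      continue;
    bool bracketed = i > 0 && i + 1 < insts.size() &&
                     insts[i - 1].kind == Kind::Label &&
                     insts[i + 1].kind == Kind::Label;
    table.push_back({i, bracketed ? insts[i].landingPad : -1});
  }
  return table;
}
```

The point of the sketch is that no call is left without an entry, since a
missing entry would be read by the runtime as "not allowed to throw".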

> > (this will of course need to be modified to consider
> > all throwing instructions - note that this means that "maythrow" markings will
> > have to exist right to the end of code generation!); it is done this way
> > because labels inhibit optimizations (we used to bracket all calls with
> > labels, but stopped doing that because of the optimization problem).  I'm
> > mentioning this because the begin and end labels are not *between* maythrow
> > instructions, they bracket them.
> >
> >   
> 
> Sure, that would be the goal. Which means the labels are not created
> between an instruction, but between the instructions of a basic block.
> I'll see if this works. My first implementation was between one
> instruction because it was very simple to copy the invoke case for
> non-calls.

If all instructions in a basic block unwind to the same place then it is
indeed enough to put a label at the beginning and end of the block.
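
To make the bracketing rule concrete, here is a toy sketch that groups
consecutive may-throw instructions by landing pad, so that each group needs
only one begin label and one end label.  The Range type and the function
bracketRuns are hypothetical names, and representing each instruction by just
its landing pad number is a simplification for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One bracketed region: instructions begin..end (inclusive) all unwind
// to the same landing pad.
struct Range {
  std::size_t begin;
  std::size_t end;
  int landingPad;
};

// pads[i] is the landing pad of the i-th instruction in the block.
std::vector<Range> bracketRuns(const std::vector<int> &pads) {
  std::vector<Range> runs;
  for (std::size_t i = 0; i < pads.size(); ++i) {
    if (!runs.empty() && runs.back().landingPad == pads[i])
      runs.back().end = i;  // same pad as the previous instruction: extend
    else
      runs.push_back({i, i, pads[i]});  // pad changed: start a new region
  }
  return runs;
}
```

If every instruction in the block has the same landing pad, the result is a
single range covering the whole block, which is the case described above.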

> >> However, when generating native code, only BeginLabel is generated, and
> >> it is generated after the instruction. I'm not familiar with DAGs in the
> >> codegen library, so here are my 2-cents thoughts why:
> >>
> >> 1) BeginLabel and EndLabel are generated with:
> >>   DAG.setRoot(DAG.getNode(ISD::LABEL, MVT::Other, getRoot(),
> >>                             DAG.getConstant({Begin|End}Label, MVT::i32)));
> >>
> >> This seems to work with InvokeInst instructions, because the root of the
> >> DAG is modified by the instruction. With instructions such as sdiv, the
> >> root is not modified: the instruction only lowers itself to:
> >> DAG.getNode(OpCode, Op1.getValueType(), Op1, Op2)
> >>     
> >
> > I think that not creating a new root means that the instruction is allowed
> > to be re-ordered with respect to other instructions, as long as it occurs
> > before its uses.  Re-ordering is rather dubious for instructions that may
> > throw, though it's not clear what is acceptable.  I think you probably need
> > a new selection DAG "throw" node which you wrap throwing instructions in, a
> > bit like a TokenFactor.  This throw node would be setup in such a way as to
> > be bracketable by labels.
> >
> >   
> 
> I need to get some LLVM code reading ;-)
> 
> >> Which probably makes the codegen think EndLabel and BeginLabel are in
> >> the same place
> >>     
> >
> > In that case I would expect them both to be deleted...
> >   
> 
> Only one was deleted. Consider the code:
> 
> define i32 @test(i32 %argc) {
> entry:
>         %tmp2 = sdiv i32 2, %argc to label %continue unwind to label %unwindblock ; <i32> [#uses=1]
> 
> continue:
>         ret i32 %tmp2
> 
> unwindblock:
>         unwind
> }
> 
> 
> And here is the resulting x86 code (Llabel1 was supposed to be before
> the {cltd, idivl} pair, and Llabel2, which was supposed to come after
> it, is not generated):
> 
> test:
> .Leh_func_begin1:
>           
> .Llabel4:
>         movl    $2, %eax
>         movl    4(%esp), %ecx
>         cltd
>         idivl   %ecx
>           
> .Llabel1:
> .LBB1_1:        # continue
>         ret
> .LBB1_2:        # unwindblock

OK, I may take a look if I can find time (hah!).

Ciao,

Duncan.