patch for functions with indirect calls and always_inline attribute

Tue Apr 15 21:19:26 PDT 2014

Someone said "WebKit" so I had to jump in. ;-)

WebKit never uses an interpreter implemented in C unless we're in portability mode on eclectic hardware.  The performance of this isn't too important.

You're most likely to find indirect branches inside functions that implement interpreter loop bodies.  Inlining these functions would be guaranteed to break the interpreter.  Hence, the restriction needs to be there in some form.

Also, these functions are huge and are called infrequently - once you call them you tend to execute inside of them without popping out for a long time.  It would probably be borderline wrong to mark an interpreter loop as always_inline just for the amount of code bloat you'd get.  Also interpreter loop bodies will almost assuredly fail any "should I inline this?" heuristics because of size.  So, not only would inlining break interpreters but it would likely never be profitable.

The correcntess issue is as follows.  The only idiom I am aware of for writing an interpreter using computed goto involves allowing pointers to the basic blocks to escape.  Here's the overall structure of a typical threaded interpreter:

void** opcode_table;

void interpreter(bool configureTheOpcodeTable, ...other arguments) {
    if (configureTheOpcodeTable) {
        opcode_table = malloc(sizeof(void*) * ...);
        opcode_table[0] = &&op_something;
        opcode_table[1] = &&op_somethingElse;
        ...
        return opcode_table;
    }

    ... // the actual interpreter goes here.
op_something:
    ... // implementation of op_something
    goto *(pc[stuff]); // jump to the next instruction
op_somethingElse:
    ... // and so on
}

The workflow of the interpreter is then to call interpreter(true); during initialization to get the opcode_table and all other calls pass false.  Notice that this relies on allowing the basic block pointers to escape.  If you inline the interpreter, you break the world.

There is no other good way to write an interpreter using computed goto.  The whole point of using computed goto is that you avoid a switch table in the sense that the bytecode (or AST, or whatever) *is* the jump table.  But since the interpreter is the consumer of the bytecode, the thing that produces the bytecode needs to reach into the interpreter and yank out its block pointers.  One of the silly aspects of the computed goto feature is that it requires this kind of tomfoolery to get the block addresses and none of it is particularly obvious from the compiler's standpoint.

It's reasonable to assume that nobody who would write such an interpreter would mark it always_inline.  So, it seems like a fair compromise to say that:

- If you didn't mark the function always_inline then inlining is disabled to protect the "escape pointers to blocks" interpreter idiom.

- If you did mark the function always_inline then you get what you asked for.

-Phil

On April 14, 2014 at 9:15:58 PM, Hal Finkel (hfinkel at anl.gov) wrote:

----- Original Message -----  
> From: "Eric Christopher" <echristo at gmail.com>  
> To: "Hal Finkel" <hfinkel at anl.gov>  
> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "Gerolf Hoflehner" <ghoflehner at apple.com>  
> Sent: Monday, April 14, 2014 11:12:08 PM  
> Subject: Re: patch for functions with indirect calls and always_inline attribute  
>  
> On Mon, Apr 14, 2014 at 9:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:  
> > ----- Original Message -----  
> >> From: "Eric Christopher" <echristo at gmail.com>  
> >> To: "Hal Finkel" <hfinkel at anl.gov>  
> >> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "Gerolf Hoflehner"  
> >> <ghoflehner at apple.com>  
> >> Sent: Monday, April 14, 2014 11:00:21 PM  
> >> Subject: Re: patch for functions with indirect calls and  
> >> always_inline attribute  
> >>  
> >> On Mon, Apr 14, 2014 at 8:39 PM, Hal Finkel <hfinkel at anl.gov>  
> >> wrote:  
> >> > ----- Original Message -----  
> >> >> From: "Eric Christopher" <echristo at gmail.com>  
> >> >> To: "Gerolf Hoflehner" <ghoflehner at apple.com>  
> >> >> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>  
> >> >> Sent: Monday, April 14, 2014 6:11:06 PM  
> >> >> Subject: Re: patch for functions with indirect calls and  
> >> >> always_inline attribute  
> >> >>  
> >> >> On Mon, Apr 14, 2014 at 4:09 PM, Gerolf Hoflehner  
> >> >> <ghoflehner at apple.com> wrote:  
> >> >> > Please see below.  
> >> >> >  
> >> >> > On Apr 14, 2014, at 2:56 PM, Eric Christopher  
> >> >> > <echristo at gmail.com>  
> >> >> > wrote:  
> >> >> >  
> >> >> >> I could be down for this...  
> >> >> >>  
> >> >> >> Do you have any numbers?  
> >> >> > No. I ran into this when creating a test case for another bug  
> >> >> > fix.  
> >> >>  
> >> >> OK, would you mind running some performance numbers on this? It  
> >> >> passes  
> >> >> my sniff test as well, but I'd like to not be wrong :)  
> >> >>  
> >> >> If the numbers come out neutral/ahead/no change feel free to  
> >> >> commit.  
> >> >  
> >> > And if they don't? always_inline means *always inline* -- this  
> >> > is a  
> >> > correctness issue, and we should do it regardless of performance  
> >> > considerations.  
> >> >  
> >>  
> >> Turns out I agree. However, I'd like to know if we've got any  
> >> regressions so they can be explored first and we can figure out  
> >> what's  
> >> wrong and then apply. I wouldn't expect anything, but running  
> >> numbers  
> >> would make sure that we're not exposing some pernicious  
> >> inliner/opt  
> >> bug somewhere else.  
> >  
> > Fair enough.  
> >  
> >>  
> >> > I am curious, however, why we have this restriction at all.  
> >> > Indirect dispatch is expensive, compounding that expense with  
> >> > the  
> >> > call overhead of a small function seems suboptimal; plus, for  
> >> > hardware that has indirect branch prediction (Intel, ARM, and  
> >> > others), inlining should give the branch predictor a better  
> >> > chance  
> >> > at doing the right thing.  
> >> >  
> >>  
> >> We do have a bonus to the inliner for just that IIRC. This should  
> >> be  
> >> for something like computed branches within functions rather than  
> >> calls to virtual functions? That said, it does seem odd since the  
> >> branch predictor should help with those as well.  
> >  
> > Good point. So this, in C, is for things like dispatch tables using  
> > the labels-as-values extension; anything else?  
> >  
>  
> I imagine they found this in the webkit interpreter or something like  
> that which uses dispatch tables.  

Right, and I suppose they want them inlined ;) -- which makes me wonder whether we should, as a follow-up, benchmark removing the restriction all together.  

-Hal  

>  
> -eric  
>  
> > -Hal  
> >  
> >>  
> >> *shrug*  
> >>  
> >> -eric  
> >>  
> >> > -Hal  
> >> >  
> >> >>  
> >> >> Thanks!  
> >> >>  
> >> >> -eric  
> >> >>  
> >> >> >>  
> >> >> >> Nits on the patch:  
> >> >> >>  
> >> >> >> // Disallow inlining of functions which contain an  
> >> >> >> indirect  
> >> >> >> branch.  
> >> >> >> - if (isa<IndirectBrInst>(BI->getTerminator()))  
> >> >> >> + if (isa<IndirectBrInst>(BI->getTerminator()) &&  
> >> >> >> + !F.hasFnAttribute(Attribute::AlwaysInline))  
> >> >> >>  
> >> >> >> Update the comment.  
> >> >> > Ok.  
> >> >> >> I'm assuming this can't be hoisted because of the  
> >> >> >> returns_twice aspect yes?  
> >> >> >  
> >> >> > Yes, for now I only wanted to address indirect branches and  
> >> >> > leave  
> >> >> > remaining checks as they are.  
> >> >> >  
> >> >> >>  
> >> >> >> -define i32 @inner5(i8* %addr) alwaysinline {  
> >> >> >> +define i32 @inner5(i8* %addr) {  
> >> >> >>  
> >> >> >> Why the change?  
> >> >> > This is to keep the old test behavior which relied on *not*  
> >> >> > inlining inner5 because of an indirect branch.  
> >> >> >>  
> >> >> >> -eric  
> >> >> >>  
> >> >> >> On Mon, Apr 14, 2014 at 2:49 PM, Gerolf Hoflehner  
> >> >> >> <ghoflehner at apple.com> wrote:  
> >> >> >>> Hi  
> >> >> >>>  
> >> >> >>> llvm currently does not inline functions with indirect  
> >> >> >>> branches.  
> >> >> >>> We had some  
> >> >> >>> internal discussion and think the always_inline attribute  
> >> >> >>> should  
> >> >> >>> overrule  
> >> >> >>> that restriction.  
> >> >> >>>  
> >> >> >>> Gerolf  
> >> >> >>>  
> >> >> >>>  
> >> >> >>>  
> >> >> >>>  
> >> >> >>>  
> >> >> >>> Index: lib/Analysis/IPA/InlineCost.cpp  
> >> >> >>> ===================================================================  
> >> >> >>> --- lib/Analysis/IPA/InlineCost.cpp (revision 206204)  
> >> >> >>> +++ lib/Analysis/IPA/InlineCost.cpp (working copy)  
> >> >> >>> @@ -1301,7 +1301,8 @@  
> >> >> >>> Attribute::ReturnsTwice);  
> >> >> >>> for (Function::iterator BI = F.begin(), BE = F.end(); BI  
> >> >> >>> !=  
> >> >> >>> BE;  
> >> >> >>> ++BI) {  
> >> >> >>> // Disallow inlining of functions which contain an  
> >> >> >>> indirect  
> >> >> >>> branch.  
> >> >> >>> - if (isa<IndirectBrInst>(BI->getTerminator()))  
> >> >> >>> + if (isa<IndirectBrInst>(BI->getTerminator()) &&  
> >> >> >>> + !F.hasFnAttribute(Attribute::AlwaysInline))  
> >> >> >>> return false;  
> >> >> >>>  
> >> >> >>>  
> >> >> >>>  
> >> >> >>> for (BasicBlock::iterator II = BI->begin(), IE =  
> >> >> >>> BI->end();  
> >> >> >>> II != IE;  
> >> >> >>> Index: test/Transforms/Inline/always-inline-attribute.ll  
> >> >> >>> ===================================================================  
> >> >> >>> --- test/Transforms/Inline/always-inline-attribute.ll  
> >> >> >>> (revision  
> >> >> >>> 0)  
> >> >> >>> +++ test/Transforms/Inline/always-inline-attribute.ll  
> >> >> >>> (working  
> >> >> >>> copy)  
> >> >> >>> @@ -0,0 +1,26 @@  
> >> >> >>> +; RUN: opt < %s -O3 -S | FileCheck %s  
> >> >> >>> + at gv = external global i32  
> >> >> >>> +  
> >> >> >>> +define i32 @main() nounwind {  
> >> >> >>> +; CHECK-NOT: call i32 @foo  
> >> >> >>> + %1 = load i32* @gv, align 4  
> >> >> >>> + %2 = tail call i32 @foo(i32 %1)  
> >> >> >>> + unreachable  
> >> >> >>> +}  
> >> >> >>> +  
> >> >> >>> +define internal i32 @foo(i32) alwaysinline {  
> >> >> >>> + br label %2  
> >> >> >>> +  
> >> >> >>> +; <label>:2 ; preds  
> >> >> >>> =  
> >> >> >>> %8,  
> >> >> >>> %1  
> >> >> >>> + %3 = phi i32 [ %0, %1 ], [ %10, %8 ]  
> >> >> >>> + %4 = phi i8* [ blockaddress(@foo, %2), %1 ], [ %6, %8 ]  
> >> >> >>> + %5 = icmp eq i32 %3, 1  
> >> >> >>> + %6 = select i1 %5, i8* blockaddress(@foo, %8), i8* %4  
> >> >> >>> + %7 = add nsw i32 %3, -1  
> >> >> >>> + br label %8  
> >> >> >>> +  
> >> >> >>> +; <label>:8 ; preds  
> >> >> >>> =  
> >> >> >>> %8,  
> >> >> >>> %2  
> >> >> >>> + %9 = phi i32 [ %7, %2 ], [ %10, %8 ]  
> >> >> >>> + %10 = add nsw i32 %9, -1  
> >> >> >>> + indirectbr i8* %6, [label %2, label %8]  
> >> >> >>> +}  
> >> >> >>> Index: test/Transforms/Inline/always-inline.ll  
> >> >> >>> ===================================================================  
> >> >> >>> --- test/Transforms/Inline/always-inline.ll (revision  
> >> >> >>> 206203)  
> >> >> >>> +++ test/Transforms/Inline/always-inline.ll (working  
> >> >> >>> copy)  
> >> >> >>> @@ -78,7 +78,7 @@  
> >> >> >>> ret i32 %add  
> >> >> >>> }  
> >> >> >>>  
> >> >> >>>  
> >> >> >>>  
> >> >> >>> -define i32 @inner5(i8* %addr) alwaysinline {  
> >> >> >>> +define i32 @inner5(i8* %addr) {  
> >> >> >>> entry:  
> >> >> >>> indirectbr i8* %addr, [ label %one, label %two ]  
> >> >> >>>  
> >> >> >>>  
> >> >> >>> _______________________________________________  
> >> >> >>> llvm-commits mailing list  
> >> >> >>> llvm-commits at cs.uiuc.edu  
> >> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits  
> >> >> >>>  
> >> >> >  
> >> >> _______________________________________________  
> >> >> llvm-commits mailing list  
> >> >> llvm-commits at cs.uiuc.edu  
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits  
> >> >>  
> >> >  
> >> > --  
> >> > Hal Finkel  
> >> > Assistant Computational Scientist  
> >> > Leadership Computing Facility  
> >> > Argonne National Laboratory  
> >>  
> >  
> > --  
> > Hal Finkel  
> > Assistant Computational Scientist  
> > Leadership Computing Facility  
> > Argonne National Laboratory  
>  

--  
Hal Finkel  
Assistant Computational Scientist  
Leadership Computing Facility  
Argonne National Laboratory  
_______________________________________________  
llvm-commits mailing list  
llvm-commits at cs.uiuc.edu  
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140415/92fca28d/attachment.html>