[LLVMdev] Questions on llvm and jit

Mon Mar 29 15:47:43 PDT 2010

Thanks all for your replies.  As Reid pointed out, invoking the JIT at runtime to recompile the .bc doesn't seem desirable, from either a CPU path-length or a memory perspective.  Regarding the .so approach, one thing I failed to mention is that the execution time of an expression may, at times, be too short (it depends on the data) to warrant the overhead involved in the .so scheme.  But we should consider the .so scheme when profile data suggests a long execution.

So just to summarize, our intended solution was to compile our expressions into single, stand-alone functions, package them up, send them to our executor processes via the messaging layer, and then simply point at the code and execute it.  In some cases, it would be preferable to have calls within these expressions.  The *best* way to address this would seem to be one of: 1) rearchitect the solution to use a .so instead, 2) write the called functions using the LLVM API at compile-time and then inline them at the call sites, or 3) invoke the JIT at runtime and keep the functions in LLVM-allocated memory.
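
For reference, a minimal sketch of what fixing up call targets in a shipped code blob could look like. It assumes the compile side records the byte offset of each `call rel32` (opcode 0xE8) rather than scanning for it, since x86 instructions are variable-length and a 0xE8 byte can also occur inside another instruction's operand. The `CallFixup` record, the helper-address table, and `apply_call_fixups` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record produced at compile time: where an E8 (call rel32)
// opcode sits in the blob, and which runtime helper it should reach.
struct CallFixup {
    std::size_t opcode_offset;  // byte offset of the 0xE8 opcode
    std::size_t helper_index;   // index into the executor's helper table
};

// Patch each recorded call so that it reaches helpers[helper_index] once the
// blob is placed at load_addr. Returns false on a malformed record or when
// the target is out of rel32 range (the blob may be partially patched then).
bool apply_call_fixups(std::vector<std::uint8_t>& code,
                       std::uint64_t load_addr,
                       const std::vector<CallFixup>& fixups,
                       const std::vector<std::uint64_t>& helpers) {
    for (const CallFixup& f : fixups) {
        if (f.opcode_offset > code.size() ||
            code.size() - f.opcode_offset < 5) return false;
        if (code[f.opcode_offset] != 0xE8) return false;
        if (f.helper_index >= helpers.size()) return false;

        // rel32 is measured from the end of the 5-byte call instruction.
        std::uint64_t next = load_addr + f.opcode_offset + 5;
        std::int64_t disp =
            static_cast<std::int64_t>(helpers[f.helper_index] - next);
        if (disp < INT32_MIN || disp > INT32_MAX) return false;

        // Store the displacement little-endian, as x86 expects.
        std::uint32_t rel = static_cast<std::uint32_t>(disp);
        for (int i = 0; i < 4; ++i) {
            code[f.opcode_offset + 1 + i] =
                static_cast<std::uint8_t>((rel >> (8 * i)) & 0xFF);
        }
    }
    return true;
}
```

On x86-64 the range check matters: if a helper is more than 2 GB away from the blob, a direct `call rel32` cannot reach it and an indirect call through a register or a table would be needed instead.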

Question 1: Can LLVM intrinsic functions be inlined, so that no external calls are emitted for them?

Question 2: Do I have a similar fix-up problem for jump tables?  (Assume these are the only globals we would have to contend with in our functions.)

Any suggestions on how I can make this work without a .so or invoking the JIT at runtime?
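
To make the jump-table concern concrete, here is a sketch under an assumption: that the table holds absolute addresses of blocks inside the function, as is common for non-position-independent x86 code. Whether LLVM's JIT emits this form or a table of relative offsets for a given target and relocation model is not something this sketch establishes. With absolute entries, moving the code requires rebasing every entry; `rebase_jump_table` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rebase a table of absolute block addresses after the code they point into
// has been copied from old_base to new_base. Every entry must fall inside
// [old_base, old_base + code_size); otherwise the table is left untouched
// and false is returned.
bool rebase_jump_table(std::vector<std::uint64_t>& table,
                       std::uint64_t old_base,
                       std::uint64_t code_size,
                       std::uint64_t new_base) {
    // First pass: validate, so a bad entry cannot leave a half-patched table.
    for (std::uint64_t entry : table) {
        if (entry < old_base || entry - old_base >= code_size) return false;
    }
    // Second pass: keep each entry's offset within the code, change the base.
    for (std::uint64_t& entry : table) {
        entry = new_base + (entry - old_base);
    }
    return true;
}
```

If the table instead stores offsets relative to the function or to the table itself, and the function and table are shipped together at the same relative distance, no rebasing should be needed.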

Thanks again.

- Shasank

> From: rnk at mit.edu
> Date: Fri, 26 Mar 2010 17:41:44 -0400
> Subject: Re: [LLVMdev] Questions on llvm and jit
> To: nhowell at ebay.com
> CC: shanko_chavano at hotmail.com; llvmdev at cs.uiuc.edu
> 
> If you distribute the .bc to a new machine and JIT it there, you
> wouldn't need to apply any relocations to call instructions, since the
> JIT will get all the offsets right when it does code generation on the
> node.
> 
> I would strongly recommend the shared library approach over this one,
> because JIT compile time is dominated by code generation, not
> optimization. If you distribute the .bcs, then you'll have to do
> instruction selection and register allocation all over again on each
> node.
> 
> The compile + link steps only need to be performed once per job to
> distribute. Presumably the jobs are running long enough that
> compiling and linking them is negligible. The only downsides of using
> a .so are that you have to load the library from the filesystem and
> you have to indirect globals through the GOT/PLT.
> 
> Reid
> 
> On Fri, Mar 26, 2010 at 5:16 PM, Howell, Nathan <nhowell at ebay.com> wrote:
> > If you're really trying to avoid linking everything into a shared library
> > (easiest choice), consider splitting up the compilation into a few more
> > steps:
> >
> >
> >
> > 1) Compile and optimize once as part of your build, target LLVM bitcode
> > instead of machine code
> >
> > 2) Ship .bc files out to each node
> >
> > 3) Fix-up call instructions in a BasicBlockPass, run JIT without any/many
> > additional IR optimizations enabled
> >
> >
> >
> > This won’t get you around all the optimization passes and the like but it’s
> > still pretty fast and a lot easier than trying to patch a direct call in
> > machine code.
> >
> >
> >
> > -n
> >
> >
> >
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> > Behalf Of Shasank Chavan
> > Sent: Friday, March 26, 2010 1:52 PM
> > To: rnk at mit.edu
> > Cc: llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Questions on llvm and jit
> >
> >
> >
> > Hi Reid.  Thanks for your response.  We will be running this code in
> > different processes across different nodes.  Basically we have
> > thousands of "executor" processes that are solely responsible for executing
> > this code generated at compile-time across partitioned data.  Rather than
> > each one of these processes invoking the jit and compiling with full opts
> > and all of that, we believe it may be cheaper to do this once at
> > compile-time and have the processes simply point to it and execute.  As for
> > a dll, compiling, linking, sending over the dll to the different nodes, and
> > then calling into the dll with dlopen will not be cheap to do, and for that
> > reason we're avoiding it.
> >
> > So for all the reasons above, we prefer everything to be inlined where
> > possible, and, if need be, have a mechanism to fix up target addresses of
> > call instructions.  I'm more familiar with ia64 than x86, and so it's a
> > learning process for me to understand how easy it is to do this.  First off,
> > is there common code out there or in the dll that will allow me to walk
> > through x86 instructions in search of a call?
> >
> > Thanks again for your response.
> >
> > - Shasank
> >
> >> From: rnk at mit.edu
> >> Date: Fri, 26 Mar 2010 11:51:24 -0400
> >> Subject: Re: [LLVMdev] Questions on llvm and jit
> >> To: shanko_chavano at hotmail.com
> >> CC: llvmdev at cs.uiuc.edu
> >>
> >> On Tue, Mar 23, 2010 at 4:44 PM, Shasank Chavan
> >> <shanko_chavano at hotmail.com> wrote:
> >> > Hi.  I have more questions regarding llvm and using it as a jit for our
> >> > purposes.  Also, let me confess that I haven't actually used llvm yet
> >> > (I'm still prototyping using gnu's libjit).  Some of the issues that
> >> > have come up from that work so far lead me to these questions:
> >> >
> >> > 1) We intend to use llvm as a jit in our expression compiler at
> >> > compile-time only.  At runtime, the x86 code generated from the bitcode
> >> > compiler will be packed and sent to our runtime facility for execution.
> >> > This implies a need to fix up things like external function calls made
> >> > in the function we generated at compile-time.  So suppose a call to
> >> > "memcpy" is made in our function F.  The target address for memcpy will
> >> > need to be fixed up at runtime in the x86 code in memory to point to
> >> > the actual memcpy available on the system.  Obviously we may have
> >> > problems if memcpy is bound to a different load module (i.e. c
> >> > runtime), so we may instead fix up the call to point to a wrapper call
> >> > that's bound to the app.  The question is, is it possible to walk
> >> > through the x86 instructions in search of "call" instructions to then
> >> > fix up the target?  Does llvm have a facility to walk through the
> >> > actual machine code instructions to do this?
> >> >
> >> > 2) Similarly to (1), any external calls made by llvm (e.g. calls to
> >> > intrinsic functions) will either need to be fixed up (as discussed in
> >> > (1)) or prevented altogether.  Is there a way to inline all intrinsic
> >> > calls so that no calls are made other than those purposefully inserted
> >> > by me?  Similarly, if a branch table is generated for a switch
> >> > statement, where in memory will this sit relative to the generated x86
> >> > code for the function?  Will it be possible to copy both the function
> >> > and the jump table together into our memory buffer for packing and
> >> > shipping to our runtime facility?  Will there be wasted space?  Clearly
> >> > the location of the jump table is relevant in terms of ensuring that
> >> > the branch target offsets are correct within the function, and for that
> >> > reason this all matters.
> >>
> >> Why do you need this? Where are you running the code being generated,
> >> ie what is your "runtime facility"? Is it another process on the same
> >> system or another machine on the network?
> >>
> >> In this case it might be easier for you to compile and link a shared
> >> library and then dlopen it in your runtime facility.
> >>
> >> > 3) I noticed with libjit that the reg-allocator and alias analysis
> >> > (assuming this exists in libjit) are not the best.  I wonder if llvm
> >> > will have a problem.  If I make the appropriate llvm calls to
> >> > redundantly load an integer through a pointer several times (e.g.
> >> > (goo == *a) .... ....  (foo == *a)), will llvm make every effort to
> >> > reuse the temp created to store *a initially in the second reference?
> >> > I tried this out with the llvm demo, and the C code generated
> >> > automatically created a temp for the first load, and then reused that
> >> > temp in the second reference.  I however would not be making such
> >> > calls - I instead would probably generate the memory reference twice,
> >> > and expect llvm to optimize this.  Is my expectation correct?
> >>
> >> It should.
> >>
> >> Reid
> >