[LLVMdev] whole program optimization examples?

Mon Oct 13 18:49:38 PDT 2014

On 10/13/2014 06:17 PM, Filip Pizlo wrote:
>
>
> On Oct 13, 2014, at 4:07 PM, Philip Reames <listmail at philipreames.com 
> <mailto:listmail at philipreames.com>> wrote:
>
>>
>> On 10/13/2014 03:23 PM, Kevin Modzelewski wrote:
>>> With the patchpoint infrastructure, shouldn't it now be relatively 
>>> straightforward to do an accurate-but-non-relocatable scan of the 
>>> stack, by attaching all the GC roots as stackmap arguments to 
>>> patchpoints?  This is something we're currently working on for 
>>> Pyston (ie we don't have it working yet), but I think we might get 
>>> it "for free" once we finish the work on frame introspection.
>> Take a look at the statepoint intrinsics up for review.  These are 
>> essentially exactly that, with two extensions:
>> - A semantic distinction between gc roots and deopt state (since you 
>> may want both)
>> - Support for explicit relocation of the gc root values (this could 
>> be made optional, but is currently not)
>>
>> Though, you really don't want to emit these in your frontend. You 
>> can, it'll work, but the performance will suffer.  Doing so will 
>> prevent many useful optimizations from running.
>
> You really should be specific here. The optimizations you're thinking 
> of may be uninteresting to many clients.
Assuming you have a VM which needs safepoints to occur at some fixed 
interval, you need to put a safepoint poll in *all* loops.  (Well, 
unless you can prove either a) the loop is bounded or b) there's another 
safepoint in the loop.)  Doing so, you introduce a call into the loop 
(using either approach).  This breaks loop recognition, complicates 
alias analysis and thus LICM, and is otherwise bad for the optimizer.

> Also you won't lose any performance if your GC pointers are also 
> needed for deopt (which happens to be the common case).

> I really do think that this whole discussion is tragicomic. Most 
> clients of LLVM would be best served with mostly copying GC.
I believe LLVM should not take a position in this debate and should try 
to support all collectors.
>
> -Filip
>
>
>> Instead, you probably want to consider something like the late 
>> safepoint placement approach we've been pushing. Hopefully, once the 
>> statepoint stuff lands, we can get that upstreamed fairly soon.
>>
>> Philip
>>
>>>
>>> On Sat, Oct 11, 2014 at 11:37 PM, Filip Pizlo <fpizlo at apple.com 
>>> <mailto:fpizlo at apple.com>> wrote:
>>>
>>>
>>>
>>>     > On Oct 10, 2014, at 6:24 PM, Hayden Livingston
>>>     <halivingston at gmail.com <mailto:halivingston at gmail.com>> wrote:
>>>     >
>>>     > Hello,
>>>     >
>>>     > I was wondering if there is an example list somewhere of whole
>>>     program optimizations done by LLVM based compilers?
>>>     >
>>>     > I'm only familiar with method-level optimizations, and I'm
>>>     being told wpo can deliver many great speedups.
>>>     >
>>>     > My language is currently staticly typed JIT based and uses the
>>>     JVM, and I want to move it over to LLVM so that I can have
>>>     options where it can be ahead of time compiled as well.
>>>
>>>     As Philip kindly pointed out, WebKit uses llvm as part of a
>>>     JavaScript JIT optimization pipeline. It works well for WebKit,
>>>     but this was a large amount of work. It may not be the path of
>>>     least resistance depending on what your requirements are.
>>>
>>>     >
>>>     > I'm hearing bad things about LLVM's JIT capabilities --
>>>     specifically that writing your own GC is going to be a pain.
>>>
>>>     This is a fun topic and you'll probably get some good advice. :-)
>>>
>>>     Here's my take. GC in llvm is only a pain if you make the tragic
>>>     mistake of writing an accurate-on-the-stack GC. Accurate
>>>     collectors are only known to be beneficial in niche
>>>     environments, usually if you have an aversion to probabilistic
>>>     algorithms. You might also be stuck requiring accuracy if your
>>>     system relies on being able to force *every* object to
>>>     *immediately* move to a new location, but this is an uncommon
>>>     requirement - usually it happens due to certain speculative
>>>     optimization strategies in dynamic languages.
>>>
>>>     My approach is to use a Bartlett-style mostly-copying collector.
>>>     If you use a Bartlett-style collector then you don't need any
>>>     special support in llvm. It just works, it allows llvm to
>>>     register-allocate pointers at will, and it lends itself
>>>     naturally to high-throughput collector algorithms.
>>>     Bartlett-style collectors come in many shapes and sizes -
>>>     copying or not, mark-region or not, generational or not, and
>>>     even a fancy concurrent copying example exists.
>>>
>>>     WebKit used a Bartlett-style parallel generational sticky-mark
>>>     copying collector with opportunistic mark-region optimizations.
>>>     We haven't written up anything about it yet but it is all open
>>>     source.
>>>
>>>     Hosking's paper about the concurrent variant is here:
>>>     http://dl.acm.org/citation.cfm?doid=1133956.1133963
>>>
>>>     I highly recommend reading Bartlett's original paper about
>>>     conservative copying; it provides an excellent semi space
>>>     algorithm that would be a respectable starting point for any VM.
>>>     You won't regret implementing it - it'll simplify your interface
>>>     to any JIT, not just llvm. It'll also make FFI easy because it
>>>     allows the C stack to refer directly to GC objects without any
>>>     shenanigans.
>>>
>>>     Bartlett is probabilistic in the sense that it may, with low
>>>     probability, increase object drag. This happens rarely. On
>>>     64-bit systems it's especially rare. It's been pretty well
>>>     demonstrated that Bartlett collectors are as fast as accurate
>>>     ones, insofar as anything in GC land can be demonstrated (as in
>>>     it's still a topic of lively debate, though I had some papers
>>>     back in the day that showed some comparisons). WebKit often wins
>>>     GC benchmarks for example, and we particularly like that our GC
>>>     never imposes limitations on llvm optimizations. It's really
>>>     great to be able to view the compiler and the collector as
>>>     orthogonal components!
>>>
>>>     >
>>>     > Anyways, sort of diverged there, but still looking for WPO
>>>     examples!
>>>     >
>>>     > Hayden.
>>>     > _______________________________________________
>>>     > LLVM Developers mailing list
>>>     > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>     http://llvm.cs.uiuc.edu
>>>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>     _______________________________________________
>>>     LLVM Developers mailing list
>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>     http://llvm.cs.uiuc.edu
>>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu          http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141013/95f78319/attachment.html>