[LLVMdev] LTO question

Thu Dec 25 23:55:50 PST 2014

Diego, Teresa, David,

Sorry for my delayed reply; I left for vacation right after sending my message about this.

Diego, it wasn't explicit from your message whether LLVM LTO can handle Firefox-scale programs, which you said GCC can handle.  I assumed that's what you meant, but could you confirm that?  I understand that neither can handle the very large Google applications, but that's probably not a near-term concern for a project like the one Charles is embarking on.

I'd be interested to hear more about the LTO design you folks are working on, whenever you're ready to share the details.  I read the GCC design docs on LTO, and I'm curious how similar or different your approach will be.  For example, the 3-phase approach of WHOPR is fairly sophisticated (it actually follows closely some research done at Rice U. and IBM on scalable interprocedural analysis, in the same group where Preston did his Ph.D.).

For now, I would like to introduce you all to Charles, so that he has access to people working on this issue, which will probably continue to be a concern for his project.  I have copied you on my reply to him.

Thanks for the information.

--Vikram S. Adve
Visiting Professor, Computer Science, EPFL
Professor, Department of Computer Science
University of Illinois at Urbana-Champaign
vadve at illinois.edu
http://llvm.org

On Dec 16, 2014, at 3:48 AM, Teresa Johnson <tejohnson at google.com> wrote:

> On Fri, Dec 12, 2014 at 1:59 PM, Diego Novillo <dnovillo at google.com> wrote:
>> On 12/12/14 15:56, Adve, Vikram Sadanand wrote:
>>> 
>>> I've been asked how LTO in LLVM compares to equivalent capabilities
>>> in GCC.  How do the two compare in terms of scalability?  And
>>> robustness for large applications?
>> 
>> 
>> Neither GCC nor LLVM can handle our (Google) large applications. They're
>> just too massive for the kind of linking done by LTO.
>> 
>> When we built GCC's LTO, we were trying to address this by creating a
>> partitioned model, where the analysis phase and the codegen phase are split
>> to allow working on partial callgraphs
>> (http://gcc.gnu.org/wiki/LinkTimeOptimization for details).
>> 
>> This allows us to split and parallelize the initial bytecode generation and the
>> final optimization/codegen. However, the analysis is still implemented as a
>> single process. We found that we cannot even load summaries, types and
>> symbols in an efficient way.
>> 
>> It does allow for programs like Firefox to be handled. So, if by "big" you
>> mean something of that size, this model can do it.
>> 
>> With LLVM, I can't even load the IR for one of our large programs on a box
>> with 64GB of RAM.
>> 
>>> Also, are there any ongoing efforts or plans to improve LTO in LLVM
>>> in the near future?
>> 
>> 
>> Yes. We are going to be investing in this area very soon. David and Teresa
>> (CC'd) will have details.
> 
> Still working out the details, but we are investigating a solution
> that is scalable to very large programs. We'll share the design in the
> near future when we have more details worked out so that we can get
> feedback.
> 
> Thanks!
> Teresa
> 
>> 
>> 
>> Diego.
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413