[LLVMdev] Performance regression in the LiveIntevals phase

Thu Oct 9 12:56:24 PDT 2014

On Oct 9, 2014, at 12:34 PM, Andrew Trick <atrick at apple.com> wrote:

>> 
>> On Oct 9, 2014, at 9:17 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>> ----- Original Message -----
>>> From: "Vaidas Gasiunas" <vaidas.gasiunas at sap.com>
>>> To: "LLVM Dev (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu>
>>> Sent: Thursday, October 9, 2014 4:37:39 AM
>>> Subject: [LLVMdev] Performance regression in the LiveIntevals phase
>>> 
>>> Some time ago we reported a compile-time performance regression in
>>> the LiveIntervals analysis pass (see
>>> http://llvm.org/bugs/show_bug.cgi?id=18580 ). We detected it at
>>> first after migrating from LLVM 3.1 to 3.3, but the problem persists
>>> also in 3.5. This regression is especially critical when compiling
>>> long functions. In one of our benchmarks compile time goes from 200s
>>> (in 3.1) up to 1500s (in 3.3).
>>> 
>>> 
>>> 
>>> We also managed to reproduce the regression with a C++ example using
>>> clang. Here are the instructions:
>>> 
>>> 
>>> 
>>> Generate the example C++ file with the Perl script attached to the
>>> Bug (5000 is a good size parameter to clearly see the regression):
>>> 
>>> perl create.pl example.cpp 5000
>>> 
>>> Generate LLVM IR code for the C++ file (the regression is in the
>>> backend functionality, so we don't want to measure frontend
>>> compilation)
>>> 
>>> llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31
>>> 
>>> llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35
>>> 
>>> 
>>> 
>>> Now you can run the actual performance measurements using llc:
>>> 
>>> llvm31/llc -cppgen=program -o=5000.asm example.ir-31
>>> 
>>> llvm35/llc -cppgen=program -o=5000.asm example.ir-35
>>> 
>>> or you can also compile example.ir-31 with llvm3.5:
>>> 
>>> llvm35/llc -cppgen=program -o=5000.asm example.ir-31
>>> 
>>> 
>>> 
>>> Our analysis had shown that the most of time is taken for generation
>>> of live intervals for physical registers. In the biggest example,
>>> the live interval of one of the physical registers consists of about
>>> 500.000 live ranges. Insertions in the middle of the array of live
>>> ranges cause quadratic algorithmic complexity, which is apparently
>>> the main reason for the slow-down. I changed the implementation of
>>> computeRegUnitInterval() so that it uses a set instead of the array,
>>> and in this way managed to reduce the execution time of the
>>> computeRegUnitInterval from 1200s down to about 1s. The fix is a bit
>>> ugly, however, because I cannot completely switch to the set, since
>>> further analyses are more efficient on the array. For that reason, I
>>> flush the contents of the set into the array at the end of
>>> computeRegUnitInterval. I also had to rewrite various operations on
>>> the live interval so that they use the set structure if it is
>>> available and the array otherwise.
>>> 
>>> If there is an interest, I can port the patch to the dev version of
>>> llvm, but maybe someone has a better idea of how to deal with the
>>> regression.
>> 
>> I think this would be interesting, please do. There are a few places in LLVM where we currently keep dual data structures, and needing another one is not necessarily bad (we have SetVector, for example).
> 
> Yes, it would be good to get a fix for this behavior in LLVM trunk even if it isn’t pretty. I’ve also seen extendToUses showing up on profiles. Your description of the fix sounds reasonable.

+1. Please send the patch for review.

Thanks,
-Quentin

> 
> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141009/463a2611/attachment.html>