Optimizations for MemCpyOpt / GlobalOpt when compiling large static initializers

Bruno Cardoso Lopes bruno.cardoso at gmail.com
Tue Jul 14 14:00:37 PDT 2015


Hi Anthony,

On Tue, Jul 14, 2015 at 4:44 PM, Anthony Pesch <inolen at gmail.com> wrote:
>
> While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking around 10 minutes to compile the lookup table with -O3. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll), and running it with -ftime-report pointed at MemCpyOptimizer and GlobalOpt.
>
> Running memcpyopt through opt took about 1 minute. The culprit was MemCpyOptimizer insertion sorting the ranges as it discovered them. I changed this so that ranges are always appended to the list, and once they've all been scanned they're sorted and merged (O(n log n) instead of O(n^2)).
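As a rough illustration of the append-then-sort idea described above, here is a minimal, self-contained C++ sketch. It is not the code from the patch: ByteRange and sortAndMerge are hypothetical names, and the assumption that overlapping or adjacent ranges should be merged is made only for the sake of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the byte range written by one store.
// This is not LLVM's own data structure.
struct ByteRange {
  int64_t Start; // inclusive
  int64_t End;   // exclusive
};

// Takes ranges in discovery order (simply appended by the caller),
// sorts them once, then merges overlapping or adjacent ranges in a
// single linear pass: O(n log n) overall instead of the O(n^2) cost
// of keeping the list sorted on every insertion.
std::vector<ByteRange> sortAndMerge(std::vector<ByteRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const ByteRange &A, const ByteRange &B) {
              return A.Start < B.Start;
            });

  std::vector<ByteRange> Merged;
  for (const ByteRange &R : Ranges) {
    assert(R.Start <= R.End && "malformed range");
    if (!Merged.empty() && R.Start <= Merged.back().End)
      // Overlaps or touches the previous range: extend it.
      Merged.back().End = std::max(Merged.back().End, R.End);
    else
      Merged.push_back(R);
  }
  return Merged;
}
```

The caller only ever appends to the vector while scanning; the ordering work is deferred to the one sortAndMerge call at the end.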
>
> Running globalopt took about 9 minutes. The slowdown came from how GlobalOpt merged stores from static constructors individually into the global initializer in EvaluateStaticConstructor. For each store it discovered and wanted to commit, it would copy the existing global initializer and then merge in the store. I changed this so that stores are now grouped by global, and sorted from most significant to least significant by their GEP indices (e.g. a store to GEP 0, 0 comes before GEP 0, 0, 1). With this representation, the existing initializer can be copied and all new stores merged into it in a single pass.
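The grouping and ordering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: PendingStore, StoresByGlobal and sortStores are hypothetical names, globals are keyed by name for simplicity, and the ordering is assumed to be a plain lexicographic comparison of the constant GEP indices, which places a prefix such as (0, 0) before (0, 0, 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a store waiting to be committed: the constant
// GEP indices into the global, plus the value to write.
struct PendingStore {
  std::vector<uint64_t> Indices;
  int Value;
};

// Stores grouped by the global they target (keyed by name here).
using StoresByGlobal = std::map<std::string, std::vector<PendingStore>>;

// Orders each global's stores lexicographically by GEP indices, so a
// shorter index list that is a prefix of a longer one sorts first.
// A later pass could then walk the initializer once per global.
void sortStores(StoresByGlobal &Stores) {
  for (auto &Entry : Stores) {
    for (const PendingStore &S : Entry.second)
      assert(!S.Indices.empty() && "a store needs at least one index");
    // stable_sort keeps program order among stores to the same location.
    std::stable_sort(Entry.second.begin(), Entry.second.end(),
                     [](const PendingStore &A, const PendingStore &B) {
                       return A.Indices < B.Indices;
                     });
  }
}
```

With the stores in this order, the copy of the initializer happens once per global rather than once per store; the merge pass itself is omitted from the sketch.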
>
> In the end, the lookup table that was taking around 10 minutes to compile now compiles in around 5 seconds on my machine. I've run 'make check' and the test-suite project, which all passed. That said, I'm not entirely confident in my logic, especially in the globalopt changes. Please review carefully.

This is very nice :-)

OK, so it looks like there are two different targets for improvement
here. Could you please split this into two separate patches?
Also, it might help the review if you use Phabricator
(http://reviews.llvm.org/) and include full context in the patches.

Did you have a chance to run the LNT test suite with your patch? I'm
curious about how it might affect compile-time performance in other
programs.

Thanks,

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc



