[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Shuxin Yang
shuxin.llvm at gmail.com
Sat Jul 27 17:47:35 PDT 2013
Hi, Sean:
I'm sorry, I lied. I didn't mean to. I did try to avoid making a *BIG*
change to the IPO pass ordering for now. However, when I made a minor
change to populateLTOPassManager() by separating module passes from
non-module passes, I saw quite a few performance differences, most of
them degradations. Attacking these degradations one by one in a piecemeal
manner is a waste of time. We might as well define the pass ordering for
the Pre-IPO, IPO and Post-IPO phases at this time, hopefully once and for
all.
In order to repair my image as a liar, I am posting some preliminary
results on this cozy Saturday afternoon, which I normally devote to
daydreaming :-)
So far I have only measured the MultiSource benchmarks on my iMac (late
2012 model), and the command to run the benchmarks is
"make TEST=simple report OPTFLAGS='-O3 -flto'".
In terms of execution time, some benchmarks degrade, but more improve,
and a few of the changes are quite substantial. User time is used for
comparison. I measured the results twice, and they are basically very
stable. As far as I can tell from the results, the proposed pass ordering
is basically a change in the right direction.
Interestingly enough, if I combine populatePreIPOPassMgr() as the pre-IPO
phase (see the patch) with the original populateLTOPassManager() for both
IPO and post-IPO, I see a significant improvement in
"Benchmarks/Trimaran/netbench-crc/netbench-crc"
(about 94%, 0.5665s (was) vs 0.0295s). As I write this mail, I have not
yet had a chance to figure out why this combination improves this
benchmark so much.
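
For reference, the exec_diff column in the attached results appears to be
the relative change in user time, (exec_is - exec_was) / exec_was * 100,
so negative values are improvements. Below is a minimal sketch assuming
that formula; the helper name percentChange is made up for illustration
and is not part of the test-suite scripts.

```cpp
#include <cassert>

// Relative change in percent. Negative means the new time is lower,
// i.e. the benchmark got faster.
// Assumed formula (not taken from the test-suite): (is - was) / was * 100.
double percentChange(double was, double is) {
  assert(was > 0.0 && "baseline time must be positive");
  return (is - was) / was * 100.0;
}
```

Calling percentChange(0.5665, 0.0295) gives the netbench-crc figure quoted
above, and percentChange(1.4634, 0.684) reproduces the Symbolics-flt row
of the table.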
In terms of compile time, the results report that my change improves
compile time by about 2x, which is nonsense. I guess the test script
doesn't count link time.
The new pass orderings for Pre-IPO, IPO, and Post-IPO are defined by
populatePreIPOPassMgr(), populateIPOPassManager() and populatePostIPOPM()
respectively.
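
To make the three-phase split concrete, here is a toy model of how the
phases compose. It is only a sketch: ToyPassManager and the populate*
functions below are hypothetical stand-ins that record pass names as
strings, not the LLVM API, and each phase lists only a few representative
passes. The inline thresholds (40 and 255) and the fact that loop
unswitch/vectorize/unroll appear only after IPO are taken from the
attached patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a pass manager: it only records pass names in order.
// This is NOT the LLVM API; every name in this block is hypothetical.
struct ToyPassManager {
  std::vector<std::string> passes;
  void add(const std::string &name) { passes.push_back(name); }
};

// Thresholds from the patch: 40 for the "tiny function" inliner,
// 255 for the inliner that runs in the IPO phase.
const int TinyInlineThreshold = 40;
const int IPOInlineThreshold = 255;

static std::string inlinePass(int threshold) {
  return "inline(threshold=" + std::to_string(threshold) + ")";
}

// Pre-IPO (per translation unit): tiny-function inlining and scalar cleanup.
void populatePreIPO(ToyPassManager &pm) {
  pm.add(inlinePass(TinyInlineThreshold));
  pm.add("sroa");
  pm.add("early-cse");
  pm.add("gvn");
}

// IPO (link time): module-level passes plus the main inliner.
void populateIPO(ToyPassManager &pm, bool runInliner) {
  pm.add("deadargelim");
  if (runInliner)
    pm.add(inlinePass(IPOInlineThreshold));
  pm.add("prune-eh");
  pm.add("argpromotion");
  pm.add("functionattrs");
  pm.add("globalsmodref");
}

// Post-IPO (link time): tiny-function inlining again, then the loop
// transforms that were held back before IPO.
void populatePostIPO(ToyPassManager &pm) {
  pm.add(inlinePass(TinyInlineThreshold));
  pm.add("loop-unswitch");
  pm.add("loop-vectorize");
  pm.add("loop-unroll");
  pm.add("globaldce");
}

// Link-time pipeline: IPO followed by post-IPO, mirroring the pair of
// calls that replaces populateLTOPassManager() in the patch.
ToyPassManager buildLinkTimePipeline(bool runInliner) {
  ToyPassManager pm;
  populateIPO(pm, runInliner);
  populatePostIPO(pm);
  assert(!pm.passes.empty());
  return pm;
}
```

The point of the model is only the ordering: per-TU compilation would run
populatePreIPO(), and the linker would run the IPO phase followed by the
post-IPO phase on the merged module.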
I will discuss this with Andy next Monday in order to be consistent with
the pass-ordering design he is envisioning, measure more benchmarks, and
then post the patch and results to the community for discussion and
approval.
Thanks
Shuxin
On 7/17/13 7:09 PM, Shuxin Yang wrote:
> Andy and I briefly discussed this the other day; we have not yet had a
> chance to list a detailed pass order for the pre- and post-IPO scalar
> optimizations.
>
> This is the wish list we have in mind:
>
> pre-IPO: based on the ordering he proposes, get rid of the inlining
> (or just inline tiny functions), and get rid of all loop xforms...
>
> post-IPO: get rid of inlining, or maybe we still need it, but only
> inline callees which have now become tiny. Enable the loop xforms.
>
> The SCC pass manager seems to be important for inlining. No matter
> what inlining looks like in the future, I think the pass manager
> is still useful for scalar opt. It enables us to achieve cheap
> inter-procedural opt hands down, in the sense that we can optimize
> the callee, analyze it, and feed detailed info back to the caller
> (say, info like "the callee returns constant 5" or "the callee's
> return value is in 5-10"). Such info is difficult to obtain at the
> IPO stage, as it cannot afford to take such a close look.
>
> I think it is too early to discuss the pre-IPO and post-IPO thing; let
> us focus on what Andy is proposing.
>
>
> On 7/17/13 6:04 PM, Sean Silva wrote:
>> There seems to be a lot of interest recently in LTO. How do you see
>> the situation of splitting the IR passes between per-TU processing
>> and multi-TU ("link time") processing?
>>
>> -- Sean Silva
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
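
The point in the quoted message about analyzing a callee and feeding the
result back to its caller can be illustrated with a small toy. This is a
sketch under loud assumptions: ToyFunction, ReturnSummary and summarize
are hypothetical names, functions are reduced to "return a constant" or
"return callee() + constant", and plain recursion with cycle detection
stands in for the bottom-up SCC traversal that the real pass manager
performs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy function body: "return addend" if callee is empty,
// otherwise "return callee() + addend".
struct ToyFunction {
  std::string callee;
  int addend;
};

// What a caller may learn about a callee once the callee has been analyzed.
struct ReturnSummary {
  bool isConstant;
  int value;
};

typedef std::map<std::string, ToyFunction> ToyModule;
typedef std::map<std::string, ReturnSummary> SummaryMap;

// Summarize `name` after its callee, i.e. bottom-up over the call graph.
// `active` holds functions currently being analyzed, to detect recursion.
ReturnSummary summarize(const ToyModule &module, const std::string &name,
                        SummaryMap &done, std::set<std::string> &active) {
  const ReturnSummary unknown = {false, 0};
  SummaryMap::const_iterator cached = done.find(name);
  if (cached != done.end())
    return cached->second;

  ToyModule::const_iterator it = module.find(name);
  // External function or recursive cycle: nothing can be concluded here.
  if (it == module.end() || !active.insert(name).second)
    return unknown;

  const ToyFunction &fn = it->second;
  ReturnSummary result = unknown;
  if (fn.callee.empty()) {
    result.isConstant = true;
    result.value = fn.addend;
  } else {
    // The callee is summarized first; its result is fed back to the caller.
    ReturnSummary callee = summarize(module, fn.callee, done, active);
    if (callee.isConstant) {
      result.isConstant = true;
      result.value = callee.value + fn.addend;
    }
  }
  active.erase(name);
  done[name] = result;
  return result;
}

// Convenience entry point that owns the recursion-detection set.
ReturnSummary summarizeFrom(const ToyModule &module, const std::string &root,
                            SummaryMap &done) {
  std::set<std::string> active;
  ReturnSummary result = summarize(module, root, done, active);
  assert(active.empty() && "every function entered must have been left");
  return result;
}
```

With a module where "five" returns 5 and "caller" returns five() + 2,
summarizing "caller" first records that "five" is constant and then
concludes that "caller" is constant as well, which is the kind of cheap
callee-to-caller feedback described above.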
-------------- next part --------------
name exec_was(s) exec_is(s) exec_diff(%)
------------------------------------------- ---------- ---------- ----------------
Benchmarks/TSVC/Symbolics-flt/Symbolics-flt 1.4634 0.684 -53.259532595326
Benchmarks/MiBench/security-sha/security-sh 0.0199 0.0128 -35.678391959799
Benchmarks/mediabench/adpcm/rawcaudio/rawca 0.0034 0.0025 -26.470588235294
Benchmarks/Prolangs-C/agrep/agrep 0.0032 0.0025 -21.875
Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg 0.0032 0.0025 -21.875
Benchmarks/Olden/perimeter/perimeter 0.1747 0.1422 -18.603319977103
Benchmarks/mediabench/adpcm/rawdaudio/rawda 0.0022 0.0018 -18.181818181818
Benchmarks/FreeBench/fourinarow/fourinarow 0.2457 0.2018 -17.867317867317
Benchmarks/Prolangs-C++/family/family 0.0006 0.0005 -16.666666666666
Applications/ALAC/encode/alacconvert-encode 0.0314 0.0264 -15.923566878980
Benchmarks/MiBench/security-rijndael/securi 0.0243 0.0207 -14.814814814814
Benchmarks/mediabench/gsm/toast/toast 0.0174 0.0149 -14.367816091954
Benchmarks/Prolangs-C++/shapes/shapes 0.0007 0.0006 -14.285714285714
Benchmarks/Prolangs-C/bison/mybison 0.0021 0.0018 -14.285714285714
Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl 2.1248 1.8634 -12.302334337349
Benchmarks/McCat/03-testtrie/testtrie 0.0092 0.0081 -11.956521739130
Applications/treecc/treecc 0.0009 0.0008 -11.111111111111
Benchmarks/Prolangs-C/cdecl/cdecl 0.0009 0.0008 -11.111111111111
Benchmarks/TSVC/NodeSplitting-flt/NodeSplit 2.3019 2.0529 -10.817151049133
Benchmarks/MiBench/network-patricia/network 0.0647 0.0581 -10.200927357032
Benchmarks/McCat/09-vor/vor 0.0816 0.0735 -9.9264705882353
Benchmarks/MallocBench/gs/gs 0.029 0.0262 -9.6551724137931
Benchmarks/MiBench/telecomm-CRC32/telecomm- 0.1227 0.1122 -8.5574572127139
Benchmarks/TSVC/ControlLoops-flt/ControlLoo 1.5978 1.4648 -8.3239454249593
Applications/hexxagon/hexxagon 4.9682 4.566 -8.0954872992230
Benchmarks/Prolangs-C++/simul/simul 0.0043 0.004 -6.9767441860465
Benchmarks/TSVC/Reductions-dbl/Reductions-d 2.3107 2.1611 -6.4742285887393
Benchmarks/TSVC/LinearDependence-dbl/Linear 2.5083 2.3536 -6.1675238209145
Benchmarks/TSVC/LinearDependence-flt/Linear 2.0396 1.9215 -5.7903510492253
Benchmarks/TSVC/ControlLoops-dbl/ControlLoo 2.1258 2.0077 -5.5555555555555
Benchmarks/MiBench/consumer-lame/consumer-l 0.1355 0.1285 -5.1660516605166
Benchmarks/Trimaran/enc-rc4/enc-rc4 0.6262 0.5967 -4.7109549664643
Applications/oggenc/oggenc 0.077 0.0735 -4.5454545454545
Benchmarks/BitBench/uuencode/uuencode 0.0119 0.0114 -4.2016806722689
Benchmarks/Prolangs-C/unix-smail/unix-smail 0.0024 0.0023 -4.1666666666666
Benchmarks/TSVC/InductionVariable-dbl/Induc 2.9528 2.8362 -3.9487943646708
Benchmarks/TSVC/NodeSplitting-dbl/NodeSplit 2.7203 2.6209 -3.6540087490350
Applications/d/make_dparser 0.0174 0.0168 -3.4482758620689
Applications/lambda-0.1.3/lambda 2.6777 2.5864 -3.4096426037270
Applications/viterbi/viterbi 1.8383 1.777 -3.3346026219877
Benchmarks/MiBench/telecomm-gsm/telecomm-gs 0.1172 0.1134 -3.2423208191126
Benchmarks/McCat/18-imp/imp 0.0415 0.0402 -3.1325301204819
Benchmarks/MiBench/automotive-bitcount/auto 0.0518 0.0502 -3.0888030888030
Benchmarks/FreeBench/analyzer/analyzer 0.0333 0.0323 -3.0030030030030
Benchmarks/Prolangs-C++/city/city 0.0036 0.0035 -2.7777777777777
Benchmarks/TSVC/Reductions-flt/Reductions-f 4.4121 4.2942 -2.6721969130346
Benchmarks/Olden/tsp/tsp 0.5126 0.5011 -2.2434646898166
Benchmarks/Trimaran/enc-pc1/enc-pc1 0.1574 0.154 -2.1601016518424
Benchmarks/TSVC/ControlFlow-flt/ControlFlow 2.351 2.3012 -2.1182475542322
Benchmarks/MiBench/network-dijkstra/network 0.0296 0.029 -2.0270270270270
Benchmarks/Ptrdist/bc/bc 0.4764 0.4674 -1.8891687657430
Benchmarks/Prolangs-C/gnugo/gnugo 0.028 0.0275 -1.7857142857142
Benchmarks/VersaBench/dbms/dbms 0.8088 0.7949 -1.7185954500494
Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 3.7015 3.6379 -1.7182223422936
Benchmarks/Olden/health/health 0.1787 0.1757 -1.6787912702854
Benchmarks/VersaBench/bmm/bmm 1.4694 1.4455 -1.6265142234925
Benchmarks/McCat/01-qbsort/qbsort 0.0876 0.0862 -1.5981735159817
Applications/ClamAV/clamscan 0.094 0.0925 -1.5957446808510
Benchmarks/McCat/17-bintr/bintr 0.0666 0.0658 -1.2012012012012
Benchmarks/MiBench/automotive-susan/automot 0.0312 0.0309 -0.9615384615384
Benchmarks/TSVC/LoopRerolling-dbl/LoopRerol 2.7783 2.7524 -0.9322247417485
Benchmarks/SciMark2-C/scimark2 22.2684 22.0824 -0.8352643207414
Benchmarks/mediabench/g721/g721encode/encod 0.0403 0.04 -0.7444168734491
Benchmarks/ASC_Sequoia/AMGmk/AMGmk 5.0381 5.0033 -0.6907365872054
Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDa 2.3246 2.3089 -0.6753850124752
Applications/sgefa/sgefa 0.0962 0.0956 -0.6237006237006
Applications/minisat/minisat 4.021 4.0023 -0.4650584431733
Benchmarks/llubenchmark/llu 2.8277 2.8147 -0.4597375959260
Benchmarks/TSVC/Expansion-flt/Expansion-flt 1.8036 1.7961 -0.4158349966733
Applications/aha/aha 1.1345 1.1299 -0.4054649625385
Benchmarks/TSVC/Expansion-dbl/Expansion-dbl 2.5986 2.5886 -0.3848225967828
Benchmarks/PAQ8p/paq8p 33.6364 33.5149 -0.3612158257126
Benchmarks/FreeBench/neural/neural 0.1771 0.1765 -0.3387916431394
Benchmarks/Ptrdist/ft/ft 0.6569 0.6549 -0.3044603440401
Benchmarks/Trimaran/enc-3des/enc-3des 1.3386 1.3354 -0.2390557298670
Benchmarks/VersaBench/ecbdes/ecbdes 1.5638 1.5623 -0.0959201943982
Benchmarks/TSVC/Recurrences-dbl/Recurrences 2.8128 2.8102 -0.0924345847554
Benchmarks/Trimaran/netbench-crc/netbench-c 0.5665 0.566 -0.0882612533098
Benchmarks/Prolangs-C++/life/life 1.826 1.8244 -0.0876232201533
Benchmarks/TSVC/ControlFlow-dbl/ControlFlow 2.6993 2.6973 -0.0740932834438
Benchmarks/TSVC/Packing-flt/Packing-flt 2.6722 2.6716 -0.0224534091759
Benchmarks/TSVC/Searching-flt/Searching-flt 3.3246 3.324 -0.0180472838837
Benchmarks/TSVC/Searching-dbl/Searching-dbl 3.3563 3.3558 -0.0148973572088
Benchmarks/TSVC/Equivalencing-flt/Equivalen 0.9735 0.9734 -0.0102722136620
Applications/Burg/burg 0.0008 0.0008 0.0
Applications/hbd/hbd 0.0018 0.0018 0.0
Benchmarks/BitBench/uudecode/uudecode 0.0243 0.0243 0.0
Benchmarks/McCat/04-bisect/bisect 0.0696 0.0696 0.0
Benchmarks/McCat/05-eks/eks 0.0021 0.0021 0.0
Benchmarks/McCat/15-trie/trie 0.0008 0.0008 0.0
Benchmarks/MiBench/consumer-jpeg/consumer-j 0.0028 0.0028 0.0
Benchmarks/MiBench/office-ispell/office-isp 0.0006 0.0006 0.0
Benchmarks/MiBench/security-blowfish/securi 0.0007 0.0007 0.0
Benchmarks/MiBench/telecomm-adpcm/telecomm- 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/NP/np 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/deriv1/deriv1 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/deriv2/deriv2 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/employ/employ 0.0038 0.0038 0.0
Benchmarks/Prolangs-C++/fsm/fsm 0.0005 0.0005 0.0
Benchmarks/Prolangs-C++/garage/garage 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/ocean/ocean 0.042 0.042 0.0
Benchmarks/Prolangs-C++/office/office 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/trees/trees 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/vcirc/vcirc 0.0005 0.0005 0.0
Benchmarks/Prolangs-C/allroots/allroots 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/compiler/compiler 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/fixoutput/fixoutput 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/football/football 0.0005 0.0005 0.0
Benchmarks/Prolangs-C/loader/loader 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/simulator/simulator 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/unix-tbl/unix-tbl 0.0006 0.0006 0.0
Benchmarks/TSVC/Recurrences-flt/Recurrences 2.7172 2.7173 0.00368025909023
Benchmarks/TSVC/StatementReordering-dbl/Sta 2.5547 2.555 0.01174306180765
Benchmarks/Trimaran/enc-md5/enc-md5 1.2119 1.2126 0.05776054129878
Benchmarks/MiBench/automotive-basicmath/aut 0.1698 0.1699 0.05889281507655
Benchmarks/ASC_Sequoia/IRSmk/IRSmk 2.6607 2.6626 0.07140977938136
Benchmarks/Fhourstones-3.1/fhourstones3.1 0.7427 0.7433 0.08078632018310
Benchmarks/TSVC/LoopRestructuring-dbl/LoopR 2.9857 2.9883 0.08708175637204
Benchmarks/Olden/em3d/em3d 2.0241 2.0262 0.10374981473247
Benchmarks/TSVC/LoopRerolling-flt/LoopRerol 2.0889 2.0914 0.11968021446694
Benchmarks/TSVC/Packing-dbl/Packing-dbl 2.8154 2.8196 0.14917951268025
Benchmarks/BitBench/five11/five11 4.038 4.0448 0.16840019811788
Benchmarks/Olden/treeadd/treeadd 0.1588 0.1591 0.18891687657430
Benchmarks/TSVC/IndirectAddressing-flt/Indi 2.1573 2.1615 0.19468780419969
Benchmarks/Ptrdist/anagram/anagram 0.6629 0.6644 0.22627847337455
Benchmarks/TSVC/StatementReordering-flt/Sta 1.8867 1.892 0.28091376477446
Benchmarks/TSVC/IndirectAddressing-dbl/Indi 2.6113 2.6189 0.29104277562899
Benchmarks/FreeBench/pifft/pifft 0.0636 0.0638 0.31446540880501
Benchmarks/Prolangs-C++/primes/primes 0.1916 0.1923 0.36534446764092
Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDa 1.2514 1.2567 0.42352565127056
Benchmarks/Olden/power/power 0.7097 0.7129 0.45089474425813
Benchmarks/ASCI_Purple/SMG2000/smg2000 1.4904 1.4972 0.45625335480408
Applications/lemon/lemon 0.6774 0.6805 0.45763212282255
Benchmarks/MiBench/telecomm-FFT/telecomm-ff 0.0209 0.021 0.47846889952154
Benchmarks/7zip/7zip-benchmark 5.9521 5.9811 0.48722299692545
Benchmarks/TSVC/CrossingThresholds-dbl/Cros 2.6449 2.6578 0.48773110514575
Applications/SPASS/SPASS 5.9442 5.9748 0.51478752397294
Benchmarks/MallocBench/cfrac/cfrac 1.2635 1.2704 0.54610209734862
Benchmarks/Ptrdist/ks/ks 0.7054 0.7117 0.89311029203288
Benchmarks/MallocBench/espresso/espresso 0.3836 0.3871 0.91240875912408
Applications/JM/lencod/lencod 3.7442 3.7859 1.11372255755568
Benchmarks/TSVC/Equivalencing-dbl/Equivalen 1.3717 1.3881 1.19559670481884
Benchmarks/Olden/bh/bh 0.6255 0.633 1.1990407673861
Benchmarks/VersaBench/8b10b/8b10b 2.8968 2.9416 1.5465341066004
Benchmarks/BitBench/drop3/drop3 0.174 0.1768 1.60919540229886
Benchmarks/McCat/12-IOtest/iotest 0.1223 0.1243 1.63532297628781
Applications/spiff/spiff 1.629 1.6558 1.6451810926949
Benchmarks/TSVC/CrossingThresholds-flt/Cros 2.0682 2.1028 1.67295232569383
Benchmarks/Olden/voronoi/voronoi 0.1569 0.1596 1.72084130019119
Applications/lua/lua 14.0101 14.2671 1.83439090370518
Benchmarks/nbench/nbench 5.4638 5.568 1.90709762436399
Applications/sqlite3/sqlite3 2.3871 2.4339 1.960537891165
Applications/ALAC/decode/alacconvert-decode 0.0152 0.0155 1.97368421052632
Benchmarks/Trimaran/netbench-url/netbench-u 2.7548 2.8112 2.04733555975025
Benchmarks/Olden/bisort/bisort 0.3265 0.3332 2.05206738131699
Benchmarks/Fhourstones/fhourstones 0.6284 0.6419 2.14831317632083
Applications/JM/ldecod/ldecod 0.0543 0.0556 2.39410681399631
Benchmarks/TSVC/LoopRestructuring-flt/LoopR 2.2302 2.2848 2.4482109227872
Benchmarks/FreeBench/mason/mason 0.1085 0.1113 2.58064516129032
Benchmarks/Bullet/bullet 3.0174 3.0968 2.63140452044807
Applications/SIBsim4/SIBsim4 1.8364 1.8853 2.66281855804835
Benchmarks/McCat/08-main/main 0.0138 0.0142 2.89855072463769
Applications/siod/siod 1.8991 1.9696 3.71228476646833
Benchmarks/FreeBench/distray/distray 0.0793 0.0829 4.53972257250947
Benchmarks/NPB-serial/is/is 4.6101 4.8299 4.76779245569511
Applications/kimwitu++/kc 0.0266 0.0279 4.88721804511279
Benchmarks/Olden/mst/mst 0.0551 0.0589 6.89655172413793
Benchmarks/Ptrdist/yacr2/yacr2 0.5277 0.5663 7.31476217547851
Benchmarks/VersaBench/beamformer/beamformer 0.6497 0.7015 7.97291057411112
Benchmarks/sim/sim 2.6061 2.8147 8.00429760945475
Benchmarks/FreeBench/pcompress2/pcompress2 0.101 0.1097 8.61386138613861
Benchmarks/mafft/pairlocalalign 16.7374 18.4048 9.9621207594967
Benchmarks/MiBench/office-stringsearch/offi 0.001 0.0011 10.0
Benchmarks/TSVC/InductionVariable-flt/Induc 2.0788 2.2966 10.4771983836829
Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2d 0.0076 0.0084 10.5263157894737
Benchmarks/MiBench/consumer-typeset/consume 0.0943 0.1053 11.6648992576882
Benchmarks/tramp3d-v4/tramp3d-v4 0.1849 0.208 12.493239588967
Benchmarks/Prolangs-C++/objects/objects 0.0005 0.0006 20.0
Benchmarks/Prolangs-C/TimberWolfMC/timberwo 0.0005 0.0006 20.0
Benchmarks/Prolangs-C/assembler/assembler 0.0005 0.0006 20.0
-------------- next part --------------
Index: include/llvm/Transforms/IPO/PassManagerBuilder.h
===================================================================
--- include/llvm/Transforms/IPO/PassManagerBuilder.h (revision 187135)
+++ include/llvm/Transforms/IPO/PassManagerBuilder.h (working copy)
@@ -132,8 +132,14 @@
/// populateModulePassManager - This sets up the primary pass manager.
void populateModulePassManager(PassManagerBase &MPM);
- void populateLTOPassManager(PassManagerBase &PM, bool Internalize,
- bool RunInliner, bool DisableGVNLoadPRE = false);
+
+ /// setup passes for Pre-IPO phase
+ void populatePreIPOPassMgr(PassManagerBase &MPM);
+
+ void populateIPOPassManager(PassManagerBase &PM, bool Internalize,
+ bool RunInliner);
+
+ void populatePostIPOPM(PassManagerBase &PM);
};
/// Registers a function for adding a standard set of passes. This should be
Index: include/llvm/Transforms/IPO.h
===================================================================
--- include/llvm/Transforms/IPO.h (revision 187135)
+++ include/llvm/Transforms/IPO.h (working copy)
@@ -89,6 +89,7 @@
/// threshold given here.
Pass *createFunctionInliningPass();
Pass *createFunctionInliningPass(int Threshold);
+Pass *createTinyFuncInliningPass();
//===----------------------------------------------------------------------===//
/// createAlwaysInlinerPass - Return a new pass object that inlines only
Index: tools/lto/LTOCodeGenerator.cpp
===================================================================
--- tools/lto/LTOCodeGenerator.cpp (revision 187135)
+++ tools/lto/LTOCodeGenerator.cpp (working copy)
@@ -412,11 +412,12 @@
// Enabling internalize here would use its AllButMain variant. It
// keeps only main if it exists and does nothing for libraries. Instead
// we create the pass ourselves with the symbol list provided by the linker.
- if (!DisableOpt)
- PassManagerBuilder().populateLTOPassManager(passes,
- /*Internalize=*/false,
- !DisableInline,
- DisableGVNLoadPRE);
+ if (!DisableOpt) {
+ PassManagerBuilder().populateIPOPassManager(passes,
+ /*Internalize=*/false,
+ !DisableInline);
+ PassManagerBuilder().populatePostIPOPM(passes);
+ }
// Make sure everything is still good.
passes.add(createVerifierPass());
Index: tools/opt/opt.cpp
===================================================================
--- tools/opt/opt.cpp (revision 187135)
+++ tools/opt/opt.cpp (working copy)
@@ -104,6 +104,11 @@
cl::desc("Include the standard compile time optimizations"));
static cl::opt<bool>
+StandardPreIPOOpts("std-preipo-opts",
+ cl::desc("Include the standard pre-IPO optimizations"));
+
+
+static cl::opt<bool>
StandardLinkOpts("std-link-opts",
cl::desc("Include the standard link time optimizations"));
@@ -470,6 +475,23 @@
Builder.populateModulePassManager(PM);
}
+static void AddPreIPOCompilePasses(PassManagerBase &PM) {
+ PM.add(createVerifierPass()); // Verify that input is correct
+
+ // If the -strip-debug command line option was specified, do it.
+ if (StripDebug)
+ addPass(PM, createStripSymbolsPass(true));
+
+ if (DisableOptimizations) return;
+
+ // -std-preipo-opts adds the same module passes as -O3.
+ PassManagerBuilder Builder;
+ if (!DisableInline)
+ Builder.Inliner = createTinyFuncInliningPass();
+ Builder.OptLevel = 3;
+ Builder.populatePreIPOPassMgr(PM);
+}
+
static void AddStandardLinkPasses(PassManagerBase &PM) {
PM.add(createVerifierPass()); // Verify that input is correct
@@ -480,8 +502,9 @@
if (DisableOptimizations) return;
PassManagerBuilder Builder;
- Builder.populateLTOPassManager(PM, /*Internalize=*/ !DisableInternalize,
+ Builder.populateIPOPassManager(PM, /*Internalize=*/ !DisableInternalize,
/*RunInliner=*/ !DisableInline);
+ Builder.populatePostIPOPM(PM);
}
//===----------------------------------------------------------------------===//
@@ -778,6 +801,12 @@
StandardCompileOpts = false;
}
+ // If -std-preipo-opts was specified at the end of the pass list, add them.
+ if (StandardPreIPOOpts) {
+ AddPreIPOCompilePasses(Passes);
+ StandardPreIPOOpts = false;
+ }
+
if (StandardLinkOpts) {
AddStandardLinkPasses(Passes);
StandardLinkOpts = false;
Index: tools/bugpoint/bugpoint.cpp
===================================================================
--- tools/bugpoint/bugpoint.cpp (revision 187135)
+++ tools/bugpoint/bugpoint.cpp (working copy)
@@ -169,8 +169,9 @@
if (StandardLinkOpts) {
PassManagerBuilder Builder;
- Builder.populateLTOPassManager(PM, /*Internalize=*/true,
+ Builder.populateIPOPassManager(PM, /*Internalize=*/true,
/*RunInliner=*/true);
+ Builder.populatePostIPOPM(PM);
}
if (OptLevelO1 || OptLevelO2 || OptLevelO3) {
Index: lib/Transforms/IPO/PassManagerBuilder.cpp
===================================================================
--- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 187135)
+++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
@@ -294,10 +294,78 @@
addExtensionsToPM(EP_OptimizerLast, MPM);
}
-void PassManagerBuilder::populateLTOPassManager(PassManagerBase &PM,
+void PassManagerBuilder::populatePreIPOPassMgr(PassManagerBase &MPM) {
+ // If all optimizations are disabled, just run the always-inline pass.
+ if (OptLevel == 0) {
+ if (Inliner) {
+ MPM.add(Inliner);
+ Inliner = 0;
+ }
+ return;
+ }
+
+ bool EnableLightWeightIPO = (OptLevel > 1);
+
+ // Add LibraryInfo if we have some.
+ if (LibraryInfo) MPM.add(new TargetLibraryInfo(*LibraryInfo));
+ addInitialAliasAnalysisPasses(MPM);
+
+ // Start of CallGraph SCC passes.
+ {
+ if (EnableLightWeightIPO) {
+ MPM.add(createPruneEHPass()); // Remove dead EH info
+ if (Inliner) {
+ MPM.add(Inliner);
+ Inliner = 0;
+ }
+ MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args
+ }
+
+ // Start of function pass.
+ {
+ if (UseNewSROA)
+ MPM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ MPM.add(createScalarReplAggregatesPass(-1, false));
+
+ MPM.add(createEarlyCSEPass()); // Catch trivial redundancies
+ MPM.add(createJumpThreadingPass()); // Thread jumps.
+ MPM.add(createCorrelatedValuePropagationPass());// Propagate conditionals
+ MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
+ MPM.add(createInstructionCombiningPass()); // Combine silly seq's
+ MPM.add(createReassociatePass()); // Reassociate expressions
+ MPM.add(createLoopRotatePass()); // Rotate Loop
+ MPM.add(createLICMPass()); // Hoist loop invariants
+ MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
+ MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
+ MPM.add(createLoopDeletionPass()); // Delete dead loops
+
+ MPM.add(createGVNPass()); // Remove redundancies
+ MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset
+ MPM.add(createSCCPPass()); // Constant prop with SCCP
+
+ MPM.add(createDeadStoreEliminationPass()); // Delete dead stores
+ MPM.add(createAggressiveDCEPass()); // Delete dead instructions
+ MPM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs
+
+ MPM.add(createTailCallEliminationPass()); // Eliminate tail calls
+ }
+
+ // End of CallGraph SCC passes.
+ }
+
+ if (EnableLightWeightIPO) {
+ MPM.add(createGlobalOptimizerPass()); // Optimize out global vars
+ MPM.add(createIPSCCPPass()); // IP SCCP
+ MPM.add(createDeadArgEliminationPass()); // Dead argument elimination
+ MPM.add(createGlobalDCEPass()); // Remove dead fns and globals.
+ MPM.add(createConstantMergePass()); // Merge dup global constants
+ }
+}
+
+void PassManagerBuilder::populateIPOPassManager(PassManagerBase &PM,
bool Internalize,
- bool RunInliner,
- bool DisableGVNLoadPRE) {
+ bool RunInliner) {
// Provide AliasAnalysis services for optimizations.
addInitialAliasAnalysisPasses(PM);
@@ -325,15 +393,9 @@
// Remove unused arguments from functions.
PM.add(createDeadArgEliminationPass());
- // Reduce the code after globalopt and ipsccp. Both can open up significant
- // simplification opportunities, and both can propagate functions through
- // function pointers. When this happens, we often have to resolve varargs
- // calls, etc, so let instcombine do this.
- PM.add(createInstructionCombiningPass());
-
// Inline small functions
if (RunInliner)
- PM.add(createFunctionInliningPass());
+ PM.add(createFunctionInliningPass(255));
PM.add(createPruneEHPass()); // Remove dead EH info.
@@ -346,35 +408,98 @@
// transform it to pass arguments by value instead of by reference.
PM.add(createArgumentPromotionPass());
- // The IPO passes may leave cruft around. Clean up after them.
- PM.add(createInstructionCombiningPass());
- PM.add(createJumpThreadingPass());
- // Break up allocas
- if (UseNewSROA)
- PM.add(createSROAPass());
- else
- PM.add(createScalarReplAggregatesPass());
-
// Run a few AA driven optimizations here and now, to cleanup the code.
PM.add(createFunctionAttrsPass()); // Add nocapture.
PM.add(createGlobalsModRefPass()); // IP alias analysis.
+}
- PM.add(createLICMPass()); // Hoist loop invariants.
- PM.add(createGVNPass(DisableGVNLoadPRE)); // Remove redundancies.
- PM.add(createMemCpyOptPass()); // Remove dead memcpys.
- // Nuke dead stores.
- PM.add(createDeadStoreEliminationPass());
+void PassManagerBuilder::populatePostIPOPM(PassManagerBase &PM) {
+ // In the PostIPO phase, the choice for inlining is simple: either no inlining
+ // at all, or run an inliner which only inlines tiny functions. This function
+ // is free to pick whichever choice is more appropriate.
+ //
+ assert(Inliner == 0 && "Don't specify inliner");
+ if (OptLevel == 0)
+ return;
- // Cleanup and simplify the code after the scalar optimizations.
- PM.add(createInstructionCombiningPass());
+ bool EnableLightWeightIPO = (OptLevel > 1);
- PM.add(createJumpThreadingPass());
+ // Add LibraryInfo if we have some.
+ if (LibraryInfo) PM.add(new TargetLibraryInfo(*LibraryInfo));
- // Delete basic blocks, which optimization passes may have killed.
- PM.add(createCFGSimplificationPass());
+ addInitialAliasAnalysisPasses(PM);
- // Now that we have optimized the program, discard unreachable functions.
- PM.add(createGlobalDCEPass());
+ // Start of CallGraph SCC passes.
+ {
+ if (EnableLightWeightIPO) {
+ PM.add(createTinyFuncInliningPass());
+ PM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs
+ }
+
+ // Start of function pass.
+ {
+ PM.add(createMemCpyOptPass()); // Remove memcpy / form memset
+ if (UseNewSROA)
+ PM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ PM.add(createScalarReplAggregatesPass(-1, false));
+ PM.add(createEarlyCSEPass()); // Catch trivial redundancies
+ PM.add(createSCCPPass()); // Constant prop with SCCP
+ PM.add(createJumpThreadingPass()); // Thread jumps
+ PM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
+ PM.add(createCFGSimplificationPass()); // Merge & remove BBs
+ PM.add(createReassociatePass()); // Reassociate expressions
+ PM.add(createLoopRotatePass()); // Rotate Loop
+ PM.add(createLICMPass()); // Hoist loop invariants
+ PM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
+ PM.add(createIndVarSimplifyPass()); // Canonicalize indvars
+ PM.add(createLoopIdiomPass()); // Recognize idioms like memset.
+ PM.add(createLoopDeletionPass()); // Delete dead loops
+
+ if (/*LoopVectorize &&*/ OptLevel > 1 && SizeLevel < 2)
+ PM.add(createLoopVectorizePass());
+
+ if (!DisableUnrollLoops)
+ PM.add(createLoopUnrollPass()); // Unroll small loops
+
+ addExtensionsToPM(EP_LoopOptimizerEnd, PM);
+
+ if (OptLevel > 1)
+ PM.add(createGVNPass()); // Remove redundancies
+
+ PM.add(createInstructionCombiningPass());
+ PM.add(createDeadStoreEliminationPass()); // Delete dead stores
+ PM.add(createAggressiveDCEPass()); // Delete dead instructions
+ if (UseNewSROA)
+ PM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ PM.add(createScalarReplAggregatesPass(-1, false));
+
+ addExtensionsToPM(EP_ScalarOptimizerLate, PM);
+
+
+ // Add the various vectorization passes and relevant cleanup passes for
+ // them since we are no longer in the middle of the main scalar pipeline.
+ if (/*LoopVectorize && */OptLevel > 1 && SizeLevel < 2)
+ PM.add(createLoopVectorizePass());
+
+ #if 1
+ if (!DisableUnrollLoops)
+ PM.add(createLoopUnrollPass()); // Unroll small loops
+ #endif
+
+ PM.add(createInstructionCombiningPass());
+
+ if (SLPVectorize)
+ PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
+ }
+ }
+
+ if (EnableLightWeightIPO) {
+ PM.add(createGlobalDCEPass()); // Remove dead fns and globals.
+ PM.add(createConstantMergePass()); // Merge dup global constants
+ }
+ addExtensionsToPM(EP_OptimizerLast, PM);
}
inline PassManagerBuilder *unwrap(LLVMPassManagerBuilderRef P) {
@@ -458,5 +583,6 @@
LLVMBool RunInliner) {
PassManagerBuilder *Builder = unwrap(PMB);
PassManagerBase *LPM = unwrap(PM);
- Builder->populateLTOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+ Builder->populateIPOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+ Builder->populatePostIPOPM(*LPM);
}
Index: lib/Transforms/IPO/InlineSimple.cpp
===================================================================
--- lib/Transforms/IPO/InlineSimple.cpp (revision 187135)
+++ lib/Transforms/IPO/InlineSimple.cpp (working copy)
@@ -72,6 +72,10 @@
return new SimpleInliner(Threshold);
}
+Pass *llvm::createTinyFuncInliningPass() {
+ return new SimpleInliner(40);
+}
+
bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {
ICA = &getAnalysis<InlineCostAnalysis>();
return Inliner::runOnSCC(SCC);
-------------- next part --------------
Index: include/clang/Frontend/CodeGenOptions.def
===================================================================
--- include/clang/Frontend/CodeGenOptions.def (revision 187135)
+++ include/clang/Frontend/CodeGenOptions.def (working copy)
@@ -112,6 +112,7 @@
CODEGENOPT(VectorizeBB , 1, 0) ///< Run basic block vectorizer.
CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer.
CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer.
+CODEGENOPT(IsPreIPO , 1, 0) ///< Indicate in pre-IPO phase
/// Attempt to use register sized accesses to bit-fields in structures, when
/// possible.
Index: include/clang/Driver/CC1Options.td
===================================================================
--- include/clang/Driver/CC1Options.td (revision 187135)
+++ include/clang/Driver/CC1Options.td (working copy)
@@ -210,6 +210,8 @@
HelpText<"Run the SLP vectorization passes">;
def vectorize_slp_aggressive : Flag<["-"], "vectorize-slp-aggressive">,
HelpText<"Run the BB vectorization passes">;
+def preipo : Flag<["-"], "preipo">,
+ HelpText<"Run the pre-IPO passes">;
//===----------------------------------------------------------------------===//
// Dependency Output Options
Index: lib/Frontend/CompilerInvocation.cpp
===================================================================
--- lib/Frontend/CompilerInvocation.cpp (revision 187135)
+++ lib/Frontend/CompilerInvocation.cpp (working copy)
@@ -402,6 +402,7 @@
Opts.VectorizeBB = Args.hasArg(OPT_vectorize_slp_aggressive);
Opts.VectorizeLoop = Args.hasArg(OPT_vectorize_loops);
Opts.VectorizeSLP = Args.hasArg(OPT_vectorize_slp);
+ Opts.IsPreIPO = Args.hasArg(OPT_preipo);
Opts.MainFileName = Args.getLastArgValue(OPT_main_file_name);
Opts.VerifyModule = !Args.hasArg(OPT_disable_llvm_verifier);
Index: lib/Driver/Tools.cpp
===================================================================
--- lib/Driver/Tools.cpp (revision 187135)
+++ lib/Driver/Tools.cpp (working copy)
@@ -2014,7 +2014,8 @@
CmdArgs.push_back("-emit-pth");
} else {
assert(isa<CompileJobAction>(JA) && "Invalid action for clang tool.");
-
+ if (D.IsUsingLTO(Args))
+ CmdArgs.push_back("-preipo");
if (JA.getType() == types::TY_Nothing) {
CmdArgs.push_back("-fsyntax-only");
} else if (JA.getType() == types::TY_LLVM_IR ||
Index: lib/CodeGen/BackendUtil.cpp
===================================================================
--- lib/CodeGen/BackendUtil.cpp (revision 187135)
+++ lib/CodeGen/BackendUtil.cpp (working copy)
@@ -274,6 +274,10 @@
switch (Inlining) {
case CodeGenOptions::NoInlining: break;
case CodeGenOptions::NormalInlining: {
+ if (CodeGenOpts.IsPreIPO) {
+ PMBuilder.Inliner = createTinyFuncInliningPass();
+ break;
+ }
// FIXME: Derive these constants in a principled fashion.
unsigned Threshold = 225;
if (CodeGenOpts.OptimizeSize == 1) // -Os
@@ -321,7 +325,10 @@
MPM->add(createStripSymbolsPass(true));
}
- PMBuilder.populateModulePassManager(*MPM);
+ if (!CodeGenOpts.IsPreIPO)
+ PMBuilder.populateModulePassManager(*MPM);
+ else
+ PMBuilder.populatePreIPOPassMgr(*MPM);
}
TargetMachine *EmitAssemblyHelper::CreateTargetMachine(bool MustCreateTM) {