[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

Shuxin Yang shuxin.llvm at gmail.com
Sat Jul 27 17:47:35 PDT 2013


Hi, Sean:

   I'm sorry, I lied, though I didn't mean to. I did try to avoid making a
*BIG* change to the IPO pass-ordering for now. However, when I made a minor
change to populateLTOPassManager() by separating module passes from
non-module passes, I saw quite a few performance differences, most of them
degradations. Attacking these degradations one by one in a piecemeal manner
is a waste of time. We might as well define the pass-ordering for the
Pre-IPO, IPO and Post-IPO phases now, and hopefully once and for all.

  In order to repair my image as a liar, I am posting some preliminary
results on this cozy Saturday afternoon, which I normally devote to
daydreaming :-)

  So far I have only measured the MultiSource benchmarks on my iMac (late
2012 model). The command to run the benchmarks is
  "make TEST=simple report OPTFLAGS='-O3 -flto'".

  In terms of execution time, some benchmarks degrade but more improve, and
a few of the improvements are quite substantial. User time is used for the
comparison. I measured twice, and the results are basically stable. As far
as I can tell from the results, the proposed pass-ordering is a change in
the right direction.
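
  For reference, the exec_diff column in the attached table appears to be
the relative change in user time, in percent, so negative numbers are
improvements. A minimal sketch of that computation, assuming this formula
(the helper name execDiffPercent is mine, not part of the test-suite
scripts):

```cpp
#include <cassert>

// Relative change of the new time against the old one, in percent.
// Negative means the new pass ordering runs faster.
// Hypothetical helper for illustration; not taken from the test-suite.
double execDiffPercent(double execWas, double execIs) {
  assert(execWas > 0.0 && "baseline time must be positive");
  return (execIs - execWas) / execWas * 100.0;
}
```

  With the first row of the table, execDiffPercent(1.4634, 0.684) gives
about -53.26. The times appear to be reported in 0.0001s steps, so for the
tiniest benchmarks a single step (0.0005 vs 0.0006) already shows up as 20%.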

  Interestingly enough, if I combine populatePreIPOPassMgr() as the Pre-IPO
phase (see the patch) with the original populateLTOPassManager() for both
IPO and Post-IPO, I see a significant improvement in
"Benchmarks/Trimaran/netbench-crc/netbench-crc" (about 94%, 0.5665s (was)
vs. 0.0295s). As I write this mail, I have not yet had a chance to figure
out why this combination improves this benchmark so much.

  In terms of compile time, the report says my change improves compile time
by about 2x, which is nonsense. I guess the test script doesn't count
link time.

   The new pass orderings for Pre-IPO, IPO and Post-IPO are defined by
populatePreIPOPassMgr(), populateIPOPassManager() and populatePostIPOPM()
respectively.
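
  To make the split concrete, here is a toy sketch of how the three phases
are meant to be driven. In the patch, "opt -std-preipo-opts" calls
populatePreIPOPassMgr(), while the LTO code generator calls
populateIPOPassManager() followed by populatePostIPOPM(). ToyPassManager
and the toy* functions below are hypothetical stand-ins that only record
coarse labels; they are not LLVM APIs, and the real pass lists are in the
patch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::PassManagerBase; it only records names.
struct ToyPassManager {
  std::vector<std::string> Passes;
  void add(const std::string &Name) { Passes.push_back(Name); }
};

// Compile time, per translation unit (role of populatePreIPOPassMgr()).
// The patch leaves unswitch/vectorize/unroll out of this phase.
void toyPreIPO(ToyPassManager &PM) {
  PM.add("preipo:tiny-inline");
  PM.add("preipo:scalar-opts");
  PM.add("preipo:light-weight-ipo");
}

// Link time, whole program (role of populateIPOPassManager()).
void toyIPO(ToyPassManager &PM, bool RunInliner) {
  PM.add("ipo:dead-arg-elim");
  if (RunInliner)
    PM.add("ipo:inline");
  PM.add("ipo:arg-promotion");
}

// Link time, after IPO (role of populatePostIPOPM()).
void toyPostIPO(ToyPassManager &PM) {
  PM.add("postipo:tiny-inline");
  PM.add("postipo:scalar-opts");
  PM.add("postipo:unswitch-vectorize-unroll");
}

// Mirrors the LTO driver in the patch: IPO first, then Post-IPO.
ToyPassManager buildLinkTimePipeline(bool RunInliner) {
  ToyPassManager PM;
  toyIPO(PM, RunInliner);
  toyPostIPO(PM);
  return PM;
}
```

  The point of the sketch is only the ordering: everything that needs the
whole program sits in the IPO phase, and the expensive loop transformations
are deferred until after it.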

   I will discuss this with Andy next Monday in order to be consistent with
the pass-ordering design he is envisioning, and measure more benchmarks,
then post the patch and results to the community for discussion and approval.

Thanks
Shuxin


On 7/17/13 7:09 PM, Shuxin Yang wrote:
> Andy and I briefly discussed this the other day; we have not yet had a
> chance to list a detailed pass order
> for the pre- and post-IPO scalar optimizations.
>
> This is the wish-list we have in mind:
>
> pre-IPO:  based on the ordering he proposes, get rid of the inlining
>           (or just inline tiny funcs), get rid of all loop xforms...
>
> post-IPO: get rid of inlining, or maybe we still need it, but only
>           inline callees which have now become tiny.
>           Enable the loop xforms.
>
>           The SCC pass manager seems to be important for inlining. No
>           matter what the inlining looks like in the future, I think
>           the pass manager is still useful for scalar opt. It enables us
>           to achieve cheap inter-procedural opt hands down, in the sense
>           that we can optimize the callee, analyze it, and feed detailed
>           info back to the caller (say, info like "the callee returns
>           the constant 5" or "the callee's return value is in 5-10").
>           Such info is difficult to obtain at the IPO stage, as it
>           cannot afford to take such a close look.
>
> I think it is too early to discuss the pre-IPO and post-IPO thing; let
> us focus on what Andy is proposing.
>
>
> On 7/17/13 6:04 PM, Sean Silva wrote:
>> There seems to be a lot of interest recently in LTO. How do you see 
>> the situation of splitting the IR passes between per-TU processing 
>> and multi-TU ("link time") processing?
>>
>> -- Sean Silva

-------------- next part --------------
name                                         exec_was    exec_is     exec_diff       
-------------------------------------------  ----------  ----------  ----------------
Benchmarks/TSVC/Symbolics-flt/Symbolics-flt  1.4634      0.684       -53.259532595326
Benchmarks/MiBench/security-sha/security-sh  0.0199      0.0128      -35.678391959799
Benchmarks/mediabench/adpcm/rawcaudio/rawca  0.0034      0.0025      -26.470588235294
Benchmarks/Prolangs-C/agrep/agrep            0.0032      0.0025      -21.875         
Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg     0.0032      0.0025      -21.875         
Benchmarks/Olden/perimeter/perimeter         0.1747      0.1422      -18.603319977103
Benchmarks/mediabench/adpcm/rawdaudio/rawda  0.0022      0.0018      -18.181818181818
Benchmarks/FreeBench/fourinarow/fourinarow   0.2457      0.2018      -17.867317867317
Benchmarks/Prolangs-C++/family/family        0.0006      0.0005      -16.666666666666
Applications/ALAC/encode/alacconvert-encode  0.0314      0.0264      -15.923566878980
Benchmarks/MiBench/security-rijndael/securi  0.0243      0.0207      -14.814814814814
Benchmarks/mediabench/gsm/toast/toast        0.0174      0.0149      -14.367816091954
Benchmarks/Prolangs-C++/shapes/shapes        0.0007      0.0006      -14.285714285714
Benchmarks/Prolangs-C/bison/mybison          0.0021      0.0018      -14.285714285714
Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl  2.1248      1.8634      -12.302334337349
Benchmarks/McCat/03-testtrie/testtrie        0.0092      0.0081      -11.956521739130
Applications/treecc/treecc                   0.0009      0.0008      -11.111111111111
Benchmarks/Prolangs-C/cdecl/cdecl            0.0009      0.0008      -11.111111111111
Benchmarks/TSVC/NodeSplitting-flt/NodeSplit  2.3019      2.0529      -10.817151049133
Benchmarks/MiBench/network-patricia/network  0.0647      0.0581      -10.200927357032
Benchmarks/McCat/09-vor/vor                  0.0816      0.0735      -9.9264705882353
Benchmarks/MallocBench/gs/gs                 0.029       0.0262      -9.6551724137931
Benchmarks/MiBench/telecomm-CRC32/telecomm-  0.1227      0.1122      -8.5574572127139
Benchmarks/TSVC/ControlLoops-flt/ControlLoo  1.5978      1.4648      -8.3239454249593
Applications/hexxagon/hexxagon               4.9682      4.566       -8.0954872992230
Benchmarks/Prolangs-C++/simul/simul          0.0043      0.004       -6.9767441860465
Benchmarks/TSVC/Reductions-dbl/Reductions-d  2.3107      2.1611      -6.4742285887393
Benchmarks/TSVC/LinearDependence-dbl/Linear  2.5083      2.3536      -6.1675238209145
Benchmarks/TSVC/LinearDependence-flt/Linear  2.0396      1.9215      -5.7903510492253
Benchmarks/TSVC/ControlLoops-dbl/ControlLoo  2.1258      2.0077      -5.5555555555555
Benchmarks/MiBench/consumer-lame/consumer-l  0.1355      0.1285      -5.1660516605166
Benchmarks/Trimaran/enc-rc4/enc-rc4          0.6262      0.5967      -4.7109549664643
Applications/oggenc/oggenc                   0.077       0.0735      -4.5454545454545
Benchmarks/BitBench/uuencode/uuencode        0.0119      0.0114      -4.2016806722689
Benchmarks/Prolangs-C/unix-smail/unix-smail  0.0024      0.0023      -4.1666666666666
Benchmarks/TSVC/InductionVariable-dbl/Induc  2.9528      2.8362      -3.9487943646708
Benchmarks/TSVC/NodeSplitting-dbl/NodeSplit  2.7203      2.6209      -3.6540087490350
Applications/d/make_dparser                  0.0174      0.0168      -3.4482758620689
Applications/lambda-0.1.3/lambda             2.6777      2.5864      -3.4096426037270
Applications/viterbi/viterbi                 1.8383      1.777       -3.3346026219877
Benchmarks/MiBench/telecomm-gsm/telecomm-gs  0.1172      0.1134      -3.2423208191126
Benchmarks/McCat/18-imp/imp                  0.0415      0.0402      -3.1325301204819
Benchmarks/MiBench/automotive-bitcount/auto  0.0518      0.0502      -3.0888030888030
Benchmarks/FreeBench/analyzer/analyzer       0.0333      0.0323      -3.0030030030030
Benchmarks/Prolangs-C++/city/city            0.0036      0.0035      -2.7777777777777
Benchmarks/TSVC/Reductions-flt/Reductions-f  4.4121      4.2942      -2.6721969130346
Benchmarks/Olden/tsp/tsp                     0.5126      0.5011      -2.2434646898166
Benchmarks/Trimaran/enc-pc1/enc-pc1          0.1574      0.154       -2.1601016518424
Benchmarks/TSVC/ControlFlow-flt/ControlFlow  2.351       2.3012      -2.1182475542322
Benchmarks/MiBench/network-dijkstra/network  0.0296      0.029       -2.0270270270270
Benchmarks/Ptrdist/bc/bc                     0.4764      0.4674      -1.8891687657430
Benchmarks/Prolangs-C/gnugo/gnugo            0.028       0.0275      -1.7857142857142
Benchmarks/VersaBench/dbms/dbms              0.8088      0.7949      -1.7185954500494
Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk   3.7015      3.6379      -1.7182223422936
Benchmarks/Olden/health/health               0.1787      0.1757      -1.6787912702854
Benchmarks/VersaBench/bmm/bmm                1.4694      1.4455      -1.6265142234925
Benchmarks/McCat/01-qbsort/qbsort            0.0876      0.0862      -1.5981735159817
Applications/ClamAV/clamscan                 0.094       0.0925      -1.5957446808510
Benchmarks/McCat/17-bintr/bintr              0.0666      0.0658      -1.2012012012012
Benchmarks/MiBench/automotive-susan/automot  0.0312      0.0309      -0.9615384615384
Benchmarks/TSVC/LoopRerolling-dbl/LoopRerol  2.7783      2.7524      -0.9322247417485
Benchmarks/SciMark2-C/scimark2               22.2684     22.0824     -0.8352643207414
Benchmarks/mediabench/g721/g721encode/encod  0.0403      0.04        -0.7444168734491
Benchmarks/ASC_Sequoia/AMGmk/AMGmk           5.0381      5.0033      -0.6907365872054
Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDa  2.3246      2.3089      -0.6753850124752
Applications/sgefa/sgefa                     0.0962      0.0956      -0.6237006237006
Applications/minisat/minisat                 4.021       4.0023      -0.4650584431733
Benchmarks/llubenchmark/llu                  2.8277      2.8147      -0.4597375959260
Benchmarks/TSVC/Expansion-flt/Expansion-flt  1.8036      1.7961      -0.4158349966733
Applications/aha/aha                         1.1345      1.1299      -0.4054649625385
Benchmarks/TSVC/Expansion-dbl/Expansion-dbl  2.5986      2.5886      -0.3848225967828
Benchmarks/PAQ8p/paq8p                       33.6364     33.5149     -0.3612158257126
Benchmarks/FreeBench/neural/neural           0.1771      0.1765      -0.3387916431394
Benchmarks/Ptrdist/ft/ft                     0.6569      0.6549      -0.3044603440401
Benchmarks/Trimaran/enc-3des/enc-3des        1.3386      1.3354      -0.2390557298670
Benchmarks/VersaBench/ecbdes/ecbdes          1.5638      1.5623      -0.0959201943982
Benchmarks/TSVC/Recurrences-dbl/Recurrences  2.8128      2.8102      -0.0924345847554
Benchmarks/Trimaran/netbench-crc/netbench-c  0.5665      0.566       -0.0882612533098
Benchmarks/Prolangs-C++/life/life            1.826       1.8244      -0.0876232201533
Benchmarks/TSVC/ControlFlow-dbl/ControlFlow  2.6993      2.6973      -0.0740932834438
Benchmarks/TSVC/Packing-flt/Packing-flt      2.6722      2.6716      -0.0224534091759
Benchmarks/TSVC/Searching-flt/Searching-flt  3.3246      3.324       -0.0180472838837
Benchmarks/TSVC/Searching-dbl/Searching-dbl  3.3563      3.3558      -0.0148973572088
Benchmarks/TSVC/Equivalencing-flt/Equivalen  0.9735      0.9734      -0.0102722136620
Applications/Burg/burg                       0.0008      0.0008      0.0             
Applications/hbd/hbd                         0.0018      0.0018      0.0             
Benchmarks/BitBench/uudecode/uudecode        0.0243      0.0243      0.0             
Benchmarks/McCat/04-bisect/bisect            0.0696      0.0696      0.0             
Benchmarks/McCat/05-eks/eks                  0.0021      0.0021      0.0             
Benchmarks/McCat/15-trie/trie                0.0008      0.0008      0.0             
Benchmarks/MiBench/consumer-jpeg/consumer-j  0.0028      0.0028      0.0             
Benchmarks/MiBench/office-ispell/office-isp  0.0006      0.0006      0.0             
Benchmarks/MiBench/security-blowfish/securi  0.0007      0.0007      0.0             
Benchmarks/MiBench/telecomm-adpcm/telecomm-  0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/NP/np                0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/deriv1/deriv1        0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/deriv2/deriv2        0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/employ/employ        0.0038      0.0038      0.0             
Benchmarks/Prolangs-C++/fsm/fsm              0.0005      0.0005      0.0             
Benchmarks/Prolangs-C++/garage/garage        0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/ocean/ocean          0.042       0.042       0.0             
Benchmarks/Prolangs-C++/office/office        0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/trees/trees          0.0006      0.0006      0.0             
Benchmarks/Prolangs-C++/vcirc/vcirc          0.0005      0.0005      0.0             
Benchmarks/Prolangs-C/allroots/allroots      0.0006      0.0006      0.0             
Benchmarks/Prolangs-C/compiler/compiler      0.0006      0.0006      0.0             
Benchmarks/Prolangs-C/fixoutput/fixoutput    0.0006      0.0006      0.0             
Benchmarks/Prolangs-C/football/football      0.0005      0.0005      0.0             
Benchmarks/Prolangs-C/loader/loader          0.0006      0.0006      0.0             
Benchmarks/Prolangs-C/simulator/simulator    0.0006      0.0006      0.0             
Benchmarks/Prolangs-C/unix-tbl/unix-tbl      0.0006      0.0006      0.0             
Benchmarks/TSVC/Recurrences-flt/Recurrences  2.7172      2.7173      0.00368025909023
Benchmarks/TSVC/StatementReordering-dbl/Sta  2.5547      2.555       0.01174306180765
Benchmarks/Trimaran/enc-md5/enc-md5          1.2119      1.2126      0.05776054129878
Benchmarks/MiBench/automotive-basicmath/aut  0.1698      0.1699      0.05889281507655
Benchmarks/ASC_Sequoia/IRSmk/IRSmk           2.6607      2.6626      0.07140977938136
Benchmarks/Fhourstones-3.1/fhourstones3.1    0.7427      0.7433      0.08078632018310
Benchmarks/TSVC/LoopRestructuring-dbl/LoopR  2.9857      2.9883      0.08708175637204
Benchmarks/Olden/em3d/em3d                   2.0241      2.0262      0.10374981473247
Benchmarks/TSVC/LoopRerolling-flt/LoopRerol  2.0889      2.0914      0.11968021446694
Benchmarks/TSVC/Packing-dbl/Packing-dbl      2.8154      2.8196      0.14917951268025
Benchmarks/BitBench/five11/five11            4.038       4.0448      0.16840019811788
Benchmarks/Olden/treeadd/treeadd             0.1588      0.1591      0.18891687657430
Benchmarks/TSVC/IndirectAddressing-flt/Indi  2.1573      2.1615      0.19468780419969
Benchmarks/Ptrdist/anagram/anagram           0.6629      0.6644      0.22627847337455
Benchmarks/TSVC/StatementReordering-flt/Sta  1.8867      1.892       0.28091376477446
Benchmarks/TSVC/IndirectAddressing-dbl/Indi  2.6113      2.6189      0.29104277562899
Benchmarks/FreeBench/pifft/pifft             0.0636      0.0638      0.31446540880501
Benchmarks/Prolangs-C++/primes/primes        0.1916      0.1923      0.36534446764092
Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDa  1.2514      1.2567      0.42352565127056
Benchmarks/Olden/power/power                 0.7097      0.7129      0.45089474425813
Benchmarks/ASCI_Purple/SMG2000/smg2000       1.4904      1.4972      0.45625335480408
Applications/lemon/lemon                     0.6774      0.6805      0.45763212282255
Benchmarks/MiBench/telecomm-FFT/telecomm-ff  0.0209      0.021       0.47846889952154
Benchmarks/7zip/7zip-benchmark               5.9521      5.9811      0.48722299692545
Benchmarks/TSVC/CrossingThresholds-dbl/Cros  2.6449      2.6578      0.48773110514575
Applications/SPASS/SPASS                     5.9442      5.9748      0.51478752397294
Benchmarks/MallocBench/cfrac/cfrac           1.2635      1.2704      0.54610209734862
Benchmarks/Ptrdist/ks/ks                     0.7054      0.7117      0.89311029203288
Benchmarks/MallocBench/espresso/espresso     0.3836      0.3871      0.91240875912408
Applications/JM/lencod/lencod                3.7442      3.7859      1.11372255755568
Benchmarks/TSVC/Equivalencing-dbl/Equivalen  1.3717      1.3881      1.19559670481884
Benchmarks/Olden/bh/bh                       0.6255      0.633       1.1990407673861 
Benchmarks/VersaBench/8b10b/8b10b            2.8968      2.9416      1.5465341066004 
Benchmarks/BitBench/drop3/drop3              0.174       0.1768      1.60919540229886
Benchmarks/McCat/12-IOtest/iotest            0.1223      0.1243      1.63532297628781
Applications/spiff/spiff                     1.629       1.6558      1.6451810926949 
Benchmarks/TSVC/CrossingThresholds-flt/Cros  2.0682      2.1028      1.67295232569383
Benchmarks/Olden/voronoi/voronoi             0.1569      0.1596      1.72084130019119
Applications/lua/lua                         14.0101     14.2671     1.83439090370518
Benchmarks/nbench/nbench                     5.4638      5.568       1.90709762436399
Applications/sqlite3/sqlite3                 2.3871      2.4339      1.960537891165  
Applications/ALAC/decode/alacconvert-decode  0.0152      0.0155      1.97368421052632
Benchmarks/Trimaran/netbench-url/netbench-u  2.7548      2.8112      2.04733555975025
Benchmarks/Olden/bisort/bisort               0.3265      0.3332      2.05206738131699
Benchmarks/Fhourstones/fhourstones           0.6284      0.6419      2.14831317632083
Applications/JM/ldecod/ldecod                0.0543      0.0556      2.39410681399631
Benchmarks/TSVC/LoopRestructuring-flt/LoopR  2.2302      2.2848      2.4482109227872 
Benchmarks/FreeBench/mason/mason             0.1085      0.1113      2.58064516129032
Benchmarks/Bullet/bullet                     3.0174      3.0968      2.63140452044807
Applications/SIBsim4/SIBsim4                 1.8364      1.8853      2.66281855804835
Benchmarks/McCat/08-main/main                0.0138      0.0142      2.89855072463769
Applications/siod/siod                       1.8991      1.9696      3.71228476646833
Benchmarks/FreeBench/distray/distray         0.0793      0.0829      4.53972257250947
Benchmarks/NPB-serial/is/is                  4.6101      4.8299      4.76779245569511
Applications/kimwitu++/kc                    0.0266      0.0279      4.88721804511279
Benchmarks/Olden/mst/mst                     0.0551      0.0589      6.89655172413793
Benchmarks/Ptrdist/yacr2/yacr2               0.5277      0.5663      7.31476217547851
Benchmarks/VersaBench/beamformer/beamformer  0.6497      0.7015      7.97291057411112
Benchmarks/sim/sim                           2.6061      2.8147      8.00429760945475
Benchmarks/FreeBench/pcompress2/pcompress2   0.101       0.1097      8.61386138613861
Benchmarks/mafft/pairlocalalign              16.7374     18.4048     9.9621207594967 
Benchmarks/MiBench/office-stringsearch/offi  0.001       0.0011      10.0            
Benchmarks/TSVC/InductionVariable-flt/Induc  2.0788      2.2966      10.4771983836829
Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2d  0.0076      0.0084      10.5263157894737
Benchmarks/MiBench/consumer-typeset/consume  0.0943      0.1053      11.6648992576882
Benchmarks/tramp3d-v4/tramp3d-v4             0.1849      0.208       12.493239588967 
Benchmarks/Prolangs-C++/objects/objects      0.0005      0.0006      20.0            
Benchmarks/Prolangs-C/TimberWolfMC/timberwo  0.0005      0.0006      20.0            
Benchmarks/Prolangs-C/assembler/assembler    0.0005      0.0006      20.0            
-------------- next part --------------
Index: include/llvm/Transforms/IPO/PassManagerBuilder.h
===================================================================
--- include/llvm/Transforms/IPO/PassManagerBuilder.h	(revision 187135)
+++ include/llvm/Transforms/IPO/PassManagerBuilder.h	(working copy)
@@ -132,8 +132,14 @@
 
   /// populateModulePassManager - This sets up the primary pass manager.
   void populateModulePassManager(PassManagerBase &MPM);
-  void populateLTOPassManager(PassManagerBase &PM, bool Internalize,
-                              bool RunInliner, bool DisableGVNLoadPRE = false);
+
+  /// populatePreIPOPassMgr - Set up the passes for the Pre-IPO phase.
+  void populatePreIPOPassMgr(PassManagerBase &MPM);
+
+  void populateIPOPassManager(PassManagerBase &PM, bool Internalize,
+                              bool RunInliner);
+
+  void populatePostIPOPM(PassManagerBase &PM);
 };
 
 /// Registers a function for adding a standard set of passes.  This should be
Index: include/llvm/Transforms/IPO.h
===================================================================
--- include/llvm/Transforms/IPO.h	(revision 187135)
+++ include/llvm/Transforms/IPO.h	(working copy)
@@ -89,6 +89,7 @@
 /// threshold given here.
 Pass *createFunctionInliningPass();
 Pass *createFunctionInliningPass(int Threshold);
+Pass *createTinyFuncInliningPass();
 
 //===----------------------------------------------------------------------===//
 /// createAlwaysInlinerPass - Return a new pass object that inlines only 
Index: tools/lto/LTOCodeGenerator.cpp
===================================================================
--- tools/lto/LTOCodeGenerator.cpp	(revision 187135)
+++ tools/lto/LTOCodeGenerator.cpp	(working copy)
@@ -412,11 +412,12 @@
   // Enabling internalize here would use its AllButMain variant. It
   // keeps only main if it exists and does nothing for libraries. Instead
   // we create the pass ourselves with the symbol list provided by the linker.
-  if (!DisableOpt)
-    PassManagerBuilder().populateLTOPassManager(passes,
-                                              /*Internalize=*/false,
-                                              !DisableInline,
-                                              DisableGVNLoadPRE);
+  if (!DisableOpt) {
+    PassManagerBuilder().populateIPOPassManager(passes,
+                                                /*Internalize=*/false,
+                                                !DisableInline);
+    PassManagerBuilder().populatePostIPOPM(passes);
+  }
 
   // Make sure everything is still good.
   passes.add(createVerifierPass());
Index: tools/opt/opt.cpp
===================================================================
--- tools/opt/opt.cpp	(revision 187135)
+++ tools/opt/opt.cpp	(working copy)
@@ -104,6 +104,11 @@
                    cl::desc("Include the standard compile time optimizations"));
 
 static cl::opt<bool>
+StandardPreIPOOpts("std-preipo-opts",
+                   cl::desc("Include the standard pre-IPO optimizations"));
+
+
+static cl::opt<bool>
 StandardLinkOpts("std-link-opts",
                  cl::desc("Include the standard link time optimizations"));
 
@@ -470,6 +475,23 @@
   Builder.populateModulePassManager(PM);
 }
 
+static void AddPreIPOCompilePasses(PassManagerBase &PM) {
+  PM.add(createVerifierPass());                  // Verify that input is correct
+
+  // If the -strip-debug command line option was specified, do it.
+  if (StripDebug)
+    addPass(PM, createStripSymbolsPass(true));
+
+  if (DisableOptimizations) return;
+
+  // -std-preipo-opts adds the pre-IPO passes at the -O3 level.
+  PassManagerBuilder Builder;
+  if (!DisableInline)
+    Builder.Inliner = createTinyFuncInliningPass();
+  Builder.OptLevel = 3;
+  Builder.populatePreIPOPassMgr(PM);
+}
+
 static void AddStandardLinkPasses(PassManagerBase &PM) {
   PM.add(createVerifierPass());                  // Verify that input is correct
 
@@ -480,8 +502,9 @@
   if (DisableOptimizations) return;
 
   PassManagerBuilder Builder;
-  Builder.populateLTOPassManager(PM, /*Internalize=*/ !DisableInternalize,
+  Builder.populateIPOPassManager(PM, /*Internalize=*/ !DisableInternalize,
                                  /*RunInliner=*/ !DisableInline);
+  Builder.populatePostIPOPM(PM);
 }
 
 //===----------------------------------------------------------------------===//
@@ -778,6 +801,12 @@
     StandardCompileOpts = false;
   }
 
+  // If -std-preipo-opts was specified at the end of the pass list, add them.
+  if (StandardPreIPOOpts) {
+    AddPreIPOCompilePasses(Passes);
+    StandardPreIPOOpts = false;
+  }
+
   if (StandardLinkOpts) {
     AddStandardLinkPasses(Passes);
     StandardLinkOpts = false;
Index: tools/bugpoint/bugpoint.cpp
===================================================================
--- tools/bugpoint/bugpoint.cpp	(revision 187135)
+++ tools/bugpoint/bugpoint.cpp	(working copy)
@@ -169,8 +169,9 @@
       
   if (StandardLinkOpts) {
     PassManagerBuilder Builder;
-    Builder.populateLTOPassManager(PM, /*Internalize=*/true,
+    Builder.populateIPOPassManager(PM, /*Internalize=*/true,
                                    /*RunInliner=*/true);
+    Builder.populatePostIPOPM(PM);
   }
 
   if (OptLevelO1 || OptLevelO2 || OptLevelO3) {
Index: lib/Transforms/IPO/PassManagerBuilder.cpp
===================================================================
--- lib/Transforms/IPO/PassManagerBuilder.cpp	(revision 187135)
+++ lib/Transforms/IPO/PassManagerBuilder.cpp	(working copy)
@@ -294,10 +294,78 @@
   addExtensionsToPM(EP_OptimizerLast, MPM);
 }
 
-void PassManagerBuilder::populateLTOPassManager(PassManagerBase &PM,
+void PassManagerBuilder::populatePreIPOPassMgr(PassManagerBase &MPM) {
+  // If all optimizations are disabled, just run the always-inline pass.
+  if (OptLevel == 0) {
+    if (Inliner) {
+      MPM.add(Inliner);
+      Inliner = 0;
+    }
+    return;
+  }
+
+  bool EnableLightWeightIPO = (OptLevel > 1);
+
+  // Add LibraryInfo if we have some.
+  if (LibraryInfo) MPM.add(new TargetLibraryInfo(*LibraryInfo));
+  addInitialAliasAnalysisPasses(MPM);
+
+  // Start of CallGraph SCC passes.
+  {
+    if (EnableLightWeightIPO) {
+      MPM.add(createPruneEHPass());             // Remove dead EH info
+      if (Inliner) {
+        MPM.add(Inliner);
+        Inliner = 0;
+      }
+      MPM.add(createArgumentPromotionPass());   // Scalarize uninlined fn args
+    }
+  
+    // Start of function pass.
+    {
+      if (UseNewSROA)
+        MPM.add(createSROAPass(/*RequiresDomTree*/ false));
+      else
+        MPM.add(createScalarReplAggregatesPass(-1, false));
+    
+      MPM.add(createEarlyCSEPass());              // Catch trivial redundancies
+      MPM.add(createJumpThreadingPass());         // Thread jumps.
+      MPM.add(createCorrelatedValuePropagationPass());// Propagate conditionals
+      MPM.add(createCFGSimplificationPass());     // Merge & remove BBs
+      MPM.add(createInstructionCombiningPass());  // Combine silly seq's
+      MPM.add(createReassociatePass());           // Reassociate expressions
+      MPM.add(createLoopRotatePass());            // Rotate Loop
+      MPM.add(createLICMPass());                  // Hoist loop invariants
+      MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
+      MPM.add(createLoopIdiomPass());             // Recognize idioms like memset.
+      MPM.add(createLoopDeletionPass());          // Delete dead loops
+    
+      MPM.add(createGVNPass());                   // Remove redundancies
+      MPM.add(createMemCpyOptPass());             // Remove memcpy / form memset
+      MPM.add(createSCCPPass());                  // Constant prop with SCCP
+    
+      MPM.add(createDeadStoreEliminationPass());  // Delete dead stores
+      MPM.add(createAggressiveDCEPass());         // Delete dead instructions
+      MPM.add(createFunctionAttrsPass());         // Set readonly/readnone attrs
+    
+      MPM.add(createTailCallEliminationPass());   // Eliminate tail calls
+    }
+
+    // End of CallGraph SCC passes.
+  }
+
+  if (EnableLightWeightIPO) {
+    MPM.add(createGlobalOptimizerPass());     // Optimize out global vars
+    MPM.add(createIPSCCPPass());              // IP SCCP
+    MPM.add(createDeadArgEliminationPass());  // Dead argument elimination
+    MPM.add(createGlobalDCEPass());         // Remove dead fns and globals.
+    MPM.add(createConstantMergePass());     // Merge dup global constants
+  }
+}
+
+void PassManagerBuilder::populateIPOPassManager(PassManagerBase &PM,
                                                 bool Internalize,
-                                                bool RunInliner,
-                                                bool DisableGVNLoadPRE) {
+                                                bool RunInliner) {
   // Provide AliasAnalysis services for optimizations.
   addInitialAliasAnalysisPasses(PM);
 
@@ -325,15 +393,9 @@
   // Remove unused arguments from functions.
   PM.add(createDeadArgEliminationPass());
 
-  // Reduce the code after globalopt and ipsccp.  Both can open up significant
-  // simplification opportunities, and both can propagate functions through
-  // function pointers.  When this happens, we often have to resolve varargs
-  // calls, etc, so let instcombine do this.
-  PM.add(createInstructionCombiningPass());
-
   // Inline small functions
   if (RunInliner)
-    PM.add(createFunctionInliningPass());
+    PM.add(createFunctionInliningPass(255));
 
   PM.add(createPruneEHPass());   // Remove dead EH info.
 
@@ -346,35 +408,98 @@
   // transform it to pass arguments by value instead of by reference.
   PM.add(createArgumentPromotionPass());
 
-  // The IPO passes may leave cruft around.  Clean up after them.
-  PM.add(createInstructionCombiningPass());
-  PM.add(createJumpThreadingPass());
-  // Break up allocas
-  if (UseNewSROA)
-    PM.add(createSROAPass());
-  else
-    PM.add(createScalarReplAggregatesPass());
-
   // Run a few AA driven optimizations here and now, to cleanup the code.
   PM.add(createFunctionAttrsPass()); // Add nocapture.
   PM.add(createGlobalsModRefPass()); // IP alias analysis.
+}
 
-  PM.add(createLICMPass());                 // Hoist loop invariants.
-  PM.add(createGVNPass(DisableGVNLoadPRE)); // Remove redundancies.
-  PM.add(createMemCpyOptPass());            // Remove dead memcpys.
-  // Nuke dead stores.
-  PM.add(createDeadStoreEliminationPass());
+void PassManagerBuilder::populatePostIPOPM(PassManagerBase &PM) {
+  // In the PostIPO phase, the choice for inlining is simple: either no
+  // inlining at all, or just run the inliner which only inlines tiny
+  // functions. This function is free to pick whichever is more appropriate.
+  //
+  assert(Inliner == 0 && "Don't specify inliner");
+  if (OptLevel == 0)
+    return;
 
-  // Cleanup and simplify the code after the scalar optimizations.
-  PM.add(createInstructionCombiningPass());
+  bool EnableLightWeightIPO = (OptLevel > 1);
 
-  PM.add(createJumpThreadingPass());
+  // Add LibraryInfo if we have some.
+  if (LibraryInfo) PM.add(new TargetLibraryInfo(*LibraryInfo));
 
-  // Delete basic blocks, which optimization passes may have killed.
-  PM.add(createCFGSimplificationPass());
+  addInitialAliasAnalysisPasses(PM);
 
-  // Now that we have optimized the program, discard unreachable functions.
-  PM.add(createGlobalDCEPass());
+  // Start of CallGraph SCC passes.
+  {
+    if (EnableLightWeightIPO) {
+      PM.add(createTinyFuncInliningPass());
+      PM.add(createFunctionAttrsPass());       // Set readonly/readnone attrs
+    }
+
+    // Start of function pass.
+    {
+      PM.add(createMemCpyOptPass());             // Remove memcpy / form memset
+      if (UseNewSROA)
+        PM.add(createSROAPass(/*RequiresDomTree*/ false));
+      else
+        PM.add(createScalarReplAggregatesPass(-1, false));
+      PM.add(createEarlyCSEPass());      // Catch trivial redundancies
+      PM.add(createSCCPPass());                  // Constant prop with SCCP
+      PM.add(createJumpThreadingPass());         // Thread jumps
+      PM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
+      PM.add(createCFGSimplificationPass());     // Merge & remove BBs
+      PM.add(createReassociatePass());           // Reassociate expressions
+      PM.add(createLoopRotatePass());            // Rotate Loop
+      PM.add(createLICMPass());                  // Hoist loop invariants
+      PM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
+      PM.add(createIndVarSimplifyPass());        // Canonicalize indvars
+      PM.add(createLoopIdiomPass());             // Recognize idioms like memset.
+      PM.add(createLoopDeletionPass());          // Delete dead loops
+
+      if (/*LoopVectorize &&*/ OptLevel > 1 && SizeLevel < 2)
+        PM.add(createLoopVectorizePass());
+
+      if (!DisableUnrollLoops)
+        PM.add(createLoopUnrollPass());          // Unroll small loops
+
+      addExtensionsToPM(EP_LoopOptimizerEnd, PM);
+
+      if (OptLevel > 1)
+        PM.add(createGVNPass());                 // Remove redundancies
+
+      PM.add(createInstructionCombiningPass());
+      PM.add(createDeadStoreEliminationPass());  // Delete dead stores
+      PM.add(createAggressiveDCEPass());         // Delete dead instructions
+      if (UseNewSROA)
+        PM.add(createSROAPass(/*RequiresDomTree*/ false));
+      else
+        PM.add(createScalarReplAggregatesPass(-1, false));
+
+      addExtensionsToPM(EP_ScalarOptimizerLate, PM);
+
+
+      // Add the various vectorization passes and relevant cleanup passes for
+      // them since we are no longer in the middle of the main scalar pipeline.
+      if (/*LoopVectorize && */OptLevel > 1 && SizeLevel < 2)
+        PM.add(createLoopVectorizePass());
+
+      #if 1
+      if (!DisableUnrollLoops)
+        PM.add(createLoopUnrollPass());    // Unroll small loops
+      #endif
+
+      PM.add(createInstructionCombiningPass());
+
+      if (SLPVectorize)
+        PM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.
+    }
+  }
+
+  if (EnableLightWeightIPO) {
+    PM.add(createGlobalDCEPass());         // Remove dead fns and globals.
+    PM.add(createConstantMergePass());     // Merge dup global constants
+  }
+  addExtensionsToPM(EP_OptimizerLast, PM);
 }
 
 inline PassManagerBuilder *unwrap(LLVMPassManagerBuilderRef P) {
@@ -458,5 +583,6 @@
                                                   LLVMBool RunInliner) {
   PassManagerBuilder *Builder = unwrap(PMB);
   PassManagerBase *LPM = unwrap(PM);
-  Builder->populateLTOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+  Builder->populateIPOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+  Builder->populatePostIPOPM(*LPM);
 }
Index: lib/Transforms/IPO/InlineSimple.cpp
===================================================================
--- lib/Transforms/IPO/InlineSimple.cpp	(revision 187135)
+++ lib/Transforms/IPO/InlineSimple.cpp	(working copy)
@@ -72,6 +72,10 @@
   return new SimpleInliner(Threshold);
 }
 
+Pass *llvm::createTinyFuncInliningPass() {
+  return new SimpleInliner(40);  // Low threshold: inline only tiny functions.
+}
+
 bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {
   ICA = &getAnalysis<InlineCostAnalysis>();
   return Inliner::runOnSCC(SCC);
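
The gating in populatePostIPOPM() above is easier to see once the pass list is stripped down. What follows is only a standalone sketch, not LLVM code: PostIPOConfig and postIPOPassNames are hypothetical names, the pass names are plain strings, and only a representative subset of the passes is modeled. It assumes the same conditions the patch uses: nothing runs at -O0, the light-weight IPO passes run only when OptLevel > 1, and the loop vectorizer additionally requires SizeLevel < 2.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the PassManagerBuilder fields the patch reads.
struct PostIPOConfig {
  unsigned OptLevel = 2;
  unsigned SizeLevel = 0;
  bool DisableUnrollLoops = false;
  bool SLPVectorize = false;
};

// Returns, in scheduling order, the names of a representative subset of the
// passes that the straw-man post-IPO pipeline would add. This only models the
// control flow of the patch; it does not link against or call LLVM.
std::vector<std::string> postIPOPassNames(const PostIPOConfig &C) {
  std::vector<std::string> Passes;
  if (C.OptLevel == 0)
    return Passes; // No post-IPO work at -O0.

  const bool EnableLightWeightIPO = C.OptLevel > 1;

  // CallGraph SCC passes.
  if (EnableLightWeightIPO) {
    Passes.push_back("tiny-func-inline");
    Passes.push_back("functionattrs");
  }

  // Function passes (subset).
  Passes.push_back("memcpyopt");
  Passes.push_back("sroa-or-scalarrepl");
  Passes.push_back("licm");
  if (C.OptLevel > 1 && C.SizeLevel < 2)
    Passes.push_back("loop-vectorize");
  if (!C.DisableUnrollLoops)
    Passes.push_back("loop-unroll");
  if (C.OptLevel > 1)
    Passes.push_back("gvn");
  Passes.push_back("instcombine");
  if (C.SLPVectorize)
    Passes.push_back("slp-vectorizer");

  // Module-level cleanup, again only for the light-weight IPO case.
  if (EnableLightWeightIPO) {
    Passes.push_back("globaldce");
    Passes.push_back("constmerge");
  }
  return Passes;
}
```

With the defaults (OptLevel 2) the list starts with the tiny-function inliner and ends with the constant-merge cleanup; at OptLevel 1 both the leading SCC passes and the trailing module passes drop out, leaving only the function passes.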
-------------- next part --------------
Index: include/clang/Frontend/CodeGenOptions.def
===================================================================
--- include/clang/Frontend/CodeGenOptions.def	(revision 187135)
+++ include/clang/Frontend/CodeGenOptions.def	(working copy)
@@ -112,6 +112,7 @@
 CODEGENOPT(VectorizeBB       , 1, 0) ///< Run basic block vectorizer.
 CODEGENOPT(VectorizeLoop     , 1, 0) ///< Run loop vectorizer.
 CODEGENOPT(VectorizeSLP      , 1, 0) ///< Run SLP vectorizer.
+CODEGENOPT(IsPreIPO          , 1, 0) ///< Set when compiling in the pre-IPO phase.
 
   /// Attempt to use register sized accesses to bit-fields in structures, when
   /// possible.
Index: include/clang/Driver/CC1Options.td
===================================================================
--- include/clang/Driver/CC1Options.td	(revision 187135)
+++ include/clang/Driver/CC1Options.td	(working copy)
@@ -210,6 +210,8 @@
   HelpText<"Run the SLP vectorization passes">;
 def vectorize_slp_aggressive : Flag<["-"], "vectorize-slp-aggressive">,
   HelpText<"Run the BB vectorization passes">;
+def preipo : Flag<["-"], "preipo">,
+  HelpText<"Run the pre-IPO passes">;
 
 //===----------------------------------------------------------------------===//
 // Dependency Output Options
Index: lib/Frontend/CompilerInvocation.cpp
===================================================================
--- lib/Frontend/CompilerInvocation.cpp	(revision 187135)
+++ lib/Frontend/CompilerInvocation.cpp	(working copy)
@@ -402,6 +402,7 @@
   Opts.VectorizeBB = Args.hasArg(OPT_vectorize_slp_aggressive);
   Opts.VectorizeLoop = Args.hasArg(OPT_vectorize_loops);
   Opts.VectorizeSLP = Args.hasArg(OPT_vectorize_slp);
+  Opts.IsPreIPO = Args.hasArg(OPT_preipo);
 
   Opts.MainFileName = Args.getLastArgValue(OPT_main_file_name);
   Opts.VerifyModule = !Args.hasArg(OPT_disable_llvm_verifier);
Index: lib/Driver/Tools.cpp
===================================================================
--- lib/Driver/Tools.cpp	(revision 187135)
+++ lib/Driver/Tools.cpp	(working copy)
@@ -2014,7 +2014,8 @@
       CmdArgs.push_back("-emit-pth");
   } else {
     assert(isa<CompileJobAction>(JA) && "Invalid action for clang tool.");
-
+    if (D.IsUsingLTO(Args))
+      CmdArgs.push_back("-preipo");
     if (JA.getType() == types::TY_Nothing) {
       CmdArgs.push_back("-fsyntax-only");
     } else if (JA.getType() == types::TY_LLVM_IR ||
Index: lib/CodeGen/BackendUtil.cpp
===================================================================
--- lib/CodeGen/BackendUtil.cpp	(revision 187135)
+++ lib/CodeGen/BackendUtil.cpp	(working copy)
@@ -274,6 +274,10 @@
   switch (Inlining) {
   case CodeGenOptions::NoInlining: break;
   case CodeGenOptions::NormalInlining: {
+    if (CodeGenOpts.IsPreIPO) {
+      PMBuilder.Inliner = createTinyFuncInliningPass();
+      break;
+    }
     // FIXME: Derive these constants in a principled fashion.
     unsigned Threshold = 225;
     if (CodeGenOpts.OptimizeSize == 1)      // -Os
@@ -321,7 +325,10 @@
       MPM->add(createStripSymbolsPass(true));
   }
 
-  PMBuilder.populateModulePassManager(*MPM);
+  if (!CodeGenOpts.IsPreIPO)
+    PMBuilder.populateModulePassManager(*MPM);
+  else
+    PMBuilder.populatePreIPOPassMgr(*MPM);
 }
 
 TargetMachine *EmitAssemblyHelper::CreateTargetMachine(bool MustCreateTM) {
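
Taken together, the four Clang hunks above thread a single bit from the driver to the backend: the driver appends -preipo when LTO is in use, the cc1 invocation turns that flag into CodeGenOpts.IsPreIPO, and the backend then picks the tiny-function inliner and the pre-IPO pipeline. The sketch below is a standalone model of that flow under stated assumptions, not Clang code: buildCC1Args, parseIsPreIPO, BackendPlan and planBackend are hypothetical names, the results are plain strings, and only the NormalInlining case is modeled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Models the Tools.cpp hunk: the driver adds "-preipo" only under LTO.
std::vector<std::string> buildCC1Args(bool UsingLTO) {
  std::vector<std::string> Args;
  if (UsingLTO)
    Args.push_back("-preipo");
  return Args;
}

// Models the CompilerInvocation.cpp hunk: the flag's presence sets IsPreIPO.
bool parseIsPreIPO(const std::vector<std::string> &Args) {
  return std::find(Args.begin(), Args.end(), "-preipo") != Args.end();
}

// Hypothetical summary of the two decisions BackendUtil.cpp makes.
struct BackendPlan {
  std::string Inliner;
  std::string Pipeline;
};

// Models the BackendUtil.cpp hunks for the NormalInlining case.
BackendPlan planBackend(bool IsPreIPO) {
  if (IsPreIPO)
    return {"createTinyFuncInliningPass", "populatePreIPOPassMgr"};
  return {"threshold-based inliner", "populateModulePassManager"};
}
```

For example, planBackend(parseIsPreIPO(buildCC1Args(true))) selects the pre-IPO pipeline, while a non-LTO compile keeps the existing populateModulePassManager() path.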

