[llvm] [polly] [SimpleLoopUnswitch] Don't use BlockFrequencyInfo to skip cold loops (PR #159522)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 23 10:11:31 PDT 2025
lukel97 wrote:
@fhahn @teresajohnson I did some more experimenting and I think the usage of lossy BFI is a bigger problem than I had initially expected.
I changed SimpleLoopUnswitch to always invalidate BlockFrequencyAnalysis and BranchProbabilityAnalysis beforehand (leaving other analyses untouched), so that the BFI data is always fresh. On arm64-apple-darwin at -O3 there's a large difference in the number of "cold" loops skipped: many loops that were previously considered cold no longer are.
```
lhs = lossy BFI
rhs = correct BFI
Program simple-loop-unswitch.NumColdSkipped
lhs rhs diff
MultiSourc...ch/consumer-lame/consumer-lame 283.00 282.00 -0.4%
MultiSourc...sumer-typeset/consumer-typeset 571.00 564.00 -1.2%
MultiSource/Applications/siod/siod 176.00 173.00 -1.7%
MultiSourc...e/Applications/SIBsim4/SIBsim4 56.00 55.00 -1.8%
MultiSourc.../Applications/JM/lencod/lencod 1247.00 1220.00 -2.2%
MultiSourc...arks/DOE-ProxyApps-C/CoMD/CoMD 75.00 73.00 -2.7%
MultiSourc...xyApps-C/Pathfinder/PathFinder 65.00 63.00 -3.1%
MultiSourc...OE-ProxyApps-C/XSBench/XSBench 20.00 19.00 -5.0%
MultiSource/Applications/d/make_dparser 153.00 143.00 -6.5%
MultiSource/Benchmarks/nbench/nbench 82.00 76.00 -7.3%
MultiSource/Applications/oggenc/oggenc 472.00 434.00 -8.1%
MultiSourc...hmarks/MallocBench/cfrac/cfrac 23.00 21.00 -8.7%
MultiSourc...Benchmarks/SciMark2-C/scimark2 32.00 29.00 -9.4%
MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk 89.00 80.00 -10.1%
MultiSourc...rsaBench/beamformer/beamformer 19.00 17.00 -10.5%
MultiSourc...e/Applications/obsequi/Obsequi 70.00 62.00 -11.4%
MultiSource/Benchmarks/McCat/18-imp/imp 30.00 26.00 -13.3%
MultiSourc...e-bitcount/automotive-bitcount 7.00 6.00 -14.3%
MultiSource/Benchmarks/sim/sim 7.00 6.00 -14.3%
MultiSourc...SC_Sequoia/CrystalMk/CrystalMk 12.00 10.00 -16.7%
MultiSource/Applications/sgefa/sgefa 48.00 40.00 -16.7%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench 6.00 5.00 -16.7%
MultiSourc...Benchmarks/Ptrdist/yacr2/yacr2 35.00 28.00 -20.0%
MultiSourc...marks/Prolangs-C/bison/mybison 75.00 60.00 -20.0%
MultiSource/Benchmarks/Olden/mst/mst 15.00 11.00 -26.7%
MultiSourc.../Rodinia/pathfinder/pathfinder 11.00 8.00 -27.3%
MultiSource/Benchmarks/Olden/em3d/em3d 20.00 14.00 -30.0%
MultiSourc.../Benchmarks/VersaBench/bmm/bmm 26.00 18.00 -30.8%
MultiSourc...hmarks/FreeBench/neural/neural 28.00 19.00 -32.1%
MultiSourc...enchmarks/VersaBench/dbms/dbms 6.00 4.00 -33.3%
MultiSourc.../telecomm-CRC32/telecomm-CRC32 3.00 2.00 -33.3%
MultiSourc...chmarks/VersaBench/8b10b/8b10b 5.00 3.00 -40.0%
MultiSource/Benchmarks/McCat/09-vor/vor 2.00 1.00 -50.0%
MultiSourc...chmarks/McCat/04-bisect/bisect 6.00 3.00 -50.0%
MultiSource/Benchmarks/Ptrdist/ks/ks 8.00 4.00 -50.0%
MultiSourc...Benchmarks/Olden/health/health 2.00 1.00 -50.0%
MultiSourc...marks/Trimaran/enc-rc4/enc-rc4 16.00 8.00 -50.0%
MultiSourc...rks/Trimaran/enc-3des/enc-3des 12.00 3.00 -75.0%
MultiSourc...rks/BitBench/uuencode/uuencode 5.00 1.00 -80.0%
MultiSourc...arks/FreeBench/distray/distray 2.00 0.00 -100.0%
MultiSourc...reeBench/pcompress2/pcompress2 1.00 0.00 -100.0%
```
This translates to a significant number of loops that we are erroneously skipping because we mistakenly think they are cold. With fresh BFI, the number of branches unswitched goes up by **4.3% geomean overall**.
```
Metric: simple-loop-unswitch.NumBranches
Program simple-loop-unswitch.NumBranches
lhs rhs diff
Applications/aha/aha 1.00 2.00 100.0%
Benchmarks/McCat/09-vor/vor 1.00 2.00 100.0%
Benchmarks/VersaBench/bmm/bmm 6.00 11.00 83.3%
Benchmarks/ASC_Sequoia/IRSmk/IRSmk 2.00 3.00 50.0%
Benchmarks/nbench/nbench 4.00 6.00 50.0%
Benchmarks/ASC_Sequoia/AMGmk/AMGmk 11.00 16.00 45.5%
Benchmarks...sumer-typeset/consumer-typeset 10.00 14.00 40.0%
Benchmarks...iabench/g721/g721encode/encode 3.00 4.00 33.3%
Benchmarks.../Rodinia/pathfinder/pathfinder 3.00 4.00 33.3%
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD 13.00 17.00 30.8%
Benchmarks/McCat/18-imp/imp 13.00 17.00 30.8%
Benchmarks/FreeBench/neural/neural 10.00 13.00 30.0%
Benchmarks...nch/mpeg2/mpeg2dec/mpeg2decode 4.00 5.00 25.0%
Benchmarks/Trimaran/enc-3des/enc-3des 4.00 5.00 25.0%
Benchmarks.../DOE-ProxyApps-C++/HPCCG/HPCCG 4.00 5.00 25.0%
Geomean difference 4.3%
```
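For readers unfamiliar with the columns, here is a minimal sketch of how the `diff` column and a geomean difference can be computed. It assumes `diff` is the relative change of `rhs` against `lhs` and that the geomean is taken over the per-program `rhs/lhs` ratios; the exact method used by the comparison script is an assumption here, and `Row`, `percentDiff` and `geomeanDiff` are hypothetical names. The tables above presumably show only the programs that changed, so running this over just the listed rows will not reproduce the 4.3% figure.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One table row: the statistic with lossy BFI (lhs) and with fresh BFI (rhs).
struct Row {
  double lhs;
  double rhs;
};

// Relative change of rhs against lhs, in percent.
double percentDiff(const Row &r) {
  assert(r.lhs != 0.0 && "relative change is undefined for a zero baseline");
  return (r.rhs - r.lhs) / r.lhs * 100.0;
}

// Geometric mean of the rhs/lhs ratios, expressed as a percent change.
// Computed in log space; requires strictly positive values on both sides.
double geomeanDiff(const std::vector<Row> &rows) {
  assert(!rows.empty() && "need at least one row");
  double logSum = 0.0;
  for (const Row &r : rows) {
    assert(r.lhs > 0.0 && r.rhs > 0.0 && "ratios must be positive");
    logSum += std::log(r.rhs / r.lhs);
  }
  return (std::exp(logSum / static_cast<double>(rows.size())) - 1.0) * 100.0;
}
```

For example, the bmm row of the NumBranches table (6 to 11) gives `percentDiff({6.0, 11.0})` of about 83.3, matching the table. Programs whose statistic is unchanged contribute a ratio of 1 to the geomean, which is why an overall figure can be far smaller than the per-row changes.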
For this reason alone I think we should remove the use of BlockFrequencyInfo, as it's very likely impeding performance. In the above results, the additional unswitched loops all come from hot loops.
https://github.com/llvm/llvm-project/pull/159522