[PATCH] D105700: [LoopSimplify] Convert loop with multiple latches to nested loop using dominator tree
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 13 03:50:42 PDT 2021
jaykang10 added a comment.
In D105700#2872907 <https://reviews.llvm.org/D105700#2872907>, @efriedma wrote:
> The transform here is clearly correct... the part I'm not sure about is whether it's profitable in general. I'm particularly worried about cases where we make PHI nodes in the outer loop more difficult to analyze. Have you done any experiments to try to determine the performance impact?
I measured the performance impact on the benchmarks below.
llvm-test-suite for x86
Tests: 2939
Short Running: 2374 (filtered out)
Remaining: 565
Metric: exec_time
Program                                      results.org  results.multi.latches  diff
test-suite...emCmp<8, GreaterThanZero, Mid>       601.59                1056.69  75.7%
test-suite...emCmp<15, LessThanZero, First>       299.16                 517.11  72.9%
test-suite...Cmp<8, GreaterThanZero, First>       577.34                 902.53  56.3%
test-suite...Cmp<31, GreaterThanZero, None>       410.89                 600.97  46.3%
test-suite...MemCmp<15, LessThanZero, None>       453.93                 658.92  45.2%
test-suite...aw.test:BM_MAT_X_MAT_RAW/44217    391924.61              536335.61  36.8%
test-suite...sCRaw.test:BM_HYDRO_2D_RAW/171         8.22                  10.95  33.2%
test-suite...Cmp<31, GreaterThanZero, Last>       420.00                 553.94  31.9%
test-suite...t:BM_MemCmp<31, EqZero, First>       155.73                 204.14  31.1%
test-suite...Source/Benchmarks/sim/sim.test         2.12                   2.78  30.7%
test-suite...lications/sqlite3/sqlite3.test         1.70                   2.12  25.0%
test-suite....test:BENCHMARK_HARRIS/512/512      2272.75                2806.47  23.5%
test-suite...tions/lambda-0.1.3/lambda.test         2.09                   2.57  22.6%
test-suite...s-C/Pathfinder/PathFinder.test         1.66                   2.02  21.9%
test-suite...C/Packing-flt/Packing-flt.test         1.64                   2.00  21.7%
Geomean difference nan%
         results.org  results.multi.latches        diff
count     564.000000             562.000000  561.000000
mean     2062.547024            2174.176751    0.007214
std     23151.613672           25917.675478    0.091953
min         0.614800               0.616300   -0.490838
25%         2.715425               2.744382   -0.007251
50%        98.130479              98.001753   -0.000128
75%       517.301815             525.848058    0.010050
max    391924.609000          536335.615000    0.756508
SPEC2006 for AArch64
Benchmark        diff(%)
400.perlbench     0.361501846
401.bzip2         0.279018415
403.gcc          -0.565577003
429.mcf          -2.208262958
445.gobmk        -0.31082178
456.hmmer        -0.467206165
458.sjeng        -0.235398032
462.libquantum   -3.935842025
464.h264ref       0.057919804
471.omnetpp      -1.779197968
473.astar        -0.518484615
483.xalancbmk    -0.36306218
SPEC2017 for AArch64
Benchmark         diff(%)
500.perlbench_r    1.547117225
502.gcc_r         -0.76960912
505.mcf_r          0
520.omnetpp_r     -1.730885443
523.xalancbmk_r   -0.697377016
525.x264_r         0.052501261
531.deepsjeng_r   -0.150069504
541.leela_r       -0.042836837
548.exchange2_r    0.011397094
557.xz_r          -0.416533465
As you mentioned, existing analyses can fail on the cascaded PHI nodes that come from the outer loop. I found a test in which `CanProveNotTakenFirstIteration` fails during the LICM pass.
To canonicalize a loop with multiple latches, the LoopSimplify pass normally creates a new latch and redirects the old latches to it. If the pass detects a certain condition on a PHI node in such a loop, it converts the loop into a nested loop instead. So the question is which form is better: a single loop with multiple induction variables and a conditional branch, or a nested loop.
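To make the two shapes concrete, here is a source-level sketch in C++. It is only an illustration and is not taken from the patch or its tests; `sumSingleLoop` and `sumNested` are hypothetical names. It assumes the PHI condition resembles "a value stays unchanged along one back edge" (here `sum` on the `continue` path); the exact condition the pass checks is not spelled out in this comment.

```cpp
#include <cstddef>
#include <vector>

// One loop with two back edges (two latches at the IR level):
// the `continue` path updates only `i`, the fall-through path
// updates both `i` and `sum`.
int sumSingleLoop(const std::vector<int> &v) {
  std::size_t i = 0;
  int sum = 0;
  while (i < v.size()) {
    if (v[i] < 0) {
      ++i;
      continue; // back edge 1: `sum` is unchanged
    }
    sum += v[i];
    ++i; // back edge 2: `sum` changes
  }
  return sum;
}

// The same computation written as a nested loop: the back edge that
// leaves `sum` unchanged becomes an inner loop, and the remaining
// back edge becomes the latch of the outer loop.
int sumNested(const std::vector<int> &v) {
  std::size_t i = 0;
  int sum = 0;
  while (i < v.size()) {
    // Inner loop: skip negative elements; only `i` changes here.
    while (i < v.size() && v[i] < 0)
      ++i;
    if (i >= v.size())
      break;
    sum += v[i];
    ++i;
  }
  return sum;
}
```

Both functions return the same result for any input; they differ only in the loop structure that later passes see. In the nested form the outer header merges values coming out of the inner loop, which is the kind of cascaded PHI that can be harder to analyze.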
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105700/new/
https://reviews.llvm.org/D105700