[PATCH] D115653: [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 5 13:12:49 PDT 2022
ABataev added a comment.
In D115653#3401997 <https://reviews.llvm.org/D115653#3401997>, @RKSimon wrote:
> Thanks @ABataev this looks like it's almost there - please can you do a run of the test suite to see if there are any noteworthy changes?
Made some changes in the analysis. I still need to improve the split process: instead of bisecting, implement immediate splitting to the register sizes. This should fix the regressions.
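To illustrate what "immediate splitting to the register sizes" could look like, here is a minimal sketch. It is not the code from the patch: `SubShuffle` and `splitToRegisters` are hypothetical names, and the mask convention (-1 for an undefined lane, other values indexing into the concatenated inputs) is an assumption made for the example. The wide mask is cut into register-sized slices in one pass, and for each slice the sketch records which register-sized source chunks it reads.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper type for this sketch; not part of LLVM.
struct SubShuffle {
  std::vector<int> Mask;    // slice of the wide mask for one destination register
  std::set<int> SourceRegs; // register-sized source chunks this slice reads
};

// Split a wide shuffle mask directly into register-sized pieces instead of
// halving it repeatedly. Assumes ElemsPerReg is nonzero and divides
// Mask.size(); -1 marks an undefined lane.
std::vector<SubShuffle> splitToRegisters(const std::vector<int> &Mask,
                                         std::size_t ElemsPerReg) {
  std::vector<SubShuffle> Parts;
  for (std::size_t Begin = 0; Begin < Mask.size(); Begin += ElemsPerReg) {
    SubShuffle Part;
    for (std::size_t I = Begin; I < Begin + ElemsPerReg; ++I) {
      int Idx = Mask[I];
      Part.Mask.push_back(Idx);
      // Undefined lanes do not reference any source register.
      if (Idx >= 0)
        Part.SourceRegs.insert(Idx / static_cast<int>(ElemsPerReg));
    }
    Parts.push_back(Part);
  }
  return Parts;
}
```

With 4 elements per register, the mask `{0, 1, 8, 9, 4, 5, 6, 7}` yields two slices: the first reads source chunks 0 and 2, the second reads only chunk 1. Knowing the number of source chunks per slice up front is what lets each slice be lowered on its own rather than through repeated halving.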
Some results follow. I did not gather perf changes because the system is too busy; these are code size changes instead.
**march=native(Skylake), -O3+LTO**
Metric: size..text
Program results results0 diff
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 45610 45866 0.6%
test-suite :: SingleSource/Benchmarks/SmallPT/smallpt.test 5907 5939 0.5%
test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.stepfft.test 8166 8198 0.4%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 201680 202256 0.3%
test-suite :: MultiSource/Benchmarks/McCat/12-IOtest/iotest.test 9155 9171 0.2%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 441951 442523 0.1%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 589569 590305 0.1%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test 585848 586296 0.1%
test-suite :: SingleSource/UnitTests/matrix-types-spec.test 220022 220118 0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1122554 1123018 0.0%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 41667 41683 0.0%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 41674 41690 0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 405280 405360 0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 901096 901256 0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test 192635 192667 0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2116993 2117265 0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 317150 317182 0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 170427 170443 0.0%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 236760 236776 0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1141289 1141353 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1165233 1165297 0.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 301095 301111 0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1422763 1422811 0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1422763 1422811 0.0%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 2129557 2129605 0.0%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 2129557 2129605 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12493380 12493572 0.0%
Regressions:
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 611440 611408 -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 611440 611408 -0.0%
x264 - some extra loads of the patterns; I do not expect significant perf regressions here.
test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 100502 100486 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test 107636 107604 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test 107156 107124 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 103066 103034 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 102066 102034 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.test 101516 101484 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl.test 100904 100872 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 100613 100581 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test 100609 100577 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test 100480 100448 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 100352 100320 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 100344 100312 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 100212 100180 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 98220 98188 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl.test 98136 98104 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test 98136 98104 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.test 97784 97752 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 97448 97416 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt.test 96918 96886 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test 96700 96668 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test 96428 96396 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test 96229 96197 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 95896 95864 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test 94844 94812 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt.test 94396 94364 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 94169 94137 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test 91872 91840 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 91692 91660 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test 91564 91532 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 91324 91292 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 90208 90176 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test 90048 90016 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt.test 89390 89358 -0.0%
test-suite :: MultiSource/Benchmarks/nbench/nbench.test 75643 75611 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl.test 99871 99823 -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt.test 93429 93381 -0.1%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt.test 91676 91612 -0.1%
TSVC - the inner loop body has fewer instructions; the preheader is larger because the loading of some patterns moved there.
nbench - no significant changes
jpeg/jpeg-6a - no significant changes
**Generic, -O3+LTO**
Metric: size..text
Program results results0 diff
test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 88982 89094 0.1%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1332828 1333276 0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1332828 1333276 0.0%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 502161 502177 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12015748 12015828 0.0%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 154226 154226 0.0%
The perf changes are improvements of about 1-2% (433.milc shows ~10%, but I would not rely on it).
Regressions:
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 348030 347966 -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 102840 102632 -0.2%
test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.stepfft.test 3361 3345 -0.5%
Bullet - 3 extra instructions in the loop; the final lowering needs to be improved, but this should not affect performance too much (2 unpkl are replaced by 4 shufps and 1 movq).
consumer-jpeg - no significant changes; only the order of the shuffles changed after combining extractvector instructions.
Vector-sse.stepfft - 2 unpck are replaced by 4 shufps; no perf changes expected.
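As an illustration of why an unpack can turn into two shufps, here is a scalar model of the two SSE instructions. This is only a sketch of the instruction semantics, not the sequence the backend actually emitted. The function names are hypothetical, and the model takes four lane indices where the real shufps encodes them in an 8-bit immediate.

```cpp
#include <array>

using V4 = std::array<float, 4>;

// Scalar model of SSE unpcklps: interleave the low halves of A and B.
V4 unpcklps(const V4 &A, const V4 &B) { return {A[0], B[0], A[1], B[1]}; }

// Scalar model of SSE shufps: the two low result lanes are picked from A,
// the two high result lanes from B.
V4 shufps(const V4 &A, const V4 &B, int I0, int I1, int I2, int I3) {
  return {A[I0], A[I1], B[I2], B[I3]};
}

// The same interleave built from two shufps.
V4 unpcklpsViaShufps(const V4 &A, const V4 &B) {
  V4 T = shufps(A, B, 0, 1, 0, 1); // {A0, A1, B0, B1}
  return shufps(T, T, 0, 2, 1, 3); // {A0, B0, A1, B1}
}
```

Both functions produce the same vector for any inputs, so the replacement changes the instruction count but not the result.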
The updated patch will be uploaded later today.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115653/new/
https://reviews.llvm.org/D115653