[PATCH] D100381: [RFC] Improve loop distribute cost model
Sanne Wouda via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 05:56:54 PDT 2021
sanwou01 created this revision.
sanwou01 added reviewers: SjoerdMeijer, qcolombet, nikic, dmgreen, davide, fhahn, jdoerfert, lebedev.ri, anemet.
Herald added a subscriber: hiraditya.
sanwou01 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
This is a first stab at an improved cost model for loop distribution,
replacing "always merge adjacent vectorizable partitions" with something
more fine-grained.
Two new heuristics are added. First, any adjacent partitions that have
nearby memory accesses are merged. This helps in cases where we would
otherwise separate accesses to the same buffer. (In particular, this
prevents pathologically bad behaviour on (hand-)unrolled loops.)
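A minimal sketch of the first heuristic, written against a simplified standalone model rather than LLVM IR. The `Access` and `Partition` structs, the `MaxDist` byte threshold, and the function names are all hypothetical and are not taken from the patch; the real code would have to derive base pointers and offsets from the loop's memory instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one memory access: an opaque base-object id plus a
// constant byte offset. The actual patch works on LLVM IR, not on this struct.
struct Access {
  int Base;
  std::int64_t Offset;
};

// Hypothetical partition: just the list of accesses it contains.
struct Partition {
  std::vector<Access> Accesses;
};

// Returns true if some access in A lies within MaxDist bytes of some access
// in B on the same base object.
bool hasNearbyAccess(const Partition &A, const Partition &B,
                     std::int64_t MaxDist) {
  for (const Access &X : A.Accesses)
    for (const Access &Y : B.Accesses) {
      if (X.Base != Y.Base)
        continue;
      std::int64_t D = X.Offset - Y.Offset;
      if (D < 0)
        D = -D;
      if (D <= MaxDist)
        return true;
    }
  return false;
}

// Walks the partitions in program order and folds each one into its
// predecessor when the two touch nearby memory.
std::vector<Partition> mergeNearbyPartitions(const std::vector<Partition> &In,
                                             std::int64_t MaxDist) {
  assert(MaxDist >= 0 && "distance threshold must be non-negative");
  std::vector<Partition> Out;
  for (const Partition &P : In) {
    if (!Out.empty() && hasNearbyAccess(Out.back(), P, MaxDist)) {
      std::vector<Access> &Dst = Out.back().Accesses;
      Dst.insert(Dst.end(), P.Accesses.begin(), P.Accesses.end());
    } else {
      Out.push_back(P);
    }
  }
  return Out;
}
```

In this model, a hand-unrolled loop that stores to a[i] and a[i+1] in two adjacent partitions ends up as one partition, while a partition that only touches a different buffer stays separate.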
Second, any partition that is too small is merged with its neighbours.
This should help to keep instruction-level parallelism (ILP) and
memory-level parallelism (MLP) high. Currently, any partition without
loads or stores is considered "too small", but I expect that this
threshold will need some more tuning.
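A minimal sketch of the second heuristic, again on a simplified standalone model. The `Partition` fields and the function names are hypothetical; the only rule taken from the description above is that a partition with no memory operations counts as "too small". How the patch picks which neighbour to merge into is not stated here, so merging into the preceding partition (or absorbing the following one when the small partition comes first) is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical partition summary: instruction count and number of
// loads/stores. The actual patch inspects LLVM instructions instead.
struct Partition {
  unsigned NumInsts;
  unsigned NumMemOps;
};

// The rule from the description: no loads or stores means "too small".
bool isTooSmall(const Partition &P) { return P.NumMemOps == 0; }

// Folds a partition into its predecessor whenever either of the two is too
// small, so a small partition never survives next to a neighbour.
std::vector<Partition> mergeSmallPartitions(const std::vector<Partition> &In) {
  std::vector<Partition> Out;
  for (const Partition &P : In) {
    assert(P.NumMemOps <= P.NumInsts && "memory ops are instructions too");
    if (!Out.empty() && (isTooSmall(P) || isTooSmall(Out.back()))) {
      Out.back().NumInsts += P.NumInsts;
      Out.back().NumMemOps += P.NumMemOps;
    } else {
      Out.push_back(P);
    }
  }
  return Out;
}
```

A smarter version would presumably weigh partition size against the cost of the extra loop overhead, which is the kind of tuning mentioned above.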
The results look reasonable overall, with a few outliers that I still
need to investigate. From the test suite:
benchmark    #loop-dist    delta exec time (lower is better)
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv.test 1 0.864 (i.e. +86.4% exec time vs no loop distribute)
SingleSource/Benchmarks/Stanford/Bubblesort.test 3 0.265
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test 2 0.214
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test 2 0.144
SingleSource/Benchmarks/Misc/fp-convert.test 1 0.108
SingleSource/Benchmarks/Stanford/Treesort.test 2 0.091
SingleSource/Benchmarks/CoyoteBench/fftbench.test 1 0.081
MultiSource/Applications/hbd/hbd.test 1 0.08
MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 1 0.062
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k.test 2 0.06
SingleSource/Benchmarks/Stanford/Quicksort.test 3 0.046
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 1 0.042
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 1 0.042
MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 9 0.04
MultiSource/Benchmarks/MallocBench/espresso/espresso.test 4 0.031
MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 1 0.029
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test 1 0.023
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test 1 0.022
MultiSource/Benchmarks/VersaBench/bmm/bmm.test 1 0.021
SingleSource/Benchmarks/McGill/queens.test 1 0.02
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test 1 0.019
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test 1 0.018
MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 2 0.017
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 1 0.016
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm.test 1 0.016
MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt.test 1 0.015
MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 1 0.015
MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels.test 2 0.013
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test 1 0.011
MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test 1 0.011
SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml.test 4 0.007
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk.test 1 0.006
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test 1 0.006
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test 1 0.005
MultiSource/Benchmarks/Bullet/bullet.test 8 0.004
MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test 6 0.004
MultiSource/Applications/oggenc/oggenc.test 6 0.004
SingleSource/Benchmarks/Polybench/stencils/adi/adi.test 2 0.004
MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 1 0.004
MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test 3 0.003
MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 1 0.002
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 1 0.002
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test 1 0.002
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 11 0.001
MultiSource/Benchmarks/McCat/04-bisect/bisect.test 4 0.001
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog/dynprog.test 2 0.001
MultiSource/Applications/SPASS/SPASS.test 1 0.001
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 14 0
MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test 9 0
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 1 0
MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt.test 1 0
MultiSource/Benchmarks/7zip/7zip-benchmark.test 16 -0.001
MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 1 -0.001
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl.test 1 -0.001
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt.test 1 -0.001
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 1 -0.001
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt.test 1 -0.001
MultiSource/Applications/JM/lencod/lencod.test 6 -0.002
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test 1 -0.002
MultiSource/Applications/viterbi/viterbi.test 1 -0.002
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 1 -0.004
MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test 1 -0.004
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 1 -0.005
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt.test 1 -0.005
SingleSource/Benchmarks/Linpack/linpack-pc.test 1 -0.005
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test 3 -0.006
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 1 -0.006
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt.test 1 -0.007
MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test 2 -0.008
SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test 1 -0.008
MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.test 1 -0.009
MultiSource/Applications/JM/ldecod/ldecod.test 16 -0.011
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test 1 -0.011
MultiSource/Benchmarks/sim/sim.test 6 -0.013
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.test 1 -0.013
MultiSource/Applications/sqlite3/sqlite3.test 3 -0.014
MultiSource/Applications/ClamAV/clamscan.test 2 -0.014
MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl.test 1 -0.014
MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test 1 -0.019
MultiSource/Benchmarks/mafft/pairlocalalign.test 78 -0.02
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test 2 -0.02
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test 4 -0.024
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 14 -0.027
MultiSource/Applications/obsequi/Obsequi.test 2 -0.027
MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl.test 1 -0.029
SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test 2 -0.03
MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 1 -0.031
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test 1 -0.065
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky.test 1 -0.08
SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper.test 2 -0.082
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.test 3 -0.125
MultiSource/Benchmarks/MallocBench/gs/gs.test 3 -0.151
SingleSource/Benchmarks/Polybench/stencils/jacobi-1d-imper/jacobi-1d-imper.test 2 -1
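For reading the delta column: the first row's annotation (0.864 meaning +86.4% exec time versus no loop distribution) suggests the delta is the relative change in execution time. A small sketch of that arithmetic follows; the function name and the exact formula are assumptions inferred from that annotation, not something the table states.

```cpp
#include <cassert>

// Assumed definition of the "delta exec time" column: relative change of the
// execution time with loop distribution against the time without it.
// Positive values are slowdowns, negative values are speedups.
double relativeExecTimeDelta(double BaselineSeconds,
                             double DistributedSeconds) {
  assert(BaselineSeconds > 0.0 && "baseline time must be positive");
  return (DistributedSeconds - BaselineSeconds) / BaselineSeconds;
}
```

Under this assumed definition, a delta of -1 would correspond to a measured time of zero with loop distribution enabled, so a row with that value may be worth double-checking.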
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D100381
Files:
llvm/lib/Transforms/Scalar/LoopDistribute.cpp
llvm/test/Transforms/LoopDistribute/bug-uses-outside-loop.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D100381.337120.patch
Type: text/x-patch
Size: 8256 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210413/d686c7a7/attachment.bin>