[PATCH] D116343: [SLP]Introduce split shuffle vectorization mode.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 28 13:37:55 PST 2021
ABataev created this revision.
ABataev added reviewers: RKSimon, anton-afanasyev, dtemirbulatov.
Herald added subscribers: dmgreen, hiraditya.
ABataev requested review of this revision.
Herald added a project: LLVM.
Introduced a split shuffle node kind. If a node would otherwise be
gathered, the compiler tries to count the potentially vectorizable
scalar instructions in the whole node. If more than number / 2 of the
instructions are compatible, the whole node is split into 2 subnodes
of number / 2 elements each, which are vectorized separately and then
reshuffled into the resulting vector.
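The decision rule above can be sketched roughly as follows. This is a minimal standalone illustration, not code from the patch: `Opcode`, `shouldSplitGatherNode`, `SplitNode` and `splitInHalves` are hypothetical names, "compatible" is assumed to mean "same opcode", and the two subnodes are assumed to be the consecutive halves of the node (the description does not say how lanes are assigned to subnodes).

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for an instruction opcode; not an LLVM type.
using Opcode = int;

// Returns true if more than half of the lanes share one opcode, i.e. the
// "> number / 2 compatible instructions" threshold from the description.
// Assumption: "compatible" simply means "same opcode" in this sketch.
bool shouldSplitGatherNode(const std::vector<Opcode> &Lanes) {
  std::map<Opcode, std::size_t> Counts;
  std::size_t Best = 0;
  for (Opcode Op : Lanes) {
    std::size_t C = ++Counts[Op];
    if (C > Best)
      Best = C;
  }
  return Best > Lanes.size() / 2;
}

// Result of splitting a node of N lanes into two subnodes of N / 2 lanes.
struct SplitNode {
  std::vector<Opcode> Lo; // lanes [0, N/2)
  std::vector<Opcode> Hi; // lanes [N/2, N)
  std::vector<int> Mask;  // index of each original lane in concat(Lo, Hi)
};

// Assumption: the subnodes are the consecutive halves of the node, so the
// final reshuffle mask is the identity over the concatenated halves.
SplitNode splitInHalves(const std::vector<Opcode> &Lanes) {
  SplitNode S;
  const auto Half = static_cast<std::ptrdiff_t>(Lanes.size() / 2);
  S.Lo.assign(Lanes.begin(), Lanes.begin() + Half);
  S.Hi.assign(Lanes.begin() + Half, Lanes.end());
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    S.Mask.push_back(static_cast<int>(I));
  return S;
}
```

With four lanes, three matching opcodes pass the threshold (3 > 2) while two do not (2 > 2 is false), so an evenly mixed node would still be gathered under this reading.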
Metric: SLP.NumVectorInstructions
Program results results0 diff
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 3.00 7.00 133.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C/assembler/assembler.test 3.00 6.00 100.0%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1202.00 1627.00 35.4%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 149.00 201.00 34.9%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 149.00 201.00 34.9%
test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test 37.00 46.00 24.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 282.00 10.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1624.00 1781.00 9.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1624.00 1781.00 9.7%
test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test 115.00 126.00 9.6%
test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test 115.00 126.00 9.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8737.00 9558.00 9.4%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1143.00 8.8%
test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 25.00 27.00 8.0%
test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 14.00 15.00 7.1%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 30.00 32.00 6.7%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1772.00 6.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3773.00 5.8%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3773.00 5.8%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4527.00 4772.00 5.4%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4240.00 4464.00 5.3%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 2100.00 5.2%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 421.00 440.00 4.5%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 299.00 311.00 4.0%
test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 35.00 36.00 2.9%
test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 39.00 40.00 2.6%
test-suite :: MultiSource/Applications/hbd/hbd.test 41.00 42.00 2.4%
test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 265.00 1.9%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9220.00 1.3%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 4573.00 4598.00 0.5%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 709.00 711.00 0.3%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5623.00 5638.00 0.3%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5623.00 5638.00 0.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 383.00 384.00 0.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 624.00 623.00 -0.2%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 571.00 570.00 -0.2%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 854.00 851.00 -0.4%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 854.00 851.00 -0.4%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 65.00 64.00 -1.5%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 767.00 -2.0%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 230.00 -2.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 230.00 -2.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/football/football.test 38.00 35.00 -7.9%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 188.00 -9.2%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 188.00 -9.2%
test-suite :: MultiSource/Benchmarks/Ptrdist/anagram/anagram.test 20.00 18.00 -10.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test 37.00 31.00 -16.2%
test-suite :: MultiSource/Benchmarks/MiBench/security-sha/security-sha.test 13.00 10.00 -23.1%
test-suite :: SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt.test 16.00 12.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d-imper/jacobi-1d-imper.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.test 8.00 6.00 -25.0%
test-suite :: MultiSource/Applications/aha/aha.test NaN 4.00 NaN
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test -
actually, more vector instructions, fewer shuffles.
MultiSource/Benchmarks/mafft/pairlocalalign.test - fewer shuffles.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test - more vectorized
code, fewer shuffles.
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test - same.
MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test - more
vectorized code, some inserts/shuffles were optimized.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test - more
vectorized code, but some previously vectorized code is no longer
vectorized; needs D116312 <https://reviews.llvm.org/D116312>.
MultiSource/Benchmarks/mediabench/gsm/toast/toast.test - fewer shuffles,
more vector code.
MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test - same.
MultiSource/Benchmarks/Prolangs-C/football/football.test - needs
non-power-of-2 support to improve further, but pretty much the same.
External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test - more vector
instructions, fewer shuffles.
External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test - more vector
instructions, fewer shuffles.
MultiSource/Benchmarks/Ptrdist/anagram/anagram.test - fewer shuffles.
All others - same.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D116343
Files:
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll
llvm/test/Transforms/SLPVectorizer/X86/gather-move-out-of-loop.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll
llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll
llvm/test/Transforms/SLPVectorizer/X86/phi.ll
llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll
llvm/test/Transforms/SLPVectorizer/X86/pr47642.ll
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll
llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll
llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D116343.396429.patch
Type: text/x-patch
Size: 135239 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211228/b0f729bd/attachment-0001.bin>