[all-commits] [llvm/llvm-project] e894c3: [SLP]Improve stores vectorization.
Alexey Bataev via All-commits
all-commits at lists.llvm.org
Mon Aug 7 09:34:46 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e894c3d1a9ac50f5e91a7ab9e28cab74b6e349f2
https://github.com/llvm/llvm-project/commit/e894c3d1a9ac50f5e91a7ab9e28cab74b6e349f2
Author: Alexey Bataev <a.bataev at outlook.com>
Date: 2023-08-07 (Mon, 07 Aug 2023)
Changed paths:
M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
M llvm/test/Transforms/SLPVectorizer/X86/many_stores.ll
M llvm/test/Transforms/SLPVectorizer/X86/stores-non-ordered.ll
Log Message:
-----------
[SLP]Improve stores vectorization.
Use an O(N log N) sorting approach instead of the O(N^2) one (N <= 32),
and do not try to revectorize all possible combinations of stores if
they definitely cannot be combined because of memory/data dependencies.
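
The sorting idea can be illustrated with a minimal, standalone sketch. This
is not the code from SLPVectorizer.cpp: StoreInfo and groupConsecutiveStores
are hypothetical names, and a store is modeled only by an assumed element
offset from a common base pointer. The stores are sorted once by offset and
then split into runs of consecutive offsets in a single linear pass, instead
of comparing every store against every other one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a store instruction: only its element offset
// from a common base pointer and its position in the block are modeled.
struct StoreInfo {
  int64_t Offset; // distance from the base pointer, in elements
  unsigned Index; // original program order within the block
};

// Sort the stores by offset (O(N log N)), then split them into runs of
// consecutive offsets with one linear scan. Each run is a candidate group
// of adjacent stores.
std::vector<std::vector<StoreInfo>>
groupConsecutiveStores(std::vector<StoreInfo> Stores) {
  std::sort(Stores.begin(), Stores.end(),
            [](const StoreInfo &A, const StoreInfo &B) {
              // Break ties on program order so the result is deterministic.
              return A.Offset != B.Offset ? A.Offset < B.Offset
                                          : A.Index < B.Index;
            });

  std::vector<std::vector<StoreInfo>> Groups;
  for (const StoreInfo &S : Stores) {
    // Start a new run when this store does not directly follow the
    // previous one in memory (this also separates duplicate offsets).
    if (Groups.empty() || Groups.back().back().Offset + 1 != S.Offset)
      Groups.emplace_back();
    assert(!Groups.empty() && "a run must exist before appending");
    Groups.back().push_back(S);
  }
  return Groups;
}
```

In this sketch, stores at offsets 3, 0, 1, 2, 10, 11 end up in two runs,
{0, 1, 2, 3} and {10, 11}. The dependency check described in the log message
is not modeled here.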
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%
Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%
Other changes are too small to be reliable.
size..text:
Program results results0 diff
test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%
test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%
espresso - 2 more stores vectorized
x264 - a small number of changes in 3-4 functions; a few more vector
stores are generated (two 4x zeroinitializer stores + some other small
variations).
clamscan - a 32xi8 store is emitted instead of several scalar stores +
several 4x-8x stores.
Differential Revision: https://reviews.llvm.org/D155246