[PATCH] D155246: [SLP]Improve stores vectorization.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 13 16:08:59 PDT 2023


ABataev created this revision.
ABataev added reviewers: RKSimon, vdmitrie.
Herald added subscribers: vporpo, hiraditya.
Herald added a project: All.
ABataev requested review of this revision.
Herald added a subscriber: wangpc.
Herald added a project: LLVM.

Use O(nlogn) instead of O(N2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 <https://reviews.llvm.org/owners/package/3/> + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3 <https://reviews.llvm.org/owners/package/3/>+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text

                                                                           results     results0    diff
        test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
              test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
              test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
         test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
        test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
            test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                 test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
  test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
             test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                   test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
     test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
           test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%
  
   test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
      test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
       test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
               test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155246

Files:
  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
  llvm/test/Transforms/SLPVectorizer/X86/many_stores.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D155246.540209.patch
Type: text/x-patch
Size: 13152 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230713/42951a37/attachment.bin>


More information about the llvm-commits mailing list