[all-commits] [llvm/llvm-project] 7d8060: [SLP]Improve reductions vectorization.

Andrew V. Tischenko via All-commits all-commits at lists.llvm.org
Wed May 18 13:24:31 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7d8060bc19e9b03c93d825a5a790e3c1f4978c52
      https://github.com/llvm/llvm-project/commit/7d8060bc19e9b03c93d825a5a790e3c1f4978c52
  Author: Alexey Bataev <a.bataev at outlook.com>
  Date:   2022-05-18 (Wed, 18 May 2022)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    M llvm/test/Transforms/SLPVectorizer/AMDGPU/horizontal-store.ll
    M llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll
    M llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll
    M llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
    M llvm/test/Transforms/SLPVectorizer/X86/bool-mask.ll
    M llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll
    M llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
    M llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
    M llvm/test/Transforms/SLPVectorizer/X86/malformed_phis.ll
    M llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
    M llvm/test/Transforms/SLPVectorizer/X86/revectorized_rdx_crash.ll
    M llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll
    M llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll

  Log Message:
  -----------
  [SLP]Improve reductions vectorization.

The pattern matching and vectgorization for reductions was not very
effective. Some of of the possible reduction values were marked as
external arguments, SLP could not find some reduction patterns because
of too early attempt to vectorize pair of binops arguments, the cost of
consts reductions was not correct. Patch addresses these issues and
improves the analysis/cost estimation and vectorization of the
reductions.

The most significant changes in SLP.NumVectorInstructions:

Metric: SLP.NumVectorInstructions                                                                                                                                                                                                 [140/14396]

Program                                                                                        results  results0 diff
               test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   920.00  3548.00 285.7%
                test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test    66.00   122.00  84.8%
      test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   100.00   128.00  28.0%
 test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   664.00   810.00  22.0%
                 test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   592.00   687.00  16.0%
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   402.00   426.00   6.0%
                   test-suite :: MultiSource/Applications/JM/lencod/lencod.test  1665.00  1745.00   4.8%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   135.00   139.00   3.0%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   135.00   139.00   3.0%
                  test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   388.00   397.00   2.3%
                   test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   895.00   914.00   2.1%
    test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   240.00   244.00   1.7%
           test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   240.00   244.00   1.7%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   820.00   832.00   1.5%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   820.00   832.00   1.5%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14804.00 14914.00   0.7%
                        test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  8125.00  8183.00   0.7%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1330.00  1338.00   0.6%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1330.00  1338.00   0.6%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9832.00  9880.00   0.5%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5267.00  5291.00   0.5%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  4018.00  4024.00   0.1%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  4018.00  4024.00   0.1%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   426.00   424.00  -0.5%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   426.00   424.00  -0.5%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   201.00   192.00  -4.5%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   201.00   192.00  -4.5%

644.nab_s and 544.nab_r - reduced number of shuffles but increased number
of useful vectorized instructions.

641.leela_s and 541.leela_r - the function
`@_ZN9FastBoard25get_pattern3_augment_specEiib` is not inlined anymore
but its body gets vectorized successfully. Before, the function was
inlined twice and vectorized just after inlining, currently it is not
required. The vector code looks pretty similar, just like as it was before.

Differential Revision: https://reviews.llvm.org/D111574




More information about the All-commits mailing list