[PATCH] D116688: [SLP]Excluded external uses from the reprdering estimation.

Wed Jan 5 12:16:58 PST 2022

ABataev created this revision.
ABataev added reviewers: vporpo, RKSimon, anton-afanasyev, dtemirbulatov.
Herald added subscribers: dmgreen, hiraditya.
ABataev requested review of this revision.
Herald added a project: LLVM.

Compiler adds the estimation for the external uses during operands
reordering analysis, which makes it tend to prefer duplicates in the
lanes rather than diamond/shuffled match in the graph. It changes the sizes of
the vector operands and may prevent some vectorization. We don't need
this kind of estimation for the analysis phase, because we just need to
choose the most compatible instruction and it does not matter if it has
external user or used in the non-matching lane. Instead, we count the number
of unique instruction in the lane and see if the reassociation changes
the number of unique scalars to be power of 2 or not. If we have power
of 2 unique scalars in the lane, it is considered more profitable rather
than having non-power-of-2 number of unique scalars.

Metric: SLP.NumVectorInstructions

Program                                                                         results results0 diff

          test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test   70.00   86.00   22.9%
                      test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4527.00 4630.00    2.3%
             test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test  346.00  353.00    2.0%
            test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test  346.00  353.00    2.0%
       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9275.00    1.9%
  test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  235.00  239.00    1.7%
         test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  235.00  239.00    1.7%
     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8737.00 8859.00    1.4%
                 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00    1.2%
          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00    1.1%
         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00    1.1%
    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00    0.3%
     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00    0.3%
       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4240.00 4250.00    0.2%
              test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00    0.1%
                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00    0.1%

test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  783.00  782.00   -0.1%

                test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   69.00   68.00   -1.4%
  test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test  207.00  192.00   -7.2%
   test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test  207.00  192.00   -7.2%

test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   89.00   80.00  -10.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   89.00   80.00  -10.1%

  test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  260.00  215.00  -17.3%

test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  256.00  211.00  -17.6%

MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty the same.
SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by
one <4 x> load.
External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and
not inlined anymore.
External/SPEC/CINT2017rate/541.leela_r - same
xternal/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in
multi-block tree, the result is pretty the same.
External/SPEC/CINT2017speed/631.deepsjeng_s - same.
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same
as before.
MultiSource/Benchmarks/MiBench/consumer-jpeg - same.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D116688

Files:
  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
  llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll
  llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll
  llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll
  llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
  llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll
  llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
  llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
  llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D116688.397673.patch
Type: text/x-patch
Size: 36655 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220105/5bf13158/attachment.bin>