[all-commits] [llvm/llvm-project] 32994c: [SLP]Improve findReusedOrderedScalars and graph ro...

Thu Feb 22 11:32:27 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 32994cc0d63513f77223c64148faeeb50aebb702
      https://github.com/llvm/llvm-project/commit/32994cc0d63513f77223c64148faeeb50aebb702
  Author: Alexey Bataev <5361294+alexey-bataev at users.noreply.github.com>
  Date:   2024-02-22 (Thu, 22 Feb 2024)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    M llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll
    M llvm/test/Transforms/SLPVectorizer/AArch64/reorder-fmuladd-crash.ll
    M llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
    M llvm/test/Transforms/SLPVectorizer/AArch64/vec3-reorder-reshuffle.ll
    M llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
    M llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll
    M llvm/test/Transforms/SLPVectorizer/X86/reorder-clustered-node.ll
    M llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll
    M llvm/test/Transforms/SLPVectorizer/X86/reorder-vf-to-resize.ll
    M llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll
    M llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll
    M llvm/test/Transforms/SLPVectorizer/X86/vec3-reorder-reshuffle.ll

  Log Message:
  -----------
  [SLP]Improve findReusedOrderedScalars and graph rotation.

Patch syncs the code in findReusedOrderedScalars with cost
estimation/codegen. It tries to use similar logic to better determine
best order.
Before, it just tried to find previously vectorized node without
checking if it is possible to use the vectorized value in the shuffle.
Now it relies on the more generalized version. If it determines, that
a single vector must be reordered (using same mechanism, as codegen and
cost estimation), it generates better order.

The comparison between new/ref ordering:

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                               test-suite :: MultiSource/Benchmarks/nbench/nbench.test   139.00                    140.00   0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   344.00                    346.00   0.6%
                                                                                        test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1293.00                   1292.00  -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5176.00                   5170.00  -0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5173.00                   5167.00  -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11692.00                  11660.00  -0.3%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1621.00                   1615.00  -0.4%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test   795.00                    792.00  -0.4%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26499.00                  26338.00  -0.6%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  7343.00                   7281.00  -0.8%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  1104.00                   1094.00  -0.9%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2216.00                   2180.00  -1.6%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   787.00                    637.00 -19.1%

Less 0% is better.
Most of the benchmarks see more vectorized code. The first ones just
have shuffles removed.

The ordering analysis still may require some improvements (e.g. for
alternate nodes), but this one should be produce better results.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/77529

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications