[all-commits] [llvm/llvm-project] ef3888: [X86 isel] Remove lane requirement from lowerShuff...

Han Zhu via All-commits all-commits at lists.llvm.org
Wed Apr 12 17:11:34 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: ef38880ce03bc1f1fb3606c5a629151f3d0e975e
      https://github.com/llvm/llvm-project/commit/ef38880ce03bc1f1fb3606c5a629151f3d0e975e
  Author: Han Zhu <zhuhan7737 at gmail.com>
  Date:   2023-04-12 (Wed, 12 Apr 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/pr61964.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll

  Log Message:
  -----------
  [X86 isel] Remove lane requirement from lowerShuffleAsUNPCKAndPermute

`lowerShuffleAsUNPCKAndPermute` requires the shuffle mask element to be
in the same lane in both the input and output vectors. This prevents it from
matching certain patterns for example in [GHI
61964](https://github.com/llvm/llvm-project/issues/61964). Removing the lane
requirement fixes the issue.

The change I'm targeting is in the test llvm/test/CodeGen/X86/pr61964.ll. The
codegen has improved notably with this patch. Otherwise, looks like some
broadcast instructions are replaced with unpck and perm. To check if there's
any other performance change, I ran llvm-test-suite benchmarks from the
SingleSource, MultiSource, and MicroBenchmarks directories:
```
Tests: 2665
Short Running: 2009 (filtered out)
Same hash: 140 (filtered out)
In Blacklist: 513 (filtered out)
Remaining: 3
Metric: exec_time

Program                                                                exec_time
                                                                       lhs       rhs    diff
 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   1.64      1.64  0.1%
      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   1.06      1.06  0.0%
          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   5.25      5.25  0.0%
                                                    Geomean difference    nan       nan  0.0%
      exec_time
l/r         lhs       rhs      diff
count  3.000000  3.000000  3.000000
mean   2.648300  2.649100  0.000462
std    2.269035  2.268849  0.000415
min    1.055500  1.055900  0.000095
25%    1.349300  1.350250  0.000237
50%    1.643100  1.644600  0.000379
75%    3.444700  3.445700  0.000646
max    5.246300  5.246800  0.000913
```
The patch only hits three cases and the result is neutral. (The 513 blacklisted
benchmarks are the ones under MicroBenchmarks, which `--filter-hash` does
not work and I manually verified their code did not change).

Differential Revision: https://reviews.llvm.org/D147668




More information about the All-commits mailing list