[all-commits] [llvm/llvm-project] ef3888: [X86 isel] Remove lane requirement from lowerShuff...
Han Zhu via All-commits
all-commits at lists.llvm.org
Wed Apr 12 17:11:34 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: ef38880ce03bc1f1fb3606c5a629151f3d0e975e
https://github.com/llvm/llvm-project/commit/ef38880ce03bc1f1fb3606c5a629151f3d0e975e
Author: Han Zhu <zhuhan7737 at gmail.com>
Date: 2023-04-12 (Wed, 12 Apr 2023)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/oddshuffles.ll
M llvm/test/CodeGen/X86/pr61964.ll
M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-8.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll
M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
Log Message:
-----------
[X86 isel] Remove lane requirement from lowerShuffleAsUNPCKAndPermute
`lowerShuffleAsUNPCKAndPermute` requires each shuffle mask element to be
in the same lane in both the input and output vectors. This prevents it from
matching certain patterns, for example the one in [GHI
61964](https://github.com/llvm/llvm-project/issues/61964). Removing the lane
requirement fixes the issue.
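To make the lane terminology concrete: 256-bit x86 vectors consist of two 128-bit lanes, and a mask element is "in lane" when the source element it selects sits in the same 128-bit lane as the destination slot. The Python sketch below only illustrates that check; it is not the LLVM implementation, and `crosses_lane` with its parameters is a hypothetical name chosen for this example.

```python
def crosses_lane(mask, elt_bits, lane_bits=128):
    """Return True if any mask element moves data between 128-bit lanes.

    mask[i] is the source element index for destination slot i; a negative
    value means undef. For two-input shuffles, indices >= len(mask) select
    from the second input, so the index is reduced modulo len(mask) first.
    """
    lane_size = lane_bits // elt_bits  # number of elements per lane
    n = len(mask)
    for dst, src in enumerate(mask):
        if src < 0:
            continue  # undef elements impose no constraint
        if (src % n) // lane_size != dst // lane_size:
            return True
    return False


# Two v8i32 masks (8 x 32-bit elements, so 4 elements per lane).
in_lane = [1, 0, 3, 2, 5, 4, 7, 6]  # swaps neighbours within each lane
cross = [0, 4, 1, 5, 2, 6, 3, 7]    # interleaves the low and high lanes

print(crosses_lane(in_lane, 32))
print(crosses_lane(cross, 32))
```

A mask like `cross` is the kind that a lane-restricted matcher rejects up front, even when an unpack followed by a permute could still produce it.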
The change I'm targeting is in the test llvm/test/CodeGen/X86/pr61964.ll, where
the codegen improves notably with this patch. In the other tests, it looks like
some broadcast instructions are replaced with unpck and perm. To check for
any other performance change, I ran the llvm-test-suite benchmarks from the
SingleSource, MultiSource, and MicroBenchmarks directories:
```
Tests: 2665
Short Running: 2009 (filtered out)
Same hash: 140 (filtered out)
In Blacklist: 513 (filtered out)
Remaining: 3
Metric: exec_time
Program                                                                exec_time
                                                                       lhs   rhs   diff
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  1.64  1.64  0.1%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test       1.06  1.06  0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test           5.25  5.25  0.0%
Geomean difference                                                     nan   nan   0.0%

       exec_time
l/r    lhs       rhs       diff
count  3.000000  3.000000  3.000000
mean   2.648300  2.649100  0.000462
std    2.269035  2.268849  0.000415
min    1.055500  1.055900  0.000095
25%    1.349300  1.350250  0.000237
50%    1.643100  1.644600  0.000379
75%    3.444700  3.445700  0.000646
max    5.246300  5.246800  0.000913
```
The patch only hits three cases and the result is neutral. (The 513 blacklisted
benchmarks are the ones under MicroBenchmarks, for which `--filter-hash` does
not work; I manually verified that their code did not change.)
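As a sanity check on the "neutral" reading, the `diff` column in the summary above is consistent with the per-program relative change `rhs / lhs - 1`. The Python sketch below recomputes it from the min, 50%, and max rows, which for three programs are the three individual timings. Matching each timing to a program by its rounded value in the per-program table is an assumption of this sketch, as are the short dictionary keys.

```python
# Per-program (lhs, rhs) exec_time in seconds, read off the min / 50% / max
# rows of the summary. The program-to-row pairing is an assumption based on
# the rounded values 1.06, 1.64 and 5.25 in the per-program table.
timings = {
    "loop_unroll": (1.0555, 1.0559),
    "smg2000": (1.6431, 1.6446),
    "lencod": (5.2463, 5.2468),
}


def relative_diff(lhs, rhs):
    """Relative change of rhs with respect to lhs (0.001 means +0.1%)."""
    return rhs / lhs - 1.0


diffs = {name: relative_diff(lhs, rhs) for name, (lhs, rhs) in timings.items()}
mean_diff = sum(diffs.values()) / len(diffs)

for name, d in diffs.items():
    print(f"{name}: {d:.4%}")
print(f"mean: {mean_diff:.6f}")
```

Every per-program change comes out below 0.1%, and the mean agrees with the reported 0.000462 to the printed precision.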
Differential Revision: https://reviews.llvm.org/D147668