[all-commits] [llvm/llvm-project] d62d15: [RISCV] Undo unprofitable zext of icmp combine (#1...
Luke Lau via All-commits
all-commits at lists.llvm.org
Fri Apr 4 11:06:20 PDT 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d62d15e298ce323cb933f4949b42fe46dcb01b77
https://github.com/llvm/llvm-project/commit/d62d15e298ce323cb933f4949b42fe46dcb01b77
Author: Luke Lau <luke at igalia.com>
Date: 2025-04-04 (Fri, 04 Apr 2025)
Changed paths:
M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
A llvm/test/CodeGen/RISCV/rvv/zext-icmp.ll
Log Message:
-----------
[RISCV] Undo unprofitable zext of icmp combine (#134306)
InstCombine rewrites a zext of an icmp whose source is masked down to a
single bit into a trunc plus lshr
(`InstCombinerImpl::transformZExtICmp`). The first block below is the
input and the second is InstCombine's output:
```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
  %1 = and <vscale x 1 x i64> %x, splat (i64 8)
  %2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
  %3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %3
}
```
```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
  %1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
  %2 = lshr <vscale x 1 x i8> %1, splat (i8 3)
  %3 = and <vscale x 1 x i8> %2, splat (i8 1)
  ret <vscale x 1 x i8> %3
}
```
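As a scalar illustration of why the two IR forms compute the same value, here is a minimal C++ sketch. It is not taken from the patch: the function names are hypothetical, and it assumes the tested bit position `k` is below 8 so the bit survives the truncation to `i8`.

```cpp
#include <cassert>
#include <cstdint>

// Original form: zext(icmp ne (and x, 1 << k), 0) to i8.
// (Hypothetical name, scalar stand-in for one vector lane.)
inline uint8_t zextOfIcmp(uint64_t x, unsigned k) {
  return (x & (uint64_t{1} << k)) != 0 ? 1 : 0;
}

// Rewritten form: and (lshr (trunc x to i8), k), 1.
// Assumes k < 8, otherwise the tested bit is lost by the truncation.
inline uint8_t truncShiftMask(uint64_t x, unsigned k) {
  uint8_t narrow = static_cast<uint8_t>(x);
  return static_cast<uint8_t>((narrow >> k) & 1u);
}

// Compares both forms for every bit position k < 8 on a few sample inputs.
inline bool formsAgree() {
  const uint64_t samples[] = {0, 1, 8, 0xF7, 0xFF,
                              0x123456789ABCDEF0ull, ~uint64_t{0}};
  for (uint64_t x : samples)
    for (unsigned k = 0; k < 8; ++k)
      if (zextOfIcmp(x, k) != truncShiftMask(x, k))
        return false;
  return true;
}
```

For the mask `8` used above, the tested bit is bit 3, so `zextOfIcmp(x, 3)` and `truncShiftMask(x, 3)` correspond to the two functions. The forms are equivalent in value; the commit is only about which one lowers to cheaper RVV code.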
In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:
```asm
f: # @f
    .cfi_startproc
# %bb.0:
    vsetvli a0, zero, e64, m1, ta, ma
    vand.vi v8, v8, 8
    vmsne.vi v0, v8, 0
    vsetvli zero, zero, e8, mf8, ta, ma
    vmv.v.i v8, 0
    vmerge.vim v8, v8, 1, v0
ret
```
To a series of narrowing vnsrl.wis:
```asm
f: # @f
    .cfi_startproc
# %bb.0:
    vsetvli a0, zero, e64, m1, ta, ma
    vand.vi v8, v8, 8
    vsetvli zero, zero, e32, mf2, ta, ma
    vnsrl.wi v8, v8, 3
    vsetvli zero, zero, e16, mf4, ta, ma
    vnsrl.wi v8, v8, 0
    vsetvli zero, zero, e8, mf8, ta, ma
    vnsrl.wi v8, v8, 0
    ret
```
In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.
The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.
This reverses the transform in RISCVISelLowering for truncations where
the source element width is more than twice the destination width, i.e.
it keeps the trunc form when a single vnsrl.wi suffices.
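The width condition can be sketched as follows. This is an illustrative model only, not the code added to `RISCVISelLowering.cpp`: both helper names are hypothetical, and it assumes power-of-two element widths where each `vnsrl.wi` halves the element width.

```cpp
#include <cassert>

// Hypothetical helper: how many halving narrowing steps are needed to
// truncate from srcBits to dstBits (both assumed to be powers of two).
inline unsigned narrowingSteps(unsigned srcBits, unsigned dstBits) {
  unsigned steps = 0;
  while (srcBits > dstBits) {
    srcBits /= 2;
    ++steps;
  }
  return steps;
}

// Hypothetical predicate mirroring the commit message: the combine is
// undone only when the source is more than twice as wide as the
// destination, i.e. when more than one narrowing step would be needed.
inline bool wouldUndoCombine(unsigned srcBits, unsigned dstBits) {
  return srcBits > 2 * dstBits;
}
```

Under these assumptions, the i64 to i8 case above needs three steps (64 to 32 to 16 to 8), matching the three `vnsrl.wi` instructions in the second listing, so it would be reversed. An i16 to i8 truncation needs one step and would be left alone.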
Fixes #132245