[PATCH] D131967: [RISCV] Correct costs for vector ceil/floor/trunc/round
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 17 10:47:39 PDT 2022
craig.topper added inline comments.
================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp:262
+ {Intrinsic::floor, MVT::v16f32, 16},
+ {Intrinsic::floor, MVT::nxv2f32, 15},
+ {Intrinsic::floor, MVT::nxv4f32, 16},
----------------
craig.topper wrote:
> Why is nxv2f32 cheaper than nxv4f32?
OK, it's because LMUL>1 generates
```
vmflt.vv v11, v12, v8, v0.t
vmv1r.v v0, v11
```
due to an earlyclobber needed for the narrowing overlap rules.
LMUL<=1 doesn't have the earlyclobber because the overlap would always be "in the lowest-numbered part of the source register group". So it generates
```
vmflt.vv v0, v10, v8, v0.t
```
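For anyone following along, here is a rough sketch of the overlap rule quoted above. It is not LLVM code: `narrowDestOverlapIsLegal` and its parameters are hypothetical names, and it models only the case where a single destination register (such as a mask result) has a smaller EEW than the source register group. It says nothing about how the register allocator encodes the constraint.

```cpp
// Hypothetical helper (not part of LLVM): checks the narrowing overlap rule
// for a single destination register against one source register group.
//   dest         - destination vector register number
//   srcBase      - lowest-numbered register of the source group
//   srcGroupRegs - registers in the group (1 for LMUL<=1, else LMUL)
constexpr bool narrowDestOverlapIsLegal(unsigned dest, unsigned srcBase,
                                        unsigned srcGroupRegs) {
  // Legal if there is no overlap at all, or if the overlap is in the
  // lowest-numbered register of the source group.
  return !(dest >= srcBase && dest < srcBase + srcGroupRegs) ||
         dest == srcBase;
}

// LMUL=2 source group v8-v9: only v8 may be reused as the destination.
static_assert(narrowDestOverlapIsLegal(8, 8, 2), "lowest part is allowed");
static_assert(!narrowDestOverlapIsLegal(9, 8, 2), "upper part is not allowed");
// LMUL<=1: the group is one register, so any overlap is the lowest part.
static_assert(narrowDestOverlapIsLegal(8, 8, 1), "always allowed");
```

With a one-register group the check can never fail, which matches the LMUL<=1 case needing no earlyclobber and therefore no extra `vmv1r.v`.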
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131967/new/
https://reviews.llvm.org/D131967