[PATCH] D131967: [RISCV] Correct costs for vector ceil/floor/trunc/round

Wed Aug 17 10:47:39 PDT 2022

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp:262
+   {Intrinsic::floor, MVT::v16f32, 16},
+   {Intrinsic::floor, MVT::nxv2f32, 15},
+   {Intrinsic::floor, MVT::nxv4f32, 16},
----------------
craig.topper wrote:
> Why is nxv2f32 cheaper than nxv4f32?
OK, it's because LMUL>1 generates

```
vmflt.vv        v11, v12, v8, v0.t
vmv1r.v v0, v11
```

due to an earlyclobber needed for the narrowing overlap rules.

LMUL<=1 doesn't have the earlyclobber because the source is then a single register, so any overlap would always be "in the lowest-numbered part of the source register group". So it generates

```
vmflt.vv        v0, v10, v8, v0.t
```
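
For anyone following along, here is a rough sketch of the overlap rule as I read it. This is not LLVM code: registers are modeled as plain indices, and the names `overlapsGroup`, `maskDestIsLegal`, and `groupCanOverlapIllegally` are hypothetical. It assumes the destination is a single mask register and the source is a group of 1, 2, 4, or 8 consecutive registers.

```
#include <cassert>

// Hypothetical model: does single register `dest` fall inside the source
// register group [srcBase, srcBase + srcGroupRegs)?
constexpr bool overlapsGroup(unsigned dest, unsigned srcBase,
                             unsigned srcGroupRegs) {
  return dest >= srcBase && dest < srcBase + srcGroupRegs;
}

// Narrowing overlap rule for a single-register (mask) destination: an
// overlap is allowed only in the lowest-numbered register of the group.
constexpr bool maskDestIsLegal(unsigned dest, unsigned srcBase,
                               unsigned srcGroupRegs) {
  return !overlapsGroup(dest, srcBase, srcGroupRegs) || dest == srcBase;
}

// With a one-register source group (LMUL<=1) every overlap is with the
// lowest-numbered register, so no illegal overlap exists. Larger groups
// can overlap illegally, which is where the earlyclobber comes in.
constexpr bool groupCanOverlapIllegally(unsigned srcGroupRegs) {
  return srcGroupRegs > 1;
}

// Compile-time spot checks for an LMUL=2 source group v8-v9.
static_assert(maskDestIsLegal(8, 8, 2), "lowest register of the group");
static_assert(!maskDestIsLegal(9, 8, 2), "upper register of the group");
static_assert(maskDestIsLegal(11, 8, 2), "no overlap at all");
```

Note that an earlyclobber is stricter than this rule: it forbids the destination from overlapping any source, including `v0`, which would explain the extra `vmv1r.v` in the LMUL>1 sequence.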

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131967/new/

https://reviews.llvm.org/D131967