[llvm] [RISCV][TTI] Fix a costing mistake for truncate/fp_round with LMUL>m1 (PR #101051)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 30 07:45:26 PDT 2024
================
@@ -1108,60 +1108,60 @@ define void @trunc() {
; RV32-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv1i64_nxv1i1 = trunc <vscale x 1 x i64> undef to <vscale x 1 x i1>
; RV32-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %nxv2i16_nxv2i8 = trunc <vscale x 2 x i16> undef to <vscale x 2 x i8>
; RV32-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i32_nxv2i8 = trunc <vscale x 2 x i32> undef to <vscale x 2 x i8>
-; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %nxv2i64_nxv2i8 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i8>
+; RV32-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %nxv2i64_nxv2i8 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i8>
; RV32-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %nxv2i32_nxv2i16 = trunc <vscale x 2 x i32> undef to <vscale x 2 x i16>
-; RV32-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i64_nxv2i16 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i16>
-; RV32-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %nxv2i64_nxv2i32 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i32>
+; RV32-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %nxv2i64_nxv2i16 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i16>
+; RV32-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %nxv2i64_nxv2i32 = trunc <vscale x 2 x i64> undef to <vscale x 2 x i32>
----------------
preames wrote:
I went and ran a couple of microbenchmarks on the bp3.
```
Running vnsrl-mf2.out
~3.934785 cycles-per-inst
~4053.832000 cycles-per-iteration
~1030.255150 insts-per-iteration
Running vnsrl-m1.out
~3.971956 cycles-per-inst
~4020.713650 cycles-per-iteration
~1012.275450 insts-per-iteration
Running vnsrl-m2.out
~7.453559 cycles-per-inst
~8139.421000 cycles-per-iteration
~1092.018050 insts-per-iteration
Running vnsrl-m4.out
~14.008862 cycles-per-inst
~16234.939100 cycles-per-iteration
~1158.904950 insts-per-iteration
```
For comparison, here's a vadd.vv at m1:
```
~3.970621 cycles-per-inst
~4017.622100 cycles-per-iteration
~1011.837100 insts-per-iteration
```
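The scaling is easier to see as ratios of the measured cycles-per-inst (a quick sanity check on the numbers above, nothing more):

```python
# Measured cycles-per-inst from the bp3 runs above.
m1, m2, m4 = 3.971956, 7.453559, 14.008862

# Ratios relative to the m1 run: roughly linear in LMUL.
print(round(m2 / m1, 2))  # ~1.88x at m2
print(round(m4 / m1, 2))  # ~3.53x at m4
```

So throughput degrades close to linearly with LMUL on this core, consistent with the new costs.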
So, at least on this board, it looks like you're right that the cost is scaling with the destination LMUL, not the source LMUL. Interesting!
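For context on why the multi-step truncates cost more than 1 per step: a trunc like nxv2i64 -> nxv2i8 can't be done with a single vnsrl, since each narrowing step only halves SEW. A rough sketch of the chain (register and vsetvli choices illustrative, not taken from actual codegen):

```
vsetvli t0, zero, e32, m1, ta, ma
vnsrl.wi v10, v8, 0                 # nxv2i64 (m2)  -> nxv2i32 (m1)
vsetvli t0, zero, e16, mf2, ta, ma
vnsrl.wi v11, v10, 0                # nxv2i32 (m1)  -> nxv2i16 (mf2)
vsetvli t0, zero, e8, mf4, ta, ma
vnsrl.wi v12, v11, 0                # nxv2i16 (mf2) -> nxv2i8 (mf4)
```

Only the first step touches an LMUL>1 operand, which is why the per-step LMUL matters for the total.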
https://github.com/llvm/llvm-project/pull/101051