[Mlir-commits] [llvm] [clang-tools-extra] [compiler-rt] [mlir] [clang] [libcxx] [CostModel][X86] Fix fpext conversion cost for 16 elements (PR #76278)
llvmlistbot at llvm.org
Sun Dec 24 17:10:55 PST 2023
HaohaiWen wrote:
> Please can you confirm this as llvm-mca predicts worse case (znver4) to be 4 https://llvm.godbolt.org/z/fxWTaf3Gv
Currently, uiCA doesn't support Zen4, and I don't have a Zen4 machine.
I can measure it on a local SKX machine with nanoBench (https://github.com/andreas-abel/nanoBench). Maybe you can use it to confirm the Zen4 cost if you have a Zen4 machine.
e.g.
```
./nanoBench.sh -init "xor zmm0, zmm0" -asm "vcvtps2pd zmm2, ymm0; vextractf64x4 ymm0, zmm0, 1; vcvtps2pd zmm1, ymm0" -config configs/cfg_SkylakeX_common.txt -unroll 1000 -loop 1000 -warm_up_count 10 -cpu 0
Note: Hyper-threading is enabled; it can be disabled with "sudo ./disable-HT.sh"
CORE_CYCLES: 4.77
INST_RETIRED: 3.00
IDQ.MITE_UOPS: 5.71
IDQ.DSB_UOPS: -0.70
IDQ.MS_UOPS: 0.01
LSD.UOPS: 0.00
UOPS_ISSUED: 5.01
UOPS_EXECUTED: 5.01
UOPS_RETIRED.RETIRE_SLOTS: 5.01
UOPS_DISPATCHED_PORT.PORT_0: 2.00
UOPS_DISPATCHED_PORT.PORT_1: 0.00
UOPS_DISPATCHED_PORT.PORT_2: 0.00
UOPS_DISPATCHED_PORT.PORT_3: 0.00
UOPS_DISPATCHED_PORT.PORT_4: 0.00
UOPS_DISPATCHED_PORT.PORT_5: 3.00
UOPS_DISPATCHED_PORT.PORT_6: 0.00
UOPS_DISPATCHED_PORT.PORT_7: 0.00
```
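To compare runs across machines, the counter lines above can be reduced to a few numbers. The following is only a minimal sketch, not part of nanoBench: the function names `parse_counters` and `summarize` are hypothetical, and it assumes the output keeps the `NAME: value` shape shown above. The sample text is a subset of the SKX counters from this run.

```python
import re

# Subset of the SKX counters reported above, copied verbatim.
SAMPLE = """\
Note: Hyper-threading is enabled; it can be disabled with "sudo ./disable-HT.sh"
CORE_CYCLES: 4.77
INST_RETIRED: 3.00
UOPS_ISSUED: 5.01
UOPS_DISPATCHED_PORT.PORT_0: 2.00
UOPS_DISPATCHED_PORT.PORT_1: 0.00
UOPS_DISPATCHED_PORT.PORT_5: 3.00
"""

# Matches lines such as "CORE_CYCLES: 4.77"; other lines are ignored.
LINE_RE = re.compile(r"^\s*([A-Za-z0-9_.]+):\s*(-?\d+(?:\.\d+)?)\s*$")
PORT_PREFIX = "UOPS_DISPATCHED_PORT.PORT_"


def parse_counters(text):
    """Return a dict mapping counter name to its float value."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counters[match.group(1)] = float(match.group(2))
    return counters


def summarize(counters):
    """Pick out cycles, issued uops, and the port with the most uops."""
    ports = {
        name[len(PORT_PREFIX):]: value
        for name, value in counters.items()
        if name.startswith(PORT_PREFIX)
    }
    busiest = max(ports, key=ports.get)
    return {
        "cycles": counters["CORE_CYCLES"],
        "uops_issued": counters["UOPS_ISSUED"],
        "busiest_port": busiest,
        "busiest_port_uops": ports[busiest],
    }


if __name__ == "__main__":
    print(summarize(parse_counters(SAMPLE)))
```

The same two functions could be pointed at output captured on a Zen4 machine, provided the port counter prefix is adjusted to whatever that configuration file reports.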
https://github.com/llvm/llvm-project/pull/76278