[PATCH] D98934: [SVE] Add instruction cost for fptrunc in loops
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 23 02:46:06 PDT 2021
david-arm added a comment.
Hi @nasherm, thanks for addressing the previous review comments - it looks much better now! I just have one more comment about the estimated costs in the table.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:473
+ { ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f32, 1 },
+ { ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f32, 1 },
+
----------------
I think any conversion involving an illegal type that is too large for a single register will be split up into multiple instructions. For example, nxv8f32 is twice the size of a normal SVE register, so we actually need 2 instructions that each convert nxv4f32 -> nxv4f16, followed by a third instruction to interleave the two results together. I'd expect something a bit like:
{ ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f32, 3 }, // 2 converts + 1 interleave
...
{ ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f64, 3 }, // 2 converts + 1 interleave
...
{ ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f64, 7 }, // 4 converts + 3 interleaves
...
{ ISD::FP_ROUND, MVT::nxv4f32, MVT::nxv4f64, 3 }, // 2 converts + 1 interleave
...
{ ISD::FP_ROUND, MVT::nxv8f32, MVT::nxv8f64, 6 }, // 4 converts + 2 interleaves
It's worth pointing out that these costs are just estimates. The most important thing is that they should be higher than the cost of 1 used for the legal types, to reflect the extra instructions the operation needs.
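The arithmetic behind those numbers can be sketched as follows. This is only an illustration and not LLVM code: the helper names are hypothetical, and it assumes each type occupies ceil(min elements * element bits / 128) SVE registers, with one convert per source register and one interleave for each pair of partial results that has to be merged.

```cpp
// Illustrative only: these helpers are hypothetical and are not part of LLVM.
// Assumption: <vscale x N x T> occupies ceil(N * bits(T) / 128) SVE registers,
// because 128 bits is the minimum SVE vector length.
constexpr unsigned SVEGranuleBits = 128;

// Number of SVE registers needed for MinElts elements of EltBits bits each.
constexpr unsigned numSVERegs(unsigned MinElts, unsigned EltBits) {
  return (MinElts * EltBits + SVEGranuleBits - 1) / SVEGranuleBits;
}

// Estimated FP_ROUND cost: one convert per source register, plus one
// interleave per merge. Each merge reduces the number of partial results by
// one, so the merge count is (source registers - destination registers).
constexpr unsigned estimateFPRoundCost(unsigned MinElts, unsigned SrcEltBits,
                                       unsigned DstEltBits) {
  unsigned Converts = numSVERegs(MinElts, SrcEltBits);
  unsigned Interleaves = Converts - numSVERegs(MinElts, DstEltBits);
  return Converts + Interleaves;
}

// The estimates reproduce the values suggested in the comment above.
static_assert(estimateFPRoundCost(4, 32, 16) == 1, "nxv4f32 -> nxv4f16");
static_assert(estimateFPRoundCost(8, 32, 16) == 3, "nxv8f32 -> nxv8f16");
static_assert(estimateFPRoundCost(4, 64, 16) == 3, "nxv4f64 -> nxv4f16");
static_assert(estimateFPRoundCost(8, 64, 16) == 7, "nxv8f64 -> nxv8f16");
static_assert(estimateFPRoundCost(4, 64, 32) == 3, "nxv4f64 -> nxv4f32");
static_assert(estimateFPRoundCost(8, 64, 32) == 6, "nxv8f64 -> nxv8f32");
```

Under this model, the nxv8f64 -> nxv8f32 case costs less than nxv8f64 -> nxv8f16 because its result still spans two registers, so one fewer merge is needed.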
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98934/new/
https://reviews.llvm.org/D98934