[PATCH] D98934: [SVE] Add instruction cost for fptrunc in loops

Tue Mar 23 02:46:06 PDT 2021

david-arm added a comment.

Hi @nasherm, thanks for addressing the previous review comments - it looks much better now! I just have one more comment about the estimated costs in the table.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:473
+    { ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f32, 1 },
+    { ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f32, 1 },
+
----------------
I think any conversions that involve illegal types that are too large for a single register will be split up into multiple instructions. For example, nxv8f32 is twice the size of a normal SVE register, which means we actually need 2 instructions that convert nxv4f32 -> nxv4f16. Then, we need a final third instruction to interleave these two results together. I'd expect something a bit like:

  { ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f32, 3 }, // 2 converts + interleave
  ...
  { ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f64, 3 }, // 2 converts + interleave
  ...
  { ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f64, 7 }, // 4 converts + 3 interleaves
  ...
  { ISD::FP_ROUND, MVT::nxv4f32, MVT::nxv4f64, 3 }, // 2 converts + interleave
  ...
  { ISD::FP_ROUND, MVT::nxv8f32, MVT::nxv8f64, 6 }, // 4 converts + 2 interleaves

It's worth pointing out that these costs are just estimates; the most important thing is that they are higher than the single-register cases, to reflect the increased complexity of the operation.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98934