[PATCH] D82458: [ARM] Adjust default fp extend and trunc costs

Wed Jun 24 06:59:28 PDT 2020

dmgreen marked 2 inline comments as done.
dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:253
+  // Single to/from double precision conversions.
+  if (Src->isVectorTy() && ST->hasNEON() &&
+      ((ISD == ISD::FP_ROUND && SrcTy.getScalarType() == MVT::f64 &&
----------------
SjoerdMeijer wrote:
> Quickly looking at this, I was wondering 2 things here:
> - how about MVE?
> - and how about f16?
This is just the bit of code removed from above, put into a better place and with checks for the actual types this was expecting. The old code was obviously written from a time before fp16 and doesn't account for it properly.

MVE is handled in a different patch. These were all interrelated but split up from D81813.

FP16 on NEON will fall back to the "1 vcvt per lane" code below, which will be better when fp16 is not present and will give roughly the same numbers when fp16 is present. But that is not what I am trying to fix here, I'm just trying to make sure it's not worse than before.

================
Comment at: llvm/test/Analysis/CostModel/ARM/cast.ll:134
 ; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %r84df = fptrunc <16 x double> undef to <16 x float>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %r80dh = fptrunc double undef to half
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %r82dh = fptrunc <4 x double> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %r83dh = fptrunc <8 x double> undef to <8 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %r84dh = fptrunc <16 x double> undef to <16 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %r80fh = fptrunc float undef to half
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %r81fh = fptrunc <2 x float> undef to <2 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %r82fh = fptrunc <4 x float> undef to <4 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %r83fh = fptrunc <8 x float> undef to <8 x half>
-; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 28 for instruction: %r84fh = fptrunc <16 x float> undef to <16 x half>
+; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %r80dh = fptrunc double undef to half
+; CHECK-NEON-RECIP-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %r81dh = fptrunc <2 x double> undef to <2 x half>
----------------
SjoerdMeijer wrote:
> I can't remember if Sam asked the same question on a different patch, but the cost looks a lot higher than the unchanged instructions here. Is that correct?
Yep. The "neon" target will not include fp16, so the converts will all go through __aeabi calls so should be expensive.

Same for the MVE costs below, which do not by default include double.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82458/new/

https://reviews.llvm.org/D82458