[PATCH] D81813: [ARM] MVE FP16 cost adjustments

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 24 06:27:09 PDT 2020


dmgreen marked 2 inline comments as done.
dmgreen added a comment.

Yeah, I can split this up. It did grow a bit. The parts are sometimes interrelated, but we now have D82456 <https://reviews.llvm.org/D82456>, D82457 <https://reviews.llvm.org/D82457> and D82458 <https://reviews.llvm.org/D82458>. I will rebase this onto just the remaining fp16 parts.



================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:501
+    // and cost 1 vcvt for each lane, so long as the instruction is available.
+    // If not it will become a series of function calls.
+    const int CallCost = CostKind == TTI::TCK_RecipThroughput ? 10 : 3;
----------------
samparker wrote:
> Maybe a comment on your magic number choices, or how about using getIntrinsicInstrCost for the call cost?
getCallInstrCost looks like it might work - at least it should give the right cost. It won't include the same code-size adjustments, but that can be done separately. It's hard to say exactly what the call will cost, given everything that might go on inside it.
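(Sketch only, not from the patch: roughly the shape of the per-lane trade-off being discussed. The function name, parameters and the per-call constants are illustrative placeholders, not the actual ARMTTIImpl code.)

  // Hypothetical sketch: cost a vector fp16<->fp32 convert as one vcvt
  // per lane when MVE float instructions are available, otherwise as a
  // library call per lane.
  int getFP16ConvertCost(unsigned NumLanes, bool HasMVEFloatOps,
                         bool CostIsThroughput) {
    // With MVE float support each lane converts with one vcvtb/vcvtt.
    if (HasMVEFloatOps)
      return NumLanes;
    // Otherwise each lane becomes a call; these constants stand in for
    // the magic numbers above, or for a value obtained from something
    // like getCallInstrCost.
    int CallCost = CostIsThroughput ? 10 : 3;
    return NumLanes * CallCost;
  }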


================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:985
+  if (ST->hasMVEFloatOps() && isa<FixedVectorType>(Src) && I &&
+      ((Opcode == Instruction::Load && I->hasOneUse() &&
+        isa<FPExtInst>(*I->user_begin())) ||
----------------
samparker wrote:
> I thought your stance was that we shouldn't be looking at the context?!
No. It's more... nuanced than that. We need to look at the context in order to model things accurately, at least at the moment. Without this bit of code, for example, all <4 x f16> converts would be quite expensive, so you end up with fp16->fp32 converts never being vectorized, even when it would be beneficial.

But there are only certain parts of the context that are valid to look at. The opcode of the surrounding instructions is almost certainly OK, and in this case I would claim that the _scalar_ type of the float makes sense, because it's converting a float (not an int). The absolute vector type could be wrong, because the vectorizer may pass a different type (with a different number of vector elements). If it were an integer type, that could be wrong too, because the vectorizer can change integer types to a smaller bitwidth as it vectorizes.

I'm not saying it's a great design and would love a better way to do this, but it's the one we've currently got.
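(Illustrative sketch, not part of the patch: the kind of context check described above - only the opcode and the single use of the instruction are inspected, and only whether that use is an fpext. The helper name is made up; it just mirrors the shape of the condition quoted above.)

  #include "llvm/IR/Instructions.h"

  // Hypothetical helper: does this instruction load a value whose only
  // use is a floating-point extend? Deliberately does not look at the
  // exact vector width, which the vectorizer may change.
  static bool loadFeedsFPExt(const llvm::Instruction *I) {
    using namespace llvm;
    if (!I || I->getOpcode() != Instruction::Load || !I->hasOneUse())
      return false;
    return isa<FPExtInst>(*I->user_begin());
  }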


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81813/new/

https://reviews.llvm.org/D81813




