[PATCH] D100745: [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function

Wed Apr 21 10:43:33 PDT 2021

dmgreen added a comment.

> This is because as soon as you enable SVE you effectively switch on masked loads and stores. The vectoriser only calls isLegalMaskedLoad with an element type, not a vector type. This means that we can't distinguish between fixed width and scalable vectors.

OK, that makes sense. It won't know the vector width until later.
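
To make the quoted point concrete, here is a toy sketch (not LLVM code) of a legality query that only sees the element type. The enums, the function names and the set of supported element types are all hypothetical; the only thing taken from the discussion is that the query gets an element type rather than a vector type.

```cpp
#include <cassert>

// Hypothetical element and vector kinds, for illustration only.
enum class ElemType { I8, I16, I32, I64, F32, F64, I128 };
enum class VectorKind { FixedWidth, Scalable };

// Hypothetical stand-in for an element-type-only legality hook.
// Assumption: with SVE enabled every element type except I128 is legal.
bool isLegalMaskedLoadForElement(ElemType Ty, bool HasSVE) {
  if (!HasSVE)
    return false;
  return Ty != ElemType::I128;
}

// Models the caller: the vector kind is decided later, so it cannot be
// passed to the query and has no effect on the answer.
bool queryAtVectorizationTime(ElemType Ty, VectorKind Kind, bool HasSVE) {
  (void)Kind; // intentionally unused: the hook never sees it
  return isLegalMaskedLoadForElement(Ty, HasSVE);
}
```

In this model the answer for `ElemType::I32` is the same for `VectorKind::FixedWidth` and `VectorKind::Scalable`, which is the ambiguity described above.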

================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1446
-  // generated code.
-  return cast<FixedVectorType>(Src)->getNumElements() * 8;
 }
----------------
david-arm wrote:
> dmgreen wrote:
> > Br and PHI are often free, but were accounted for here. I think the old code might have been fine, and more accurate for arm.
> OK sure, I'll revert it then. I'm not sure the BasicTTIImpl is that accurate for AArch64 either, because we treat branches as zero cost for some reason. Also, probably the i1 vector extract cost is too low as well.
I think the reasoning is that unconditional branches are often zero cost on modern CPUs, in terms of throughput/latency. Conditional branches depend on the branch predictor, and the number of branches produced by a scalarized intrinsic can start to defeat it.

It may be worth adding a few llvm.masked.store/llvm.masked.load cost checks for AArch64, if we don't have them already, to show the costs more clearly.
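
As a rough illustration of why the Br/PHI accounting matters, the sketch below compares three ways of costing a scalarized masked load. It is a toy model, not the BasicTTIImpl or ARM implementation: the struct, the per-instruction costs and the function names are hypothetical. Only the flat `NumElements * 8` figure comes from the ARM line in the diff above.

```cpp
#include <cassert>

// Hypothetical per-lane instruction costs for a scalarized masked load.
struct ScalarizationCosts {
  unsigned ExtractMaskBit; // extractelement on the i1 mask vector
  unsigned CondBranch;     // conditional branch on the mask bit
  unsigned MemOp;          // the scalar load
  unsigned InsertElement;  // insertelement of the loaded value
  unsigned UncondBranch;   // unconditional branch to the join block
  unsigned Phi;            // PHI merging the result vector
};

// Sum the per-lane costs and scale by the number of lanes.
unsigned scalarizedMaskedLoadCost(unsigned NumElements,
                                  const ScalarizationCosts &C) {
  assert(NumElements > 0 && "expected a non-empty vector");
  unsigned PerLane = C.ExtractMaskBit + C.CondBranch + C.MemOp +
                     C.InsertElement + C.UncondBranch + C.Phi;
  return NumElements * PerLane;
}

// Flat heuristic mirroring the ARM line shown in the diff context.
unsigned flatPerLaneCost(unsigned NumElements) { return NumElements * 8; }

// Assumed cost tables: every instruction costs 1, versus the same table
// with unconditional branches and PHIs treated as free.
const ScalarizationCosts ChargeEverything{1, 1, 1, 1, 1, 1};
const ScalarizationCosts FreeBranchAndPhi{1, 1, 1, 1, 0, 0};
```

With these assumed tables, a 4-element masked load costs 24 when everything is charged, 16 when Br and PHI are free, and 32 under the flat heuristic, so the choice changes the result noticeably.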

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100745/new/

https://reviews.llvm.org/D100745