[PATCH] D79162: [Analysis] TTI: Add CastContextHint for getCastInstrCost

Thu Apr 30 14:00:28 PDT 2020

dmgreen added a comment.

I don't think I agree that this is a hack, exactly, at least not if it was cleaned up. It follows the same method as getArithmeticInstrCost, where the type of the parameter is passed through, allowing us to get the information we need without forcing us to pin this to a potentially incorrect or non-existent instruction. This separation seems like a good thing if we can do it well.

We (ARM/MVE) need to do the same thing for the cost of gather and interleaved loads. Whether the sext/zext is free there is equally variable. The way that I would have imagined this is an enum that can be one of the types of loads that the vectorizer produces (Normal, Masked, Interleave, Gather, Expanded?). There probably needs to be an option for None or Unknown too. I understand that you tried this before but ran into trouble? Can you speak to what kinds of problems you ran into doing things that way?
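
To make the enum idea concrete, here is a minimal sketch. Everything in it is hypothetical: the names CastContextHint, CastKind and castCost, the 128-bit legal vector width, and the cost numbers are assumptions for illustration only, not the existing TTI interface or real MVE costs.

```cpp
// Hypothetical sketch; these are not LLVM types or real target costs.
// The hint describes the kind of load the vectorizer expects to feed the cast.
enum class CastContextHint { None, Normal, Masked, Interleave, Gather };
enum class CastKind { SExt, ZExt, Trunc };

// Toy cost query for an imaginary target with 128-bit vector registers.
// DstBits is the element width after the cast, NumElts the vector length.
int castCost(CastKind Kind, CastContextHint Hint, unsigned DstBits,
             unsigned NumElts) {
  const unsigned LegalBits = 128; // assumed register width
  const bool IsExtend = Kind == CastKind::SExt || Kind == CastKind::ZExt;
  const unsigned VectorBits = DstBits * NumElts;

  // Assumption: an extend folds into a plain load when the result is legal.
  if (IsExtend && Hint == CastContextHint::Normal && VectorBits <= LegalBits)
    return 0;

  // Assumption: an extend of a masked load wider than a register is
  // expensive, because the masked load is not split.
  if (IsExtend && Hint == CastContextHint::Masked && VectorBits > LegalBits)
    return 2 * static_cast<int>(NumElts);

  // With no usable context (None or anything else), fall back to a unit cost.
  return 1;
}
```

The point of the sketch is only that the caller describes the context without needing a real load instruction to exist yet; a target would be free to treat Interleave and Gather differently in the same way.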

In D79162#2012659 <https://reviews.llvm.org/D79162#2012659>, @samparker wrote:

> Indeed. But if we were to have a getLoopVectorizationCost API, then it would seem reasonable enough to pass that object there.

Although this might well be something that we need in the long run, it will be very tied into how VPlan ends up doing cost modelling, should probably not be limited to the loop vectorizer, and is probably a much bigger task to design and implement than this. Not something that an intern with less than a month left should be asked to do. Being able to cost blocks of code at a time feels like it's quite important to MVE, but is not something we should rush into here.

The other option, if all this is unworkable, is to put the cost into getMaskedMemOpCost. Or just bypass the cost modelling and force the vectorizer to not consider wide vectors when tail predicating, which I think is something that you've suggested before. If we do try to cost it properly, where we choose to put the high cost is up to us, in a way. It's the masked load we are choosing not to split into something where the extend would end up as free, even if it is the extend which is expanded. The actual cost, when you get down to it, comes from choosing not to split the masked load and not being able to sensibly split VCTPs, leaving the predicate shuffle between the VCTP and the load as high cost.

We may run into the same kinds of problems in getMaskedMemOpCost, but if we pass an 'I' through we can try and tell if it needs to be extended there.
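
As a rough illustration of that alternative, the sketch below uses a toy IR rather than LLVM's classes. The Instr struct, feedsOnlyExtend, maskedLoadCost, the 128-bit legal width and the penalty value are all assumptions made up for the example; the real getMaskedMemOpCost signature is not shown here.

```cpp
#include <vector>

// Hypothetical toy IR; none of these types are LLVM classes.
enum class Opcode { MaskedLoad, SExt, ZExt, Add };

struct Instr {
  Opcode Op;
  unsigned ResultBits;              // total vector width this value produces
  std::vector<const Instr *> Users; // instructions that consume this value
};

// True when the load's only user is a sign or zero extend.
bool feedsOnlyExtend(const Instr &Load) {
  if (Load.Users.size() != 1)
    return false;
  const Opcode Op = Load.Users.front()->Op;
  return Op == Opcode::SExt || Op == Opcode::ZExt;
}

// Toy stand-in for a masked memory op cost hook. I may be null when the
// caller has no instruction to hand over; then only the base cost applies.
int maskedLoadCost(const Instr *I) {
  const unsigned LegalBits = 128; // assumed register width
  const int Base = 1;             // assumed cost of a legal masked load
  const int Penalty = 8;          // assumed cost of the predicate shuffle

  if (!I || !feedsOnlyExtend(*I))
    return Base;

  // The load is extended: charge the penalty here, on the load, when the
  // extended result no longer fits in one register.
  const unsigned ExtBits = I->Users.front()->ResultBits;
  return ExtBits > LegalBits ? Base + Penalty : Base;
}
```

This shows the weakness as well: when no instruction exists yet, as during vectorizer costing, the hook can only return the base cost.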

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79162/new/

https://reviews.llvm.org/D79162