[PATCH] D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize

Thu Sep 26 19:55:17 PDT 2019

wuzish marked an inline comment as done.
wuzish added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7768
   // interleaving.
-  if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2)
+  if (!TTI->getNumberOfRegisters(TargetTransformInfo::GenericVectorRC) && TTI->getMaxInterleaveFactor(1) < 2)
     return false;
----------------
hfinkel wrote:
> wuzish wrote:
> > hfinkel wrote:
> > > wuzish wrote:
> > > > hfinkel wrote:
> > > > > wuzish wrote:
> > > > > > hfinkel wrote:
> > > > > > > I think that we can just make a separate function for this:
> > > > > > > 
> > > > > > >   TTI->hasVectorRegisters()
> > > > > > > 
> > > > > > > (and then use that here and in the SLP vectorizer).
> > > > > > I think it's could be like `TTI->getRegisterClassForType(F.getType(), true)` above
> > > > > But above, F.getType() gives back the right scalar type because F is the LSR::Formula. Here F in the function, right? I don't think it makes sense to ask for the register class of the function type.
> > > > Yes. And above case would return nullptr, so we need care about this situation. And here we can left type to be nullptr as default value argument.
> > > I suppose that this makes sense if we assume that `TTI->getRegisterClassForType` is called only on legal types? I'm a bit worried here because, at the IR level, all types are supported and should be legalized into *some* register class. We should better document the expected behavior here one way or the other.
> > If I am not misunderstanding what you mean, then I think the type should be `isSingleValueType`. We can document it at the comments in getRegisterClassForType prototype declare.
> I agree, but I think that's insufficient. isSingleValueType is true for vector types, and so if we have an architecture with no vector registers, and we call TTI->getRegisterClassForType on a vector type, it would return the class associated with the scalarized type. I think that what you intend to say is something like:
> 
>   getRegisterClassForType returns the register class associated with the provided type, accounting for type promotion and other type-legalization techniques that the target might apply, however, it specifically does not account for the scalarization or splitting of vector types. Should a vector type require scalarization or splitting into multiple underlying vector registers, that type should be mapped to a register class containing no registers.
> 
> In some sense, this seems reasonable, because the interface does not provide any way to figure out *how many* of a particular register in a register class the type might use, and so we can't return a sensible answer in cases where splitting or scalarization is required. It's a bit unfortunate, however, because the same consideration applies to scalar types too (e.g., i256 probably takes up multiple scalar registers). What we really should do, I think, is expand this interface to return, perhaps optionally, the number of registers of the provided class. In the implementation, in such a case, could start by running:
> 
>   std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
> 
> and then using the MVT for making the register-class decisions.
> 
> However, it does not seem possible to make this change while retaining the current behavior for other targets (because we'd change what happens for illegal types), and thus, I recommend that a comment be added along the lines of my suggestion above, and then we also add this:
> 
>   // FIXME: It's not currently possible to determine how many registers are used by the provided type.
> 
> and we address this aspect later in a different patch.
May the number issue be addressed by such code in LoopVectorize.cpp?

```
// A lambda that gets the register usage for the given type and VF.
  auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) {
    if (Ty->isTokenTy())
      return 0U;
    unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
    return std::max<unsigned>(1, VF * TypeSize / WidestRegister);
  };
```

And the `WidestRegister` is from `TTI.getRegisterBitWidth` which I think should be related with Type*.
Yes, the only one interface `TLI->getTypeLegalizationCost ` would be more consistent and easy to maintain. 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67148/new/

https://reviews.llvm.org/D67148