[PATCH] D95659: [RISCV] Initial support of LoopVectorizer for RISC-V Vector.

Sat Jan 30 14:47:10 PST 2021

HsiangKai added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h:52-55
+  unsigned getRegisterBitWidth(bool Vector) const {
+    if (Vector) {
+      if (ST->hasStdExtV())
+        return RISCVVType::RVVBitsPerBlock;
----------------
HsiangKai wrote:
> craig.topper wrote:
> > vkmr wrote:
> > > craig.topper wrote:
> > > > vkmr wrote:
> > > > > If I understand correctly, the assumption behind this code is that a **single** vector register is of size `vscale x RVVBitsPerBlock`, ignoring (for now?) the idea of register groups, i.e., LMUL > 1.
> > > > > Unless we are ignoring register grouping for now, from the Loop Vectorizer's perspective it would make sense to treat the register group size as the real register size, especially when computing a feasible VF based on register usage.
> > > > > Since the documentation of `getRegisterBitWidth()` defines it as "The width of the largest scalar or vector register type", it might be more accurate for `getMinVectorRegisterBitWidth()` to return `RISCVVType::RVVBitsPerBlock` and for `getRegisterBitWidth()` to return `getMinVectorRegisterBitWidth() * MAX_LMUL`. (I am not considering fractional LMUL here.)
> > > > Returning a non-zero value seems to at least partially enable the vectorizer to generate fixed vectors, which the backend doesn't support yet. Something else stopped it in my testing, but it at least queried the cost model. Not sure what stopped it.
> > > > 
> > > > I do plan to support fixed vectors in the RVV backend, but that is probably a couple of weeks away. The register width here will probably need to be a command-line-controlled value like AArch64's, and it should be at least 128 bits per the 0.10 spec. So I don't think it's connected to RVVBitsPerBlock.
> > > Perhaps I misunderstood something; my concern here is more about how to encapsulate the idea of register grouping for scalable vectors in the TTI methods that query register widths. A command-line option controlling register width would still only reflect the width of a single register, right? Perhaps we can add another command-line option to specify a maximum group multiplier (essentially the maximum LMUL value).
> > > 
> > > IIRC, the TTI method `getMinVectorRegisterBitWidth()` in addition to `getRegisterBitWidth()` was introduced to handle similar concerns with NEON. With scalable vectors, things are a little more complicated.
> > I don't think I understand how this interface works for scalable vector vectorization. AArch64 connects it to a command-line option, which means it can be larger than 128 bits. But I thought the backend needed specific types like <vscale x 4 x i32>. Does this interface affect the fixed portion of the scalable type for scalable vector vectorization?
> I didn't dig into how the callback is used, so I removed it in this patch. We could add it back once we have a clear idea of how to do it.
Craig is right. getRegisterBitWidth() is not related to scalable vector vectorization. It is reasonable to remove it in this patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95659/new/

https://reviews.llvm.org/D95659