[PATCH] D95659: [RISCV] Initial support of LoopVectorizer for RISC-V Vector.

Fri Jan 29 10:45:38 PST 2021

vkmr added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h:52-55
+  unsigned getRegisterBitWidth(bool Vector) const {
+    if (Vector) {
+      if (ST->hasStdExtV())
+        return RISCVVType::RVVBitsPerBlock;
----------------
craig.topper wrote:
> vkmr wrote:
> > If I understand correctly, the assumption behind this code is that a **single** vector register is of size `vscale x RVVBitsPerBlock` and ignore the idea (for now?) of having register groups, i.e LMUL>1. 
> > Unless we are ignoring register grouping for now, from Loop Vectorizer's perspective it would make sense to view the register group size as the real register size, specially for computing a feasible VF based on register usage.
> > Since the documentation of `getRegisterBitWidth()` defines it to be "The width of the largest scalar or vector register type", it might be more accurate to use `getMinVectorRegisterBitWidth()` to return `RISCVVType::RVVBitsPerBlock` and `getRegisterBitWidth()` to return  `getMinVectorRegisterBitWidth() * MAX_LMUL`. (I am not considering fractional LMUL here.)
> Returning a non-zero value seems to at least partially enable the vectorizer to generate fixed vectors which isn't supported by the backend yet. It looks like something else stopped it in my testing, but it at least queried the cost model. Not sure what stopped it.
> 
> I do plan to support fixed vectors in the RVV backend, but it will probably be a couple weeks away. The register width here will probably need to be a command line controlled value like AArch64. And it should be at least 128 bits per the 0.10 spec. So I don't think its connected to RVVBitsPerBlock.
Perhaps I misunderstood something, my concern here is more about how to encapsulate the idea of register grouping for scalable vectors in the  TTI methods to query register widths.  Having a command line option to control register width would still only reflect the width of a single register, right? Perhaps, we can add another command line option to specify a max group multiplier (essentially the Maximum LMUL value).

IIRC, the TTI method `getMinVectorRegisterBitWidth()` in addition to `getRegisterBitWidth()` was introduced to handle similar concerns with NEON. With scalable vectors, things are a little more complicated.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95659/new/

https://reviews.llvm.org/D95659