[PATCH] D102505: [RISCV] Support vector types in combination with fastcc

Mon May 24 05:47:56 PDT 2021

frasercrmck added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:7160
+    if (unsigned Reg =
+            allocateRVVReg(ValVT, ValNo, FirstMaskArgument, State, TLI)) {
+      // Fixed-length vectors are located in the corresponding scalable-vector
----------------
frasercrmck wrote:
> HsiangKai wrote:
> > Should we allow all the vector registers used to pass vector arguments under fastcc, instead of limiting to v8 to v23?
> Sure, that's a possibility. I think it opens up some interesting new situations if we combine that with the first mask going to v0, like (with max-lmul=8):
> 
> ```
> define fastcc <4 x i1> @foo(<32 x i32> %x, <8 x i32> %y, <32 x i32> %z, <32 x i32> %w, <4 x i1> %m1, <4 x i1> %m2, <4 x i1> %m3) {
> ; %x -> $v8m8
> ; %y -> $v2m2
> ; %z -> $v16m8
> ; %w -> $v24m8
> ; %m1 -> $v0
> ; %m2 -> $v1
> ; %m3 -> $v4
> ```
> 
> Do you think that's worth it? It's slightly harder to reason about (finding which operand goes to which register takes a bit of back-and-forth) but the allocation is certainly improved. I suppose that's the goal of fastcc.
Now I'm seeing a surprising register allocation failure when I allocate all registers to function arguments.

```
define fastcc <vscale x 32 x i32> @foo(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {
```

This gives me the following lowered code, with %x -> v0m8/v8m8, %y -> v16m8/v24m8 and %z -> indirect(x10) and indirect(x12):

```
bb.0 (%ir-block.0):
  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
  %5:gpr = COPY $x12
  %4:gpr = COPY $x10
  %3:vrm8 = COPY $v24m8
  %2:vrm8 = COPY $v16m8
  %1:vrm8 = COPY $v8m8
  %0:vrm8 = COPY $v0m8
  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
  %7:gpr = ADDI %4:gpr, 64
  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
```

Then the machine scheduler decides to do this:

```
0B	bb.0 (%ir-block.0):
	  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
32B	  %4:gpr = COPY $x10
128B	  %7:gpr = ADDI %4:gpr, 64
136B	  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
144B	  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
152B	  %5:gpr = COPY $x12
160B	  %3:vrm8 = COPY $v24m8
168B	  %2:vrm8 = COPY $v16m8
176B	  %1:vrm8 = COPY $v8m8
184B	  %0:vrm8 = COPY $v0m8
```

The register allocator is then unable to allocate a register for `%6` or `%8`, since all physical vector registers are still occupied by the incoming arguments. I'm surprised the scheduler made that change.
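
To make the difference between the two orderings concrete, here is a rough toy model in Python (not LLVM code). It walks each instruction order from the dumps above and, at every `VL8RE32_V`, counts how many of the four m8 register groups are no longer held by a live-in argument that has yet to be copied. The tuple encoding and the helper name `free_groups_at_loads` are made up for illustration. The model also ignores the virtual registers `%0`..`%3`, on the assumption that the allocator can still place or spill those, whereas a live-in physical register stays occupied until its `COPY`.

```python
# Toy model (not LLVM code): count the m8 groups that are no longer held by
# an un-copied live-in argument at the point of each whole-register load.
LIVE_INS = {"v0m8", "v8m8", "v16m8", "v24m8"}  # all four m8 groups are arguments

# (defined vreg, opcode, source) triples transcribed from the two dumps.
ISEL_ORDER = [
    ("%5", "COPY", "x12"),
    ("%4", "COPY", "x10"),
    ("%3", "COPY", "v24m8"),
    ("%2", "COPY", "v16m8"),
    ("%1", "COPY", "v8m8"),
    ("%0", "COPY", "v0m8"),
    ("%6", "VL8RE32_V", "%4"),
    ("%7", "ADDI", "%4"),
    ("%8", "VL8RE32_V", "%7"),
]

SCHED_ORDER = [
    ("%4", "COPY", "x10"),
    ("%7", "ADDI", "%4"),
    ("%6", "VL8RE32_V", "%4"),
    ("%8", "VL8RE32_V", "%7"),
    ("%5", "COPY", "x12"),
    ("%3", "COPY", "v24m8"),
    ("%2", "COPY", "v16m8"),
    ("%1", "COPY", "v8m8"),
    ("%0", "COPY", "v0m8"),
]


def free_groups_at_loads(order):
    """Map each loaded vreg to the number of m8 groups not held by a live-in."""
    pinned = set(LIVE_INS)  # live-ins are occupied from function entry
    free = {}
    for dst, opcode, src in order:
        if opcode == "COPY" and src in pinned:
            pinned.discard(src)  # the live-in's range ends at its COPY
        elif opcode == "VL8RE32_V":
            free[dst] = len(LIVE_INS) - len(pinned)
    return free


if __name__ == "__main__":
    print("isel order:     ", free_groups_at_loads(ISEL_ORDER))
    print("scheduled order:", free_groups_at_loads(SCHED_ORDER))
```

Under these assumptions, the original order leaves every group released by the time the loads run, while the scheduled order reaches both loads with all four groups still held by live-ins, which matches the failure described above.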

I'm not the most familiar with other targets and their calling conventions, but perhaps this isn't supported. A compromise could be to leave `v24m8` free but continue to use `v1-v7`, which is still more than the base calling convention supports.

Any thoughts?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D102505