[PATCH] D102505: [RISCV] Support vector types in combination with fastcc

Hsiangkai Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 26 01:09:06 PDT 2021


HsiangKai added inline comments.


================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:7160
+    if (unsigned Reg =
+            allocateRVVReg(ValVT, ValNo, FirstMaskArgument, State, TLI)) {
+      // Fixed-length vectors are located in the corresponding scalable-vector
----------------
frasercrmck wrote:
> frasercrmck wrote:
> > HsiangKai wrote:
> > > Should we allow all of the vector registers to be used to pass vector arguments under fastcc, instead of limiting them to v8 through v23?
> > Sure, that's a possibility. I think it opens up some interesting new situations if we combine that with the first mask going to v0, like (with max-lmul=8):
> > 
> > ```
> > define fastcc <4 x i1> @foo(<32 x i32> %x, <8 x i32> %y, <32 x i32> %z, <32 x i32> %w, <4 x i1> %m1, <4 x i1> %m2, <4 x i1> %m3) {
> > ; %x -> $v8m8
> > ; %y -> $v2m2
> > ; %z -> $v16m8
> > ; %w -> $v24m8
> > ; %m1 -> $v0
> > ; %m2 -> $v1
> > ; %m3 -> $v4
> > ```
> > 
> > Do you think that's worth it? It's slightly harder to reason about (finding which operand goes to which register takes a bit of back-and-forth) but the allocation is certainly improved. I suppose that's the goal of fastcc.
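> > The assignment above can be reproduced with a toy greedy first-fit model (a sketch only, not the actual `RISCVISelLowering.cpp` logic): assume all of v0-v31 are available, the first mask argument is reserved v0, and an LMUL-n argument needs n consecutive registers whose base is a multiple of n:
> > 
> > ```python
> > def allocate(args):
> >     """args: list of (name, lmul, is_mask); returns {name: register}."""
> >     used = [False] * 32
> >     used[0] = True                    # hold v0 back for the first mask argument
> >     out = {}
> >     first_mask_seen = False
> >     for name, lmul, is_mask in args:
> >         if is_mask and not first_mask_seen:
> >             out[name] = "v0"          # first mask argument goes to v0
> >             first_mask_seen = True
> >             continue
> >         for base in range(0, 32, lmul):   # lmul-aligned register groups
> >             if not any(used[base:base + lmul]):
> >                 for r in range(base, base + lmul):
> >                     used[r] = True
> >                 out[name] = f"v{base}" + (f"m{lmul}" if lmul > 1 else "")
> >                 break
> >     return out
> > 
> > # The example signature: LMUL-8 for the <32 x i32> args, LMUL-2 for <8 x i32>.
> > args = [("%x", 8, False), ("%y", 2, False), ("%z", 8, False),
> >         ("%w", 8, False), ("%m1", 1, True), ("%m2", 1, True),
> >         ("%m3", 1, True)]
> > print(allocate(args))
> > # %x -> v8m8, %y -> v2m2, %z -> v16m8, %w -> v24m8,
> > # %m1 -> v0, %m2 -> v1, %m3 -> v4
> > ```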
> Now I'm seeing a surprising register allocation failure when I make all the vector registers available for function arguments.
> 
> ```
> define fastcc <vscale x 32 x i32> @foo(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {
> ```
> 
> Gives me the following lowered MIR, with %x -> v0m8/v8m8, %y -> v16m8/v24m8 and %z -> indirect(x10) and indirect(x12):
> 
> ```
> bb.0 (%ir-block.0):
>   liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
>   %5:gpr = COPY $x12
>   %4:gpr = COPY $x10
>   %3:vrm8 = COPY $v24m8
>   %2:vrm8 = COPY $v16m8
>   %1:vrm8 = COPY $v8m8
>   %0:vrm8 = COPY $v0m8
>   %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
>   %7:gpr = ADDI %4:gpr, 64
>   %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
> ```
> 
> Then the machine scheduler decides to do this:
> 
> ```
> 0B	bb.0 (%ir-block.0):
> 	  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
> 32B	  %4:gpr = COPY $x10
> 128B	  %7:gpr = ADDI %4:gpr, 64
> 136B	  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
> 144B	  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
> 152B	  %5:gpr = COPY $x12
> 160B	  %3:vrm8 = COPY $v24m8
> 168B	  %2:vrm8 = COPY $v16m8
> 176B	  %1:vrm8 = COPY $v8m8
> 184B	  %0:vrm8 = COPY $v0m8
> ```
> 
> And then the register allocator is unable to allocate `%6` or `%8`, since all the physical vector registers are occupied. I'm surprised the scheduler made that change.
> 
> I'm not the most familiar with other targets and their calling conventions, but perhaps this isn't supported. A compromise could be to leave `v24m8` free but continue to use `v1-v7`, which is still more than the base calling convention supports.
> 
> Any thoughts?
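> The arithmetic behind the failure above is simple to check (a back-of-the-envelope sketch): once the scheduler hoists the two `VL8RE32_V` loads above the argument copies, all four m8 argument groups are still live, which occupies every vector register, leaving no m8 group for `%6` or `%8`:
> 
> ```python
> NUM_VREGS = 32               # RVV has v0-v31
> live_arg_groups = 4          # $v0m8, $v8m8, $v16m8, $v24m8 still live
> regs_per_m8_group = 8        # an LMUL-8 group spans 8 registers
> free_regs = NUM_VREGS - live_arg_groups * regs_per_m8_group
> print(free_regs)             # 0: the loads cannot be allocated without spilling
> ```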
I am curious why the register allocator does not spill the vector registers here.

By the way, it is strange to add 64 to the base address ($x10) for the arguments. It should add `8 x vlenb`, right? I know this is a separate issue unrelated to this patch; we could prepare another patch to fix it.
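To illustrate the point (a sketch under the assumption that the indirectly passed value is split into LMUL-8 register-group-sized halves, as in the lowering above): the second half of the value starts one m8 group past the base pointer, i.e. `8 x vlenb` bytes, where `vlenb = VLEN / 8`. The hard-coded 64 is only correct when VLEN = 64:

```python
def m8_group_bytes(vlen_bits):
    """Size in bytes of one LMUL-8 register group for a given VLEN."""
    vlenb = vlen_bits // 8        # bytes per single vector register
    return 8 * vlenb              # eight registers per m8 group

# The constant 64 matches only the VLEN=64 case; larger VLENs need
# a larger, VLEN-dependent offset, hence `8 x vlenb`.
for vlen in (64, 128, 256):
    print(vlen, m8_group_bytes(vlen))
```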


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102505/new/

https://reviews.llvm.org/D102505


