[PATCH] D102505: [RISCV] Support vector types in combination with fastcc

Sun May 30 07:30:23 PDT 2021

HsiangKai added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:7160
+    if (unsigned Reg =
+            allocateRVVReg(ValVT, ValNo, FirstMaskArgument, State, TLI)) {
+      // Fixed-length vectors are located in the corresponding scalable-vector
----------------
frasercrmck wrote:
> frasercrmck wrote:
> > HsiangKai wrote:
> > > frasercrmck wrote:
> > > > frasercrmck wrote:
> > > > > HsiangKai wrote:
> > > > > > Should we allow all the vector registers used to pass vector arguments under fastcc, instead of limiting to v8 to v23?
> > > > > Sure, that's a possibility. I think it opens up some interesting new situations if we combine that with the first mask going to v0, like (with max-lmul=8):
> > > > > 
> > > > > ```
> > > > > define fastcc <4 x i1> @foo(<32 x i32> %x, <8 x i32> %y, <32 x i32> %z, <32 x i32> %w, <4 x i1> %m1, <4 x i1> %m2, <4 x i1> %m3) {
> > > > > ; %x -> $v8m8
> > > > > ; %y -> $v2m2
> > > > > ; %z -> $v16m8
> > > > > ; %w -> $v24m8
> > > > > ; %m1 -> $v0
> > > > > ; %m2 -> $v1
> > > > > ; %m3 -> $v4
> > > > > ```
> > > > > 
> > > > > Do you think that's worth it? It's slightly harder to reason about (finding which operand goes to which register takes a bit of back-and-forth) but the allocation is certainly improved. I suppose that's the goal of fastcc.
> > > > Now I'm seeing a surprising register allocation failure when I allocate all registers to function arguments.
> > > > 
> > > > ```
> > > > define fastcc <vscale x 32 x i32> @foo(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {
> > > > ```
> > > > 
> > > > Gives me the following lowering code, with %x -> v0m8/v8m8, %y -> v16m8/v24m8 and %z -> indirect(x10) and indirect(x12):
> > > > 
> > > > ```
> > > > bb.0 (%ir-block.0):
> > > >   liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
> > > >   %5:gpr = COPY $x12
> > > >   %4:gpr = COPY $x10
> > > >   %3:vrm8 = COPY $v24m8
> > > >   %2:vrm8 = COPY $v16m8
> > > >   %1:vrm8 = COPY $v8m8
> > > >   %0:vrm8 = COPY $v0m8
> > > >   %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
> > > >   %7:gpr = ADDI %4:gpr, 64
> > > >   %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
> > > > ```
> > > > 
> > > > Then the machine scheduler decides to do this:
> > > > 
> > > > ```
> > > > 0B	bb.0 (%ir-block.0):
> > > > 	  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
> > > > 32B	  %4:gpr = COPY $x10
> > > > 128B	  %7:gpr = ADDI %4:gpr, 64
> > > > 136B	  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
> > > > 144B	  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
> > > > 152B	  %5:gpr = COPY $x12
> > > > 160B	  %3:vrm8 = COPY $v24m8
> > > > 168B	  %2:vrm8 = COPY $v16m8
> > > > 176B	  %1:vrm8 = COPY $v8m8
> > > > 184B	  %0:vrm8 = COPY $v0m8
> > > > ```
> > > > 
> > > > And then register allocator is unable to allocate for `%6` or `%8` since all physical registers are occupied. I'm surprised the scheduler made that change.
> > > > 
> > > > I'm not the most familiar with other targets and their calling conventions, but perhaps this isn't supported. A compromise could be to leave `v24m8` free but continue to use `v1-v7`, which is still more than the base calling convention supports.
> > > > 
> > > > Any thoughts?
> > > I am curious why the register allocator does not spill the vector registers?
> > > 
> > > By the way, it is strange to add 64 to the base address($x10) for the arguments. It should add `8 x vlenb`, right? I know it is another issue not related to the patch. We could prepare another patch to fix it.
> > Are you asking why it doesn't spill one of the `liveins` so it can move `%6` into it? I'm not sure. I'm not an expert on the register allocator in any way, but I have seen issues with calling conventions + regalloc in the past. I wonder if it isn't able to spill physical livein registers.
> > 
> > Yes, it's a bit suspicious that it loads from a fixed offset from the base address. Probably not caused by this patch but I'll look into that.
> Please see D103262 for a fix for that offset issue.
Got it. I think we could use v8 to v23 for fastcc first. If we want to use more vector registers for fastcc, we could refine it later. It has no impact on the standard calling convention.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102505/new/

https://reviews.llvm.org/D102505