[PATCH] D102505: [RISCV] Support vector types in combination with fastcc
Fraser Cormack via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 24 05:47:56 PDT 2021
frasercrmck added inline comments.
================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:7160
+ if (unsigned Reg =
+ allocateRVVReg(ValVT, ValNo, FirstMaskArgument, State, TLI)) {
+ // Fixed-length vectors are located in the corresponding scalable-vector
----------------
frasercrmck wrote:
> HsiangKai wrote:
> > Should we allow all the vector registers used to pass vector arguments under fastcc, instead of limiting to v8 to v23?
> Sure, that's a possibility. I think it opens up some interesting new situations if we combine that with the first mask going to v0, like (with max-lmul=8):
>
> ```
> define fastcc <4 x i1> @foo(<32 x i32> %x, <8 x i32> %y, <32 x i32> %z, <32 x i32> %w, <4 x i1> %m1, <4 x i1> %m2, <4 x i1> %m3) {
> ; %x -> $v8m8
> ; %y -> $v2m2
> ; %z -> $v16m8
> ; %w -> $v24m8
> ; %m1 -> $v0
> ; %m2 -> $v1
> ; %m3 -> $v4
> ```
>
> Do you think that's worth it? It's slightly harder to reason about (finding which operand goes to which register takes a bit of back-and-forth) but the allocation is certainly improved. I suppose that's the goal of fastcc.
Now I'm seeing a surprising register allocation failure when I allocate all registers to function arguments.
```
define fastcc <vscale x 32 x i32> @foo(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {
```
Gives me the following lowering code, with %x -> v0m8/v8m8, %y -> v16m8/v24m8 and %z -> indirect(x10) and indirect(x12):
```
bb.0 (%ir-block.0):
liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
%5:gpr = COPY $x12
%4:gpr = COPY $x10
%3:vrm8 = COPY $v24m8
%2:vrm8 = COPY $v16m8
%1:vrm8 = COPY $v8m8
%0:vrm8 = COPY $v0m8
%6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
%7:gpr = ADDI %4:gpr, 64
%8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
```
Then the machine scheduler decides to do this:
```
0B bb.0 (%ir-block.0):
liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
32B %4:gpr = COPY $x10
128B %7:gpr = ADDI %4:gpr, 64
136B %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
144B %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
152B %5:gpr = COPY $x12
160B %3:vrm8 = COPY $v24m8
168B %2:vrm8 = COPY $v16m8
176B %1:vrm8 = COPY $v8m8
184B %0:vrm8 = COPY $v0m8
```
And then register allocator is unable to allocate for `%6` or `%8` since all physical registers are occupied. I'm surprised the scheduler made that change.
I'm not the most familiar with other targets and their calling conventions, but perhaps this isn't supported. A compromise could be to leave `v24m8` free but continue to use `v1-v7`, which is still more than the base calling convention supports.
Any thoughts?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D102505/new/
https://reviews.llvm.org/D102505
More information about the llvm-commits
mailing list