[PATCH] D118020: [RISCV] Set CostPerUse for floating point registers
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 18 08:54:50 PST 2022
craig.topper added a comment.
In D118020#3331582 <https://reviews.llvm.org/D118020#3331582>, @pcwang-thead wrote:
> In D118020#3329779 <https://reviews.llvm.org/D118020#3329779>, @asb wrote:
>
>> I'm surprised this resulted in performance increases. I might have guessed that with so few FP instructions being compressible, the further constraint on register selection might be more likely to result in a (slight) decrease in performance. Shows the value of running the benchmarks!
>>
>> I've put this patch on the agenda for the RISC-V LLVM call today, but based on the data so far this seems to make sense.
>
> I am surprised too.
>
> IMO, there is a possible reason that may explain the performance increases:
>
> - When the register number is in [8, 15], instructions can be compressed.
> - Of the first 16 integer registers, x0-x4 (and sometimes x5) are reserved for special uses, and the register allocation order is as follows:
>
>   def GPR : RegisterClass<"RISCV", [XLenVT], 32, (add
>       (sequence "X%u", 10, 17),
>       (sequence "X%u", 5, 7),
>       (sequence "X%u", 28, 31),
>       (sequence "X%u", 8, 9),
>       (sequence "X%u", 18, 27),
>       (sequence "X%u", 0, 4)
>     )> {
>     let RegInfos = XLenRI;
>   }
>
> which means we allocate most of the RVC integer registers first.
> So, for most programs, it makes little difference whether we set `CostPerUse` to `0` or `1`.
>
> - Of the first 16 float registers, none are reserved, and the register allocation order is as follows:
>
>   def FPR32 : RegisterClass<"RISCV", [f32], 32, (add
>       (sequence "F%u_F", 0, 7),
>       (sequence "F%u_F", 10, 17),
>       (sequence "F%u_F", 28, 31),
>       (sequence "F%u_F", 8, 9),
>       (sequence "F%u_F", 18, 27)
>     )>;
>
> which means we allocate the temporary float registers (f0-f7) first, so most float instructions can't be compressed.
> So when we set `CostPerUse` to `1`, many more float instructions can be compressed, which reduces icache misses.
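The allocation-order argument quoted above can be checked with a small sketch. It transcribes the two `sequence` lists from the quote and counts how many of the first k registers in each order fall in the compressible range [8, 15]. The helper names (`seq`, `rvc_count`) are made up for illustration, and this models only the static preference order, not what the real register allocator does with hints, live ranges, or `CostPerUse`.

```python
# Sketch only: models the static allocation order from the quoted
# register classes, not LLVM's actual register allocator.

def seq(lo, hi):
    """Register numbers lo..hi inclusive, like TableGen's (sequence ...)."""
    return list(range(lo, hi + 1))

# Orders transcribed from the quoted GPR and FPR32 definitions.
GPR_ORDER = seq(10, 17) + seq(5, 7) + seq(28, 31) + seq(8, 9) + seq(18, 27) + seq(0, 4)
FPR_ORDER = seq(0, 7) + seq(10, 17) + seq(28, 31) + seq(8, 9) + seq(18, 27)

def is_rvc_reg(n):
    """True if register number n is encodable in the 3-bit RVC register field."""
    return 8 <= n <= 15

def rvc_count(order, k):
    """How many of the first k registers in the order are RVC-encodable."""
    return sum(1 for n in order[:k] if is_rvc_reg(n))

if __name__ == "__main__":
    for k in (4, 8, 16):
        print(f"first {k:2d}: GPR={rvc_count(GPR_ORDER, k)} FPR={rvc_count(FPR_ORDER, k)}")
```

Under this simplified model, six of the first eight integer registers in the order are compressible, while none of the first eight float registers are, which is the asymmetry the quoted comment describes.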
Our confusion is that there are only 8 float opcodes that have compressed forms. They are all loads/stores and 4 of them are limited to RV32.
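For reference, the eight opcodes can be tabulated as below. The RV32/RV64 availability and the "restricted to f8-f15" column are taken from the RISC-V C extension specification as I understand it, not from this patch: the stack-pointer-relative forms carry a full 5-bit register field, so only the non-SP forms depend on which float register is allocated. The table layout and function names are illustrative only.

```python
# Sketch: compressed floating-point loads/stores in the RISC-V C extension.
# Columns: (mnemonic, on RV32, on RV64, restricted to f8-f15).
# Availability per the C extension spec; not derived from the patch itself.
COMPRESSED_FP_OPS = [
    ("c.fld",   True, True,  True),
    ("c.fsd",   True, True,  True),
    ("c.fldsp", True, True,  False),
    ("c.fsdsp", True, True,  False),
    ("c.flw",   True, False, True),
    ("c.fsw",   True, False, True),
    ("c.flwsp", True, False, False),
    ("c.fswsp", True, False, False),
]

def ops_for(xlen):
    """Mnemonics available for the given XLEN (32 or 64)."""
    col = 1 if xlen == 32 else 2
    return [op[0] for op in COMPRESSED_FP_OPS if op[col]]

def reg_sensitive_ops(xlen):
    """Mnemonics whose compressibility depends on using f8-f15."""
    col = 1 if xlen == 32 else 2
    return [op[0] for op in COMPRESSED_FP_OPS if op[col] and op[3]]

if __name__ == "__main__":
    for xlen in (32, 64):
        print(xlen, ops_for(xlen), reg_sensitive_ops(xlen))
```

On RV64 this leaves only `c.fld` and `c.fsd` as instructions whose encoding size can change with the choice of float register.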
I ran 453.povray from SPEC2006 on a SiFive Unmatched board with this patch applied to our downstream compiler. My result was a 1% decrease in performance.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D118020/new/
https://reviews.llvm.org/D118020