[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer
Xi Ruoyao via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 13 21:34:00 PDT 2022
xry111 added a comment.
In D129715#3650672 <https://reviews.llvm.org/D129715#3650672>, @xry111 wrote:
>
> On my 3A5000 (at 2.3 GHz) `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s. But I think it's just because for the simple case the constant pool is always in the L1 cache...
Ah, just make the fetch unit busier, and then the result favors immediate loading:
.loop:
  .rept 1024
#if LOAD_IMM
  li.d $t0, VALUE
  movgr2fr.d $ft0, $t0
#else
  la.local $t0, .const0
  fld.d $ft0, $t0, 0
#endif
  la.local $t0, .const1
  ld.d $t2, $t0, 0
  .endr
  addi.w $t1, $t1, -1
  bnez $t1, .loop
  li.w $a0, 0
  jr $ra
`cc bench_imm.S && time ./a.out` gives 0.70s, and `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.59s. But for a more complex bit pattern (like 0x400921FB54442D18 for PI), `fld.d` wins again.
Is it possible to limit the use of movgr2fr to the patterns that can be loaded with only one or two instructions, and see how the SPEC results change?
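To make "one or two instructions" concrete, here is a rough Python sketch that estimates how many integer instructions a 64-bit bit pattern would need. It assumes the plain lu12i.w / ori / addi.w / lu32i.d / lu52i.d sequence and is not the actual LLVM or assembler algorithm, which may find shorter sequences. The helper names `double_bits` and `estimate_li_d_insns` are hypothetical, chosen only for this illustration.

```python
import math
import struct


def double_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def estimate_li_d_insns(bits: int) -> int:
    """Estimate the instruction count to build `bits` in a GPR.

    Assumed model (a simplification, not the real materializer):
      - bits 31:0  via ori, addi.w, or lu12i.w (+ ori)
      - bits 51:32 via lu32i.d when they differ from the sign fill
      - bits 63:52 via lu52i.d when they differ from the sign fill
    A result of 0 means the value is zero and $zero can be used directly.
    """
    lo32 = bits & 0xFFFFFFFF
    lo12 = bits & 0xFFF
    hi20 = (bits >> 12) & 0xFFFFF
    higher20 = (bits >> 32) & 0xFFFFF
    highest12 = (bits >> 52) & 0xFFF

    n = 0
    if lo32 == 0:
        pass                      # nothing to build yet; start from $zero
    elif hi20 == 0:
        n += 1                    # ori: zero-extended 12-bit immediate
    elif hi20 == 0xFFFFF and (lo12 & 0x800):
        n += 1                    # addi.w: negative 12-bit immediate
    else:
        n += 1                    # lu12i.w sets bits 31:12, sign-extends
        if lo12:
            n += 1                # ori fills bits 11:0

    # After the low part, bits 63:32 hold the sign extension of bit 31.
    fill20 = 0xFFFFF if (lo32 >> 31) else 0
    if higher20 != fill20:
        if lo32 == 0:
            n += 1                # lu32i.d reads rd, so zero it first
        n += 1                    # lu32i.d sets bits 51:32, sign-extends

    # Bits 63:52 now hold the sign extension of bit 51.
    fill12 = 0xFFF if (higher20 >> 19) else 0
    if highest12 != fill12:
        n += 1                    # lu52i.d sets bits 63:52

    return n


if __name__ == "__main__":
    for value in (1.0, 2.0, 0.5, math.pi):
        bits = double_bits(value)
        print(f"{value!r}: 0x{bits:016X} -> {estimate_li_d_insns(bits)}")
```

Under this model, constants such as 1.0 or 2.0 need a single lu52i.d, while the PI pattern 0x400921FB54442D18 needs all four instructions. A threshold check like `estimate_li_d_insns(bits) <= 2` is the kind of filter the question above has in mind.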
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D129715/new/
https://reviews.llvm.org/D129715