[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer
Xi Ruoyao via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 13 21:11:28 PDT 2022
xry111 added a comment.
> To other reviewers: we tested this optimization with an internal llvm version (llvm13) on 3A5000, and it shows that the SPEC CPU2006 FP score increases by 1% on average. The 470.lbm score increases by 8.9%. But we wonder why other architectures have not done so. Is there any potential issue?
I guess the reason is "for very simple test cases fld is really faster".
bench_imm.S:
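#define VALUE 0x4090000000000000

    .text
    .type main, @function
    .globl main
main:
    li.w $t1, 1048576           /* outer loop count */
.loop:
    .rept 1024                  /* repeat the load sequence 1024 times */
#if LOAD_IMM
    li.d $t0, VALUE             /* materialize the bit pattern in a GPR */
    movgr2fr.d $ft0, $t0        /* move the raw bits into an FPR */
#else
    la.local $t0, .const0       /* address of the constant pool entry */
    fld.d $ft0, $t0, 0          /* load the FP constant from memory */
#endif
    .endr
    addi.w $t1, $t1, -1
    bnez $t1, .loop
    li.w $a0, 0                 /* return 0 from main */
    jr $ra

    .data
    .hidden .const0
.const0:
    .dword VALUE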
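For reference, the VALUE constant in the listing is the IEEE 754 binary64 encoding of an ordinary double. movgr2fr.d copies the 64 bits from the general-purpose register into the FP register without any conversion, so both paths should leave the same value in $ft0. A minimal Python sketch that decodes the bit pattern (the helper names are made up for illustration and are not part of the patch):

```python
import struct

VALUE = 0x4090000000000000  # bit pattern used in the benchmark listing


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def double_to_bits(x: float) -> int:
    """Inverse of bits_to_double: return the raw 64-bit encoding of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


if __name__ == "__main__":
    # Sign 0, biased exponent 0x409 = 1033 (so 2**10), mantissa 0.
    print(bits_to_double(VALUE))      # prints 1024.0
    print(hex(double_to_bits(1024.0)))
```

The same round trip is what the compiler has to do at build time: take the FP immediate, compute its integer encoding, and materialize that integer.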
On my 3A5000 (at 2.3 GHz), `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s. But I think that is just because in such a simple case the constant pool is always in the L1 cache...
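To put the two timings in perspective, they can be converted into approximate cycles per repeated load sequence. This is only back-of-the-envelope arithmetic: it assumes the whole wall-clock time is spent in the loop, that the core runs at a constant 2.3 GHz, and it ignores the addi.w/bnez overhead that is paid once per 1024 sequences. The function name is hypothetical.

```python
CLOCK_HZ = 2.3e9          # clock frequency quoted in the comment
OUTER = 1048576           # li.w $t1, 1048576
INNER = 1024              # .rept 1024
SEQUENCES = OUTER * INNER  # total load sequences executed (2**30)


def cycles_per_sequence(seconds: float, clock_hz: float = CLOCK_HZ) -> float:
    """Approximate cycles spent per load sequence for a measured run time."""
    return seconds * clock_hz / SEQUENCES


if __name__ == "__main__":
    fld = cycles_per_sequence(0.35)  # la.local + fld.d variant
    imm = cycles_per_sequence(0.60)  # li.d + movgr2fr.d variant
    print(f"fld path: {fld:.2f} cycles per sequence")
    print(f"imm path: {imm:.2f} cycles per sequence")
    print(f"slowdown: {imm / fld:.2f}x")
```

Under those assumptions the fld variant comes out at roughly 0.75 cycles per sequence and the movgr2fr variant at roughly 1.29, which is a throughput figure for a loop whose constant never leaves the cache and says nothing about the cache-miss case.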
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D129715/new/
https://reviews.llvm.org/D129715