[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer
Gong LingQin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 13 23:24:15 PDT 2022
gonglingqin added a comment.
In D129715#3650728 <https://reviews.llvm.org/D129715#3650728>, @xry111 wrote:
> In D129715#3650672 <https://reviews.llvm.org/D129715#3650672>, @xry111 wrote:
>
>> On my 3A5000 (at 2.3 GHz), `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s. But I think that is just because, in this simple case, the constant pool is always in the L1 cache...
>
> Ah, just make the fetch unit busier and the result will favor immediate loading:
>
>     .loop:
>     .rept 1024
>     #if LOAD_IMM
>     li.d $t0, VALUE
>     movgr2fr.d $ft0, $t0
>     #else
>     la.local $t0, .const0
>     fld.d $ft0, $t0, 0
>     #endif
>     la.local $t0, .const1
>     ld.d $t2, $t0, 0
>     .endr
>     addi.w $t1, $t1, -1
>     bnez $t1, .loop
>     li.w $a0, 0
>     jr $ra
>
> `cc bench_imm.S && time ./a.out` gives 0.70s, and `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.59s. But for a more complex bit pattern (like 0x400921FB54442D18 for PI) fld.d will win again.
>
> Is it possible to limit the use of movgr2fr to the patterns that can be loaded with only one or two instructions, and see how the SPEC results change?
Thanks for the suggestion. It may be possible; I will test it.
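
As a rough illustration of the "one or two instructions" cutoff suggested above, the Python sketch below counts how many integer instructions a simplified li.d expansion would need for the bit pattern of a given double. The expansion model (ori / addi.w / lu12i.w for the low 32 bits, then lu32i.d, then lu52i.d) and the names count_li_d_insts and prefer_movgr2fr are assumptions made for illustration only. This is not the code in this patch, and a real materializer may find shorter sequences.

```python
import math
import struct

MASK64 = (1 << 64) - 1
MASK52 = (1 << 52) - 1


def double_bits(x: float) -> int:
    """Return the IEEE-754 binary64 bit pattern of x as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def count_li_d_insts(val: int) -> int:
    """Count instructions in a simplified (assumed) li.d expansion."""
    val &= MASK64
    lo12 = val & 0xFFF
    hi20 = (val >> 12) & 0xFFFFF
    higher20 = (val >> 32) & 0xFFFFF
    highest12 = (val >> 52) & 0xFFF

    # Only the top 12 bits are set: a single lu52i.d from $zero suffices.
    if highest12 != 0 and (val & MASK52) == 0:
        return 1

    # Step 1: build the low 32 bits, tracking the simulated register value.
    if hi20 == 0:
        reg, n = lo12, 1                                # ori
    elif sext(val, 32) == sext(lo12, 12):
        reg, n = sext(lo12, 12) & MASK64, 1             # addi.w
    else:
        reg, n = sext(hi20 << 12, 32) & MASK64, 1       # lu12i.w
        if lo12 != 0:
            reg, n = reg | lo12, n + 1                  # ori

    # Step 2: fix bits 51:32 if needed (lu32i.d also sign-extends bit 51).
    if ((reg >> 32) & 0xFFFFF) != higher20:
        reg = ((sext(higher20, 20) << 32) | (reg & 0xFFFFFFFF)) & MASK64
        n += 1

    # Step 3: fix bits 63:52 if needed (lu52i.d).
    if (reg >> 52) != highest12:
        reg = (highest12 << 52) | (reg & MASK52)
        n += 1

    assert reg == val  # sanity check on the simulated sequence
    return n


def prefer_movgr2fr(x: float, max_insts: int = 2) -> bool:
    """Hypothetical cutoff: use li.d + movgr2fr.d only for cheap patterns."""
    return count_li_d_insts(double_bits(x)) <= max_insts


for x in (0.0, 1.0, -1.0, 0.1, math.pi):
    bits = double_bits(x)
    print(f"{x!r:>20}  0x{bits:016X}  {count_li_d_insts(bits)} insts")
```

Under this model 1.0 needs a single lu52i.d, while the PI pattern 0x400921FB54442D18 needs four integer instructions before the movgr2fr.d, which is consistent with fld.d winning for that pattern in the benchmark quoted above.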
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D129715/new/
https://reviews.llvm.org/D129715