[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer

Gong LingQin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 13 23:24:15 PDT 2022


gonglingqin added a comment.

In D129715#3650728 <https://reviews.llvm.org/D129715#3650728>, @xry111 wrote:

> In D129715#3650672 <https://reviews.llvm.org/D129715#3650672>, @xry111 wrote:
>
>> On my 3A5000 (at 2.3 GHz) `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s.  But I think that's just because, in this simple case, the constant pool is always in the L1 cache...
>
> Ah, just make the fetch unit busier and the result will favor immediate loading:
>
>   .loop:
>   	.rept 1024
>   #if LOAD_IMM
>   	li.d	$t0, VALUE
>   	movgr2fr.d	$ft0, $t0
>   #else
>   	la.local	$t0, .const0
>   	fld.d	$ft0, $t0, 0
>   #endif
>   	la.local	$t0, .const1
>   	ld.d	$t2, $t0, 0
>   	.endr
>   	addi.w	$t1, $t1, -1
>   	bnez	$t1, .loop
>   	li.w	$a0, 0
>   	jr	$ra
>
> `cc bench_imm.S && time ./a.out` gives 0.70s, and `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.59s.  But for a more complex bit pattern (like 0x400921FB54442D18 for PI) fld.d will win again.
>
> Is it possible to limit the use of movgr2fr to the patterns that can be loaded with only one or two instructions, and see how the SPEC results change?

Thanks for the suggestion. It may be possible; I will test it.
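
As a rough illustration of the suggested limit, the sketch below estimates how many integer instructions a 64-bit pattern needs before `movgr2fr.d`. It assumes the usual LoongArch split of an immediate into bits 11:0 (`ori`/`addi.w`), 31:12 (`lu12i.w`), 51:32 (`lu32i.d`) and 63:52 (`lu52i.d`). The function names and the threshold of two are hypothetical, and this is not the materialization code from the patch.

```python
import math
import struct


def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def materialize_cost(val):
    """Estimate the instructions needed to build a 64-bit pattern in a GPR."""
    val &= (1 << 64) - 1
    lo12 = val & 0xFFF
    hi20 = (val >> 12) & 0xFFFFF
    highest12 = (val >> 52) & 0xFFF

    # Only bits 63:52 are set: a single lu52i.d on $zero is enough.
    if highest12 != 0 and (val & ((1 << 52) - 1)) == 0:
        return 1

    # Low 32 bits: ori alone, addi.w alone, or lu12i.w plus an optional ori.
    if hi20 == 0:
        count = 1
    elif sext(val, 12) == sext(val, 32):
        count = 1
    else:
        count = 1 + (1 if lo12 else 0)

    # lu32i.d is needed unless bits 51:32 already sign-extend bit 31.
    if sext(val, 52) != sext(val, 32):
        count += 1
    # lu52i.d is needed unless bits 63:52 already sign-extend bit 51.
    if sext(val, 64) != sext(val, 52):
        count += 1
    return count


def double_bits(x):
    """Return the IEEE 754 binary64 bit pattern of `x` as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def use_movgr2fr(x, max_insts=2):
    """Hypothetical policy: prefer li.d + movgr2fr.d only for cheap patterns."""
    return materialize_cost(double_bits(x)) <= max_insts


if __name__ == "__main__":
    for x in (1.0, 2.0, math.pi):
        print(hex(double_bits(x)), materialize_cost(double_bits(x)), use_movgr2fr(x))
```

Under these assumptions 1.0 (0x3FF0000000000000) costs one instruction, while the PI pattern 0x400921FB54442D18 costs four, so a threshold of two would keep PI on the `fld.d` path.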


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129715/new/

https://reviews.llvm.org/D129715


