[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer

Wed Jul 13 23:24:15 PDT 2022

gonglingqin added a comment.

In D129715#3650728 <https://reviews.llvm.org/D129715#3650728>, @xry111 wrote:

> In D129715#3650672 <https://reviews.llvm.org/D129715#3650672>, @xry111 wrote:
>
>> On my 3A5000 (at 2.3 GHz) `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s.  But I think it's just because for the simple case the constant pool is always in the L1 cache...
>
> Ah, just make the fetch unit busier and the result will favor immediate loading:
>
>   .loop:
>   	.rept 1024
>   #if LOAD_IMM
>   	li.d	$t0, VALUE
>   	movgr2fr.d	$ft0, $t0
>   #else
>   	la.local	$t0, .const0
>   	fld.d	$ft0, $t0, 0
>   #endif
>   	la.local	$t0, .const1
>   	ld.d	$t2, $t0, 0
>   	.endr
>   	addi.w	$t1, $t1, -1
>   	bnez	$t1, .loop
>   	li.w	$a0, 0
>   	jr	$ra
>
> `cc bench_imm.S && time ./a.out` gives 0.70s, and `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.59s.  But for a more complex bit pattern (like 0x400921FB54442D18 for PI) fld.d will win again.
>
> Is it possible to limit the use of movgr2fr to the patterns that can be loaded with only one or two instructions and see how the SPEC results change?

Thanks for the suggestion. It may be possible; I will test it.
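
To make the "one or two instructions" idea concrete, here is a rough sketch of how such a limit could be estimated. It is not the patch's implementation: the helper names (`count_materialize_insts`, `prefer_movgr2fr`) and the threshold parameter are hypothetical, and the counting only assumes the usual lu12i.w/ori (or addi.w), lu32i.d, lu52i.d sequence for building a 64-bit value in a GPR.

```python
import math
import struct


def double_bits(x):
    """Return the IEEE-754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def count_materialize_insts(val):
    """Estimate how many instructions build the 64-bit pattern `val` in a GPR.

    Assumption: the value is assembled from lu12i.w/ori (or addi.w),
    lu32i.d and lu52i.d; each later step is skipped when the sign
    extension left by the previous step already gives the right bits.
    """
    val &= (1 << 64) - 1
    lo12 = val & 0xFFF
    hi20 = (val >> 12) & 0xFFFFF
    higher20 = (val >> 32) & 0xFFFFF
    highest12 = (val >> 52) & 0xFFF

    # Only bits 63:52 are set: a single lu52i.d from $zero is enough.
    if highest12 != 0 and (val & ((1 << 52) - 1)) == 0:
        return 1

    # Low 32 bits.
    if hi20 == 0:
        count = 1                      # ori
    elif hi20 == (0xFFFFF if lo12 >> 11 else 0):
        count = 1                      # addi.w with a sign-extended imm12
    else:
        count = 1 + (1 if lo12 else 0)  # lu12i.w, plus ori if needed

    # Bits 51:32 currently hold copies of bit 31.
    if higher20 != (0xFFFFF if hi20 >> 19 else 0):
        count += 1                     # lu32i.d

    # Bits 63:52 currently hold copies of bit 51.
    if highest12 != (0xFFF if higher20 >> 19 else 0):
        count += 1                     # lu52i.d

    return count


def prefer_movgr2fr(x, max_insts=2):
    """Hypothetical policy: use li.d + movgr2fr.d only for cheap patterns."""
    return count_materialize_insts(double_bits(x)) <= max_insts
```

Under this estimate 1.0 (0x3FF0000000000000) needs one instruction and would go through movgr2fr.d, while PI (0x400921FB54442D18) needs four and would stay with the constant-pool fld.d. Note that the count does not include the movgr2fr.d itself, and that zero is reported as one instruction even though it could be moved from $zero directly.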

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129715/new/

https://reviews.llvm.org/D129715