[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer

Xi Ruoyao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 13 21:34:00 PDT 2022


xry111 added a comment.

In D129715#3650672 <https://reviews.llvm.org/D129715#3650672>, @xry111 wrote:

> On my 3A5000 (at 2.3 GHz) `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s.  But I think it's just because for the simple case the constant pool is always in the L1 cache...

Ah, just make the fetch unit busier and the result will favor immediate loading:

  .loop:
  	.rept 1024
  #if LOAD_IMM
  	li.d	$t0, VALUE
  	movgr2fr.d	$ft0, $t0
  #else
  	la.local	$t0, .const0
  	fld.d	$ft0, $t0, 0
  #endif
  	la.local	$t0, .const1
  	ld.d	$t2, $t0, 0
  	.endr
  	addi.w	$t1, $t1, -1
  	bnez	$t1, .loop
  	li.w	$a0, 0
  	jr	$ra

`cc bench_imm.S && time ./a.out` gives 0.70s, and `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.59s.  But for a more complex bit pattern (like 0x400921FB54442D18 for PI), `fld.d` will win again.

Is it possible to limit the use of `movgr2fr` to the patterns that can be loaded with only one or two instructions, and see how the SPEC results change?
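
To make the question concrete, here is a rough sketch of what such a limit could look like. It is not the code in this patch and not LLVM's actual materialization logic: `materializeCost` and `shouldUseMovgr2fr` are hypothetical names, and the default threshold of two instructions is just the number suggested above. The cost model assumes the usual `ori` / `addi.w` / `lu12i.w` / `lu32i.d` / `lu52i.d` sequence for building a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sign-extend the low `bits` bits of `v` to 64 bits (bits must be < 64).
static uint64_t signExtend(uint64_t v, unsigned bits) {
    const uint64_t m = 1ULL << (bits - 1);
    v &= (1ULL << bits) - 1;
    return (v ^ m) - m;
}

// Hypothetical helper: estimate how many integer instructions are needed to
// build the 64-bit pattern `val` in a GPR. `cur` models the register value
// after each step, so an instruction is counted only when it is still needed.
static unsigned materializeCost(uint64_t val) {
    const uint64_t low52 = (1ULL << 52) - 1;
    // Only bits 63:52 are set: a single lu52i.d from $zero is enough.
    if ((val >> 52) != 0 && (val & low52) == 0)
        return 1;

    const uint64_t lo12 = val & 0xFFF;
    const uint64_t lo32 = val & 0xFFFFFFFFULL;
    unsigned n = 1;
    uint64_t cur;
    if ((lo32 >> 12) == 0) {
        cur = lo12;                          // ori rd, $zero, lo12
    } else if ((signExtend(lo12, 12) & 0xFFFFFFFFULL) == lo32) {
        cur = signExtend(lo12, 12);          // addi.w rd, $zero, lo12
    } else {
        cur = signExtend(lo32 & ~0xFFFULL, 32); // lu12i.w rd, hi20
        if (lo12 != 0) {
            cur |= lo12;                     // ori rd, rd, lo12
            ++n;
        }
    }
    // lu32i.d replaces bits 51:32 and sign-extends from bit 51.
    if (((cur >> 32) & 0xFFFFF) != ((val >> 32) & 0xFFFFF)) {
        cur = signExtend((val & low52 & ~0xFFFFFFFFULL) | (cur & 0xFFFFFFFFULL), 52);
        ++n;
    }
    // lu52i.d replaces bits 63:52.
    if ((cur >> 52) != (val >> 52)) {
        cur = (val & ~low52) | (cur & low52);
        ++n;
    }
    assert(cur == val && "cost model failed to reconstruct the value");
    return n;
}

// Hypothetical policy: use li.d + movgr2fr.d only when the integer part is
// cheap; otherwise fall back to loading from the constant pool with fld.d.
// The threshold is an assumption, not a measured value.
static bool shouldUseMovgr2fr(double d, unsigned maxInsts = 2) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return materializeCost(bits) <= maxInsts;
}
```

Under this model 1.0 (0x3FF0000000000000) costs one instruction and would take the `movgr2fr.d` path, while the PI pattern above costs four and would stay on `fld.d`.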


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129715/new/

https://reviews.llvm.org/D129715


