[PATCH] D129715: [LoongArch] Load FP immediates by movgr2fr from materialized integer

Wed Jul 13 21:11:28 PDT 2022

xry111 added a comment.

> To other reviewers: we tested this optimization with an internal llvm version (llvm13) on 3A5000, and it shows that SPEC CPU2006 FP score increases 1% in average. 470.lbm score increases 8.9%. But we wonder why other architectures have not done so? Is there any potential issue?

I guess the reason is that "for very simple test cases, fld really is faster".

bench_imm.S:

  #define VALUE	0x4090000000000000	/* bit pattern of the double 1024.0 */

  .text
  .type	main, @function
  .globl	main

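  main:
  	li.w	$t1, 1048576		/* outer loop count: 2^20 */
  .loop:
  	.rept 1024			/* unroll 1024 copies of the load sequence */
  #if LOAD_IMM
  	/* materialize the bit pattern in a GPR, then move it to an FPR */
  	li.d	$t0, VALUE
  	movgr2fr.d	$ft0, $t0
  #else
  	/* load the constant from memory */
  	la.local	$t0, .const0
  	fld.d	$ft0, $t0, 0
  #endif
  	.endr
  	addi.w	$t1, $t1, -1
  	bnez	$t1, .loop
  	li.w	$a0, 0			/* return 0 */
  	jr	$ra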

  .data
  .hidden	.const0
  .const0:
  	.dword	VALUE

On my 3A5000 (at 2.3 GHz), `cc bench_imm.S && time ./a.out` gives 0.35s, but `cc bench_imm.S -DLOAD_IMM && time ./a.out` gives 0.60s.  However, I think that's just because in such a simple case the constant pool is always in the L1 cache...
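
To put those two timings in perspective, here is a rough conversion into cycles per unrolled load sequence. This is only a back-of-the-envelope sketch: it assumes the core really ran at a constant 2.3 GHz and that process startup and loop overhead are negligible, neither of which was measured. The function and variable names are made up for illustration.

```python
# Rough conversion of the reported wall-clock times into cycles per
# unrolled load sequence. Assumptions (not measured in the original post):
# constant 2.3 GHz clock, negligible startup and loop overhead.

OUTER_ITERATIONS = 1048576   # li.w $t1, 1048576
UNROLL = 1024                # .rept 1024
CLOCK_HZ = 2.3e9             # "3A5000 (at 2.3 GHz)"


def cycles_per_sequence(seconds: float) -> float:
    """Average cycles spent per copy of the unrolled load sequence."""
    sequences = OUTER_ITERATIONS * UNROLL  # 2**30 copies executed in total
    return seconds * CLOCK_HZ / sequences


fld_cycles = cycles_per_sequence(0.35)  # la.local + fld.d variant
imm_cycles = cycles_per_sequence(0.60)  # li.d + movgr2fr.d variant

print(f"fld.d path:      {fld_cycles:.2f} cycles per sequence")
print(f"movgr2fr.d path: {imm_cycles:.2f} cycles per sequence")
```

Under those assumptions the fld.d variant costs roughly 0.75 cycles per constant and the movgr2fr.d variant roughly 1.29, so the gap in this microbenchmark is about half a cycle per constant.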

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129715/new/

https://reviews.llvm.org/D129715