[llvm] [RISCV][TTI] Reduce cost of a build_vector pattern (PR #108419)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 18 09:54:06 PDT 2024
lukel97 wrote:
> Just to confirm, you're looking at cycle count right? What routine are you seeing this in? I'm looking at an LTO build of povray, and not seeing any heavy use of the @exp routine - except indirectly through a function pointer table. Is your build -Ofast -flto=auto? Or something else?
This build was with -O3 -mcpu=spacemit-x60. I'll queue up another run with -Ofast and LTO.
One example I found was in `pov::compute_backtrace_texture(float*, pov::Texture_Struct*, double*, double*, pov::Ray_Struct*, double, pov::istk_entry*) (_ZN3povL25compute_backtrace_textureEPfPNS_14Texture_StructEPdS3_PNS_10Ray_StructEdPNS_10istk_entryE)`, where the first two `exp` results are no longer packed into a vector and spilled across the third call:
```diff
- flw fa5, 36(s1)
- fld fs3, %pcrel_lo(.Lpcrel_hi250)(a1)
- fld fs4, 0(s10)
- fcvt.d.s fa5, fa5
- fsub.d fa5, fs3, fa5
- fneg.d fa5, fa5
- fmul.d fa5, fs4, fa5
- fdiv.d fa0, fa5, fs2
- call exp
- flw fa5, 40(s1)
- fmv.d fs1, fa0
- fcvt.d.s fa5, fa5
- fsub.d fa5, fs3, fa5
- fneg.d fa5, fa5
- fmul.d fa5, fs4, fa5
- fdiv.d fa0, fa5, fs2
- call exp
- vsetivli zero, 2, e64, m1, ta, ma
- vfmv.v.f v8, fs1
- flw fa5, 44(s1)
- vfslide1down.vf v8, v8, fa0
- vfmul.vf v8, v8, fs0
- vsetvli zero, zero, e32, mf2, ta, ma
- vfncvt.f.f.w v9, v8
- csrr a0, vlenb
- add a0, a0, sp
- addi a0, a0, 2047
- addi a0, a0, 65
- vs1r.v v9, (a0) # Unknown-size Folded Spill
- fcvt.d.s fa5, fa5
- fsub.d fa5, fs3, fa5
- fneg.d fa5, fa5
- fmul.d fa5, fs4, fa5
- fdiv.d fa0, fa5, fs2
- call exp
- csrr a0, vlenb
- add a0, a0, sp
- addi a0, a0, 2047
- addi a0, a0, 65
- vl1r.v v9, (a0) # Unknown-size Folded Reload
- fmul.d fa5, fa0, fs0
- fcvt.s.d fs0, fa5
- vsetivli zero, 2, e32, mf2, ta, ma
+ flw fa5, 36(s1)
+ fld fs4, %pcrel_lo(.Lpcrel_hi250)(s0)
+ fld fs5, 0(s10)
+ fcvt.d.s fa5, fa5
+ fsub.d fa5, fs4, fa5
+ fneg.d fa5, fa5
+ fmul.d fa5, fs5, fa5
+ fdiv.d fa0, fa5, fs2
+ call exp
+ flw fa5, 40(s1)
+ fmul.d fa4, fa0, fs1
+ fcvt.s.d fs0, fa4
+ fcvt.d.s fa5, fa5
+ fsub.d fa5, fs4, fa5
+ fneg.d fa5, fa5
+ fmul.d fa5, fs5, fa5
+ fdiv.d fa0, fa5, fs2
+ call exp
+ flw fa5, 44(s1)
+ fmul.d fa4, fa0, fs1
+ fcvt.s.d fs3, fa4
+ fcvt.d.s fa5, fa5
+ fsub.d fa5, fs4, fa5
+ fneg.d fa5, fa5
+ fmul.d fa5, fs5, fa5
+ fdiv.d fa0, fa5, fs2
+ call exp
+ fmul.d fa5, fa0, fs1
```
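For reference, a rough source-level sketch of the shape of this code, reconstructed from the assembly above rather than taken from the povray sources. The function name `attenuate3` and all parameter names are hypothetical. Each lane computes `exp(-(c - x) * k / d) * scale` and narrows the result to `float`; the old output built a 2 x f64 vector from the first two `exp` results and kept it live across the third call.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reconstruction of the pattern, not the actual povray code.
// c, k, d and scale stand in for the values held in fs3/fs4, fs4/fs5, fs2
// and fs0/fs1 in the assembly above.
void attenuate3(float out[3], const float in[3],
                double c, double k, double d, double scale) {
  assert(d != 0.0);
  // Each statement is an independent scalar chain ending in a call to exp.
  // Packing any two results into a vector needs a build_vector, and that
  // vector then has to stay live across the remaining call.
  out[0] = static_cast<float>(std::exp(k * -(c - in[0]) / d) * scale);
  out[1] = static_cast<float>(std::exp(k * -(c - in[1]) / d) * scale);
  out[2] = static_cast<float>(std::exp(k * -(c - in[2]) / d) * scale);
}
```

Since `exp` is an ordinary call here, any vector register live across it has to be spilled and reloaded, which is the `vs1r.v`/`vl1r.v` pair in the old output.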
But at the same time, in `pov::do_light(pov::Light_Source_Struct*, double*, pov::Ray_Struct*, pov::Ray_Struct*, double*, float*) (_ZN3povL8do_lightEPNS_19Light_Source_StructEPdPNS_10Ray_StructES4_S2_Pf)` we actually go in the other direction, and two scalar negate-and-store pairs turn into a vector build, negate and store:
```diff
- fneg.d fa5, fa5
- fsd fa5, 24(s0)
- fneg.d fa5, fa4
- fsd fa5, 32(s0)
+ vsetivli zero, 2, e64, m1, ta, ma
+ vfmv.v.f v8, fa5
+ vfslide1down.vf v8, v8, fa4
+ vfneg.v v8, v8
+ vse64.v v8, (s5)
```
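In source terms, the pattern is roughly the following. This is a minimal sketch inferred from the assembly, with a hypothetical function name and parameters, not the actual `do_light` code.

```cpp
// Hypothetical sketch: x and y already live in scalar FP registers
// (fa5 and fa4 above) and the two results go to adjacent doubles.
void negate_pair(double dst[2], double x, double y) {
  // Scalar form: two fneg.d + two fsd.
  // Vectorized form: build a 2 x f64 vector from x and y
  // (vfmv.v.f + vfslide1down.vf), then one vfneg.v and one vse64.v.
  dst[0] = -x;
  dst[1] = -y;
}
```

Both forms are four instructions after the `vsetivli`, so whether the vector version is a win depends on how cheaply the target can build the vector from scalars, which is the cost this patch lowers.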
Unfortunately, none of the hot *_Intersection methods seem to be affected. Instead, it's a large number of cold functions that are slightly perturbed.
I'm really not sure how to interpret these changes. If the rest of the SPEC benchmarks are OK, I would be fine just chalking this up to SLP "noise".
https://github.com/llvm/llvm-project/pull/108419