[PATCH] D91053: [PowerPC] Lump the constants to save one addis for each constant access
Qing Shan Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 9 01:00:26 PST 2020
steven.zhang created this revision.
steven.zhang added reviewers: nemanjai, MaskRay, stefanp, jsji, masoud.ataei, PowerPC.
Herald added subscribers: shchenz, kbarton, hiraditya, mgorny.
Herald added a project: LLVM.
steven.zhang requested review of this revision.
Currently, we place each constant in the TOC, and every access to it needs an addis/addi pair plus a load. For example:
double X(double Y) { return (Y*1.23 + 4.512)*2.34 + 14.38; }
And this is what we have now:
addis 2, 12, .TOC.-.Lfunc_gep0@ha
addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
.localentry X, .Lfunc_lep0-.Lfunc_gep0
# %bb.0: # %entry
addis 3, 2, .LCPI0_0@toc@ha
lfd 0, .LCPI0_0@toc@l(3) # <-- addi is folded into lfd
addis 3, 2, .LCPI0_1@toc@ha
xsmuldp 0, 1, 0
lfd 1, .LCPI0_1@toc@l(3)
addis 3, 2, .LCPI0_2@toc@ha
xsadddp 0, 0, 1
lfd 1, .LCPI0_2@toc@l(3)
addis 3, 2, .LCPI0_3@toc@ha
xsmuldp 0, 0, 1
lfd 1, .LCPI0_3@toc@l(3)
xsadddp 1, 0, 1
blr
This can be optimized by grouping all the constants together in a read-only data section, so that their relative positions are fixed, and then creating a single symbol in the TOC that points to that group. This reduces the GOT size and improves performance, because one addis is saved for each constant access. It works like this:
.section .data.rel.ro,"aw",@progbits
.p2align 3 # -- Begin function X
.LCPI0_0:
.quad 0x402cc28f5c28f5c3 # double 14.380000000000001
.quad 0x4002b851eb851eb8 # double 2.3399999999999999
.quad 0x40120c49ba5e353f # double 4.5119999999999996
.quad 0x3ff3ae147ae147ae # double 1.23
.Lfunc_gep0:
addis 2, 12, .TOC.-.Lfunc_gep0@ha
addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
.localentry X, .Lfunc_lep0-.Lfunc_gep0
# %bb.0: # %entry
addis 3, 2, .LC0@toc@ha
ld 3, .LC0@toc@l(3)
lfd 0, 24(3)
xsmuldp 0, 1, 0
lfd 1, 16(3)
xsadddp 0, 0, 1
lfd 1, 8(3)
xsmuldp 0, 0, 1
lfdx 1, 0, 3
xsadddp 1, 0, 1
blr
.LC0:
.tc .LCPI0_0[TC],.LCPI0_0
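As a rough source-level picture of the listing above (a sketch only, not what the patch emits): the lumped pool behaves like one read-only array whose base address is loaded once, after which each constant is read at a fixed offset. The names `kPool` and `X_lumped` are hypothetical and exist only for this illustration; the array order mirrors the `.quad` entries, so indices 3, 2, 1, 0 correspond to byte offsets 24, 16, 8, 0.

```c
/* Hypothetical stand-in for the lumped pool .LCPI0_0; the element order
   mirrors the .quad entries in the listing. */
static const double kPool[4] = {14.38, 2.34, 4.512, 1.23};

/* Original form: each literal is a separate constant-pool entry. */
double X(double Y) { return (Y * 1.23 + 4.512) * 2.34 + 14.38; }

/* Lumped form: take one base address, then read at fixed offsets.
   base[3] is byte offset 24, base[2] is 16, base[1] is 8, base[0] is 0. */
double X_lumped(double Y) {
    const double *base = kPool; /* plays the role of the ld from .LC0 */
    return (Y * base[3] + base[2]) * base[1] + base[0];
}
```

Both functions compute the same expression; only the way the four constants are addressed differs.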
This optimization has been discussed before. PowerPC/README.txt describes it as follows:
Lump the constant pool for each function into ONE pic object, and reference
pieces of it as offsets from the start. For functions like this (contrived
to have lots of constants obviously):
double X(double Y) { return (Y*1.23 + 4.512)*2.34 + 14.38; }
We generate:
_X:
lis r2, ha16(.CPI_X_0)
lfd f0, lo16(.CPI_X_0)(r2)
lis r2, ha16(.CPI_X_1)
lfd f2, lo16(.CPI_X_1)(r2)
fmadd f0, f1, f0, f2
lis r2, ha16(.CPI_X_2)
lfd f1, lo16(.CPI_X_2)(r2)
lis r2, ha16(.CPI_X_3)
lfd f2, lo16(.CPI_X_3)(r2)
fmadd f1, f0, f1, f2
blr
It would be better to materialize .CPI_X into a register, then use immediates
off of the register to avoid the lis's. This is even more important in PIC
mode.
Note that this (and the static variable version) is discussed here for GCC:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00133.html
Here's another example (the sgn function):
double testf(double a) {
return a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0);
}
it produces a BB like this:
LBB1_1: ; cond_true
lis r2, ha16(LCPI1_0)
lfs f0, lo16(LCPI1_0)(r2)
lis r2, ha16(LCPI1_1)
lis r3, ha16(LCPI1_2)
lfs f2, lo16(LCPI1_2)(r3)
lfs f3, lo16(LCPI1_1)(r2)
fsub f0, f0, f1
fsel f1, f0, f2, f3
blr
Some limitations:
- If there is only one constant, this patch adds one extra load. The linker could optimize that load away if it merges the TOC. It is not easy to handle this inside the compiler, because ISel is done per basic block, and we don't know whether there are other constants until the other basic blocks are selected. Any thoughts?
- Only constants of the same type are lumped together. Technically speaking, all constants could be lumped together as long as the alignment is handled carefully.
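To make the alignment point in the second limitation concrete, here is a minimal sketch of how offsets could be assigned if constants of different types shared one pool. It is not taken from the patch: `pool_entry`, `align_up` and `layout_pool` are hypothetical names, and alignments are assumed to be powers of two.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align.
   Assumes align is a power of two. */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical description of one constant: size and required
   alignment, both in bytes. */
struct pool_entry {
    size_t size;
    size_t align;
};

/* Assign each entry an offset inside a single lumped pool, in the given
   order. Writes the offsets to out[] and returns the total pool size. */
size_t layout_pool(const struct pool_entry *entries, size_t n, size_t *out) {
    size_t offset = 0;
    for (size_t i = 0; i < n; ++i) {
        offset = align_up(offset, entries[i].align); /* insert padding */
        out[i] = offset;
        offset += entries[i].size;
    }
    return offset;
}
```

For example, a 4-byte float followed by an 8-byte double and a 16-byte vector would land at offsets 0, 8 and 16, with 4 bytes of padding after the float. Sorting entries by decreasing alignment would avoid such padding, at the cost of changing the order.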
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D91053
Files:
llvm/lib/Target/PowerPC/CMakeLists.txt
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
llvm/lib/Target/PowerPC/PPCConstantPoolValue.cpp
llvm/lib/Target/PowerPC/PPCConstantPoolValue.h
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
llvm/lib/Target/PowerPC/PPCISelLowering.h
llvm/test/CodeGen/PowerPC/2012-09-16-TOC-entry-check.ll
llvm/test/CodeGen/PowerPC/branch_coalesce.ll
llvm/test/CodeGen/PowerPC/build-vector-allones.ll
llvm/test/CodeGen/PowerPC/build-vector-tests.ll
llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
llvm/test/CodeGen/PowerPC/combine-fneg.ll
llvm/test/CodeGen/PowerPC/constant-pool.ll
llvm/test/CodeGen/PowerPC/extract-and-store.ll
llvm/test/CodeGen/PowerPC/f128-aggregates.ll
llvm/test/CodeGen/PowerPC/f128-passByValue.ll
llvm/test/CodeGen/PowerPC/float-logic-ops.ll
llvm/test/CodeGen/PowerPC/fma-combine.ll
llvm/test/CodeGen/PowerPC/fma-mutate.ll
llvm/test/CodeGen/PowerPC/fmf-propagation.ll
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
llvm/test/CodeGen/PowerPC/handle-f16-storage-type.ll
llvm/test/CodeGen/PowerPC/load-shuffle-and-shuffle-store.ll
llvm/test/CodeGen/PowerPC/mcm-12.ll
llvm/test/CodeGen/PowerPC/mcm-4.ll
llvm/test/CodeGen/PowerPC/mcm-obj-2.ll
llvm/test/CodeGen/PowerPC/mcm-obj.ll
llvm/test/CodeGen/PowerPC/nofpexcept.ll
llvm/test/CodeGen/PowerPC/p10-splatImm-CPload-pcrel.ll
llvm/test/CodeGen/PowerPC/p9-vinsert-vextract.ll
llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
llvm/test/CodeGen/PowerPC/ppcf128-endian.ll
llvm/test/CodeGen/PowerPC/pr25080.ll
llvm/test/CodeGen/PowerPC/pr43976.ll
llvm/test/CodeGen/PowerPC/pr45628.ll
llvm/test/CodeGen/PowerPC/pr45709.ll
llvm/test/CodeGen/PowerPC/pr47660.ll
llvm/test/CodeGen/PowerPC/pr47891.ll
llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
llvm/test/CodeGen/PowerPC/recipest.ll
llvm/test/CodeGen/PowerPC/repeated-fp-divisors.ll
llvm/test/CodeGen/PowerPC/sat-add.ll
llvm/test/CodeGen/PowerPC/scalar_cmp.ll
llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
llvm/test/CodeGen/PowerPC/select_const.ll
llvm/test/CodeGen/PowerPC/signbit-shift.ll
llvm/test/CodeGen/PowerPC/toc-float.ll
llvm/test/CodeGen/PowerPC/vavg.ll
llvm/test/CodeGen/PowerPC/vec-itofp.ll
llvm/test/CodeGen/PowerPC/vec-trunc.ll
llvm/test/CodeGen/PowerPC/vec-trunc2.ll
llvm/test/CodeGen/PowerPC/vec_add_sub_doubleword.ll
llvm/test/CodeGen/PowerPC/vec_add_sub_quadword.ll
llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
llvm/test/CodeGen/PowerPC/vector-extend-sign.ll
llvm/test/CodeGen/PowerPC/vector-rotates.ll
llvm/test/CodeGen/PowerPC/vperm-lowering.ll
llvm/test/CodeGen/PowerPC/vselect-constants.ll
llvm/test/CodeGen/PowerPC/vsx.ll