[PATCH] D106963: [RISCV] Optimize floating-point "dominant value" BUILD_VECTORs

Wed Jul 28 07:41:53 PDT 2021

frasercrmck created this revision.
frasercrmck added reviewers: craig.topper, HsiangKai, rogfer01, evandro, khchen, arcbbb.
Herald added subscribers: vkmr, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, simoncook, johnrusso, rbar, asb, hiraditya.
frasercrmck requested review of this revision.
Herald added subscribers: llvm-commits, MaskRay.
Herald added a project: LLVM.

This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benfits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106963

Files:
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll


Index: llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
===================================================================

--- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -101,8 +101,8 @@
   ret void
 }
 
-; FIXME: We "optimize" this one 2-element load from the constant pool to two
-; loads from the constant pool.
+; We don't want to lower this to the insertion of two scalar elements as above,
+; as each would require their own load from the constant pool.
 
 define void @buildvec_dominant1_v2f32(<2 x float>* %x) {
 ; CHECK-LABEL: buildvec_dominant1_v2f32:
@@ -110,12 +110,7 @@
 ; CHECK-NEXT:    lui a1, %hi(.LCPI3_0)
 ; CHECK-NEXT:    addi a1, a1, %lo(.LCPI3_0)
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vlse32.v v25, (a1), zero
-; CHECK-NEXT:    lui a1, %hi(.LCPI3_1)
-; CHECK-NEXT:    flw ft0, %lo(.LCPI3_1)(a1)
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, tu, mu
-; CHECK-NEXT:    vfmv.s.f v25, ft0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
+; CHECK-NEXT:    vle32.v v25, (a1)
 ; CHECK-NEXT:    vse32.v v25, (a0)
 ; CHECK-NEXT:    ret
   store <2 x float> <float 1.0, float 2.0>, <2 x float>* %x
Index: llvm/lib/Target/RISCV/RISCVISelLowering.cpp
===================================================================
--- llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1704,6 +1704,13 @@
   unsigned NumUndefElts =
       count_if(Op->op_values(), [](const SDValue &V) { return V.isUndef(); });
 
+  // Track the number of scalar loads we know we'd be inserting, estimated as
+  // any non-zero floating-point constant. Other kinds of element are either
+  // already in registers or are materialized on demand. The threshold at which
+  // a vector load is more desirable than several scalar materializion and
+  // vector-insertion instructions is not known.
+  unsigned NumScalarLoads = 0;
+
   for (SDValue V : Op->op_values()) {
     if (V.isUndef())
       continue;
@@ -1711,6 +1718,9 @@
     ValueCounts.insert(std::make_pair(V, 0));
     unsigned &Count = ValueCounts[V];
 
+    if (auto *CFP = dyn_cast<ConstantFPSDNode>(V))
+      NumScalarLoads += !CFP->isExactlyValue(+0.0);
+
     // Is this value dominant? In case of a tie, prefer the highest element as
     // it's cheaper to insert near the beginning of a vector than it is at the
     // end.
@@ -1726,7 +1736,7 @@
 
   // Don't perform this optimization when optimizing for size, since
   // materializing elements and inserting them tends to cause code bloat.
-  if (!DAG.shouldOptForSize() &&
+  if (!DAG.shouldOptForSize() && NumScalarLoads < NumElts &&
       ((MostCommonCount > DominantValueCountThreshold) ||
        (ValueCounts.size() <= Log2_32(NumDefElts)))) {
     // Start by splatting the most common element.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D106963.362381.patch
Type: text/x-patch
Size: 2882 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210728/27a2e008/attachment.bin>