[PATCH] D152001: [RISCV][SLP] Inflate insert/extract costs on very small vectors

Fri Jun 2 08:01:21 PDT 2023

reames created this revision.
reames added reviewers: ABataev, craig.topper, l.frisken, asb, kito-cheng.
Herald added subscribers: jobnoorman, vporpo, VincentWu, vkmr, frasercrmck, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, hiraditya, arichardson, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: pcwang-thead, eopXD, MaskRay.
Herald added a project: LLVM.

This change is a bit of a hack, but I've run out of better ideas. Basically, I'm adding a fudge factor to the cost of insertelement and extractelement operations at very small VLs. This makes vectorization of partially vectorizable sub-trees appear less profitable to SLP. The result is that we vectorize significantly fewer small trees when SLP is enabled.

Note that because this penalty *isn't* being applied to loads and stores, we will still vectorize a VL=2 tree (or a VL=2 sub-tree with a wider root) if the sub-tree is fully vectorizable.
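As a rough illustration of the effect described above, here is a minimal standalone sketch. It is not the code from the patch: the threshold, the penalty factor, the `Tree` struct, and both function names are hypothetical, chosen only to show how inflating insert/extract costs can flip the decision for a partially vectorizable VL=2 tree while leaving a fully vectorizable one untouched.

```cpp
#include <cassert>

// Hypothetical values; the actual patch may use different ones.
constexpr unsigned SmallVLThreshold = 2; // assumed cut-off for "very small" VL
constexpr unsigned SmallVLPenalty = 3;   // assumed fudge factor

// Cost of one insertelement/extractelement on a vector with NumElts lanes.
unsigned insertExtractCost(unsigned NumElts, unsigned BaseCost = 1) {
  assert(NumElts > 0 && "vector must have at least one lane");
  return NumElts <= SmallVLThreshold ? BaseCost * SmallVLPenalty : BaseCost;
}

// Toy model of an SLP tree (hypothetical, far simpler than the real one).
struct Tree {
  unsigned NumElts;        // VL of the tree
  unsigned VectorOps;      // vector instructions, including loads and stores
  unsigned ScalarOps;      // scalar instructions the tree would replace
  unsigned InsertExtracts; // lane moves needed at the scalar/vector boundary
};

// Vector cost minus scalar cost; a negative result means "vectorize".
int treeCostDelta(const Tree &T, bool ApplyPenalty) {
  unsigned PerLaneMove = ApplyPenalty ? insertExtractCost(T.NumElts) : 1;
  return static_cast<int>(T.VectorOps + T.InsertExtracts * PerLaneMove) -
         static_cast<int>(T.ScalarOps);
}
```

In this toy model, a VL=2 tree with one boundary insert goes from a delta of -1 (profitable) to +1 (rejected) once the penalty is applied, whereas a VL=2 tree with no inserts or extracts keeps the same delta either way.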

Here's the impact on namd from SPEC 2017.

Before:

  vsetivli : 2181 total
    1: 135
    2: 1411
    3: 21
    4: 579
    8: 35
  vsetvli : 735 total
    zero, zero: 687
    reg, zero : 48

After:

  vsetivli : 1360 total
    1: 73
    2: 737
    3: 11
    4: 505
    8: 34
  vsetvli : 257 total
    zero, zero: 191
    reg, zero : 66
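
To put the counts above in relative terms, the small helper below (a hypothetical name, not part of the patch) computes the percentage drop between two counts. Applied to the figures listed, total vsetivli falls by roughly 38% (2181 to 1360), vsetivli with VL=2 by roughly 48% (1411 to 737), and total vsetvli by roughly 65% (735 to 257).

```cpp
#include <cassert>

// Percentage drop from Before to After; a negative result means an increase.
// Before must be non-zero.
double percentReduction(unsigned Before, unsigned After) {
  assert(Before != 0 && "need a non-zero baseline");
  return 100.0 *
         (static_cast<double>(Before) - static_cast<double>(After)) /
         static_cast<double>(Before);
}
```

For example, `percentReduction(2181, 1360)` gives the total vsetivli reduction, and a call such as `percentReduction(48, 66)` returns a negative value because the `reg, zero` count went up.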

This interface isn't solely used by SLP, but it's close. There's one use in CodeGenPrepare, which will cause CGP to be slightly more aggressive about speculating unused lanes, and one use in LV, which will bias us away from uniform stores at very small VLs. Within SLP, this is mostly used to compute build_vector and extract costs. As the SLP test diff shows, this sometimes results in odd choices (e.g. why did the scalarization cost going up result in fewer sub-vector extracts?), but on the whole it clearly decreases vectorization at small VLs.

For context, I've been trying to improve the vector codegen at small VLs for the last few weeks (and @luke has been helping), but we don't seem to be making significant progress. My goal with this patch is basically to sidestep that work, be able to enable SLP by default, and *then* return to trying to hammer on small VL codegen.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D152001

Files:
  llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
  llvm/test/Analysis/CostModel/RISCV/arith-fp.ll
  llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll
  llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll
  llvm/test/Analysis/CostModel/RISCV/fp-min-max-abs.ll
  llvm/test/Analysis/CostModel/RISCV/fp-sqrt-pow.ll
  llvm/test/Analysis/CostModel/RISCV/fp-trig-log-exp.ll
  llvm/test/Analysis/CostModel/RISCV/int-sat-math.ll
  llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
  llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
  llvm/test/Analysis/CostModel/RISCV/shuffle-broadcast.ll
  llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D152001.527861.patch
Type: text/x-patch
Size: 230518 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230602/c58746c1/attachment-0001.bin>