[PATCH] D159332: [RISCV] Cap build vector cost to avoid quadratic cost at high LMULs

Thu Aug 31 20:35:30 PDT 2023

reames created this revision.
reames added reviewers: luke, craig.topper, asb.
Herald added subscribers: jobnoorman, sunshaoce, VincentWu, vkmr, frasercrmck, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, hiraditya, arichardson, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: wangpc, eopXD, MaskRay.
Herald added a project: LLVM.

(Still somewhat WIP - posted for feedback, and frankly to grab a phab revision)

Each vslide operation is linear in LMUL on common hardware.  (For instance, the sifive-x280 cost model models slides this way.)  If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE.  To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.
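
To make the cost argument concrete, here is a minimal sketch of the arithmetic.  It is not the code from the patch: the helper names, the unit slide cost, and the budget factor of 2 are all assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these names come from the patch.

// Elements in a fixed-length vector that fills LMul registers of VLenBits each.
constexpr uint64_t numElts(uint64_t VLenBits, uint64_t EltBits, uint64_t LMul) {
  return LMul * VLenBits / EltBits;
}

// Assumed model: one slide per element, each slide costing LMul units.
// This gives LMul^2 * VLenBits / EltBits in total.
constexpr uint64_t slideSequenceCost(uint64_t VLenBits, uint64_t EltBits,
                                     uint64_t LMul) {
  return numElts(VLenBits, EltBits, LMul) * LMul;
}

// Assumed budget: linear in the element count.  The default factor is made up.
constexpr uint64_t linearBudget(uint64_t VLenBits, uint64_t EltBits,
                                uint64_t LMul, uint64_t Factor = 2) {
  return Factor * numElts(VLenBits, EltBits, LMul);
}

// Fall back to the stack once the slide sequence exceeds the linear budget.
constexpr bool shouldUseStack(uint64_t VLenBits, uint64_t EltBits,
                              uint64_t LMul) {
  return slideSequenceCost(VLenBits, EltBits, LMul) >
         linearBudget(VLenBits, EltBits, LMul);
}

// e64 at VLEN=128: 16 elements at m8, each slide costing 8 units.
static_assert(slideSequenceCost(128, 64, 8) == 128, "quadratic in LMUL");
```

With these assumed numbers (e64, VLEN=128), the predicate keeps the slide sequence at m1 and m2 and switches to the stack at m4 and m8.  The real threshold in the patch may differ.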

For context, here are the sifive-x280 llvm-mca results for the current lowering and the stack-based lowering at each LMUL (using e64).  This assumes the code was compiled for V (i.e. zvl128b).
output/sifive-x280/buildvector_m1_via_stack.mca:Total Cycles:      1904
output/sifive-x280/buildvector_m2_via_stack.mca:Total Cycles:      2104
output/sifive-x280/buildvector_m4_via_stack.mca:Total Cycles:      2504
output/sifive-x280/buildvector_m8_via_stack.mca:Total Cycles:      3304
output/sifive-x280/buildvector_m1_via_vslidedown.mca:Total Cycles:      804
output/sifive-x280/buildvector_m2_via_vslidedown.mca:Total Cycles:      1604
output/sifive-x280/buildvector_m4_via_vslide1down.mca:Total Cycles:      6400
output/sifive-x280/buildvector_m8_via_vslide1down.mca:Total Cycles:      25599
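
A small sketch for reading the numbers above: it tabulates the quoted cycle counts and computes how each lowering grows when LMUL doubles.  The struct and function names are hypothetical; only the cycle counts come from the llvm-mca output.

```cpp
#include <array>
#include <cstdio>

// Total cycles copied from the llvm-mca output quoted above (sifive-x280, e64).
struct McaRow {
  unsigned LMul;
  unsigned StackCycles;
  unsigned SlideCycles;
};

static const std::array<McaRow, 4> Results = {{
    {1, 1904, 804},
    {2, 2104, 1604},
    {4, 2504, 6400},
    {8, 3304, 25599},
}};

// Smallest LMUL at which the stack lowering is cheaper, or 0 if there is none.
unsigned firstLMulWhereStackWins() {
  for (const McaRow &R : Results)
    if (R.StackCycles < R.SlideCycles)
      return R.LMul;
  return 0;
}

// Growth factors from row I-1 to row I, i.e. across one doubling of LMUL.
double slideGrowth(unsigned I) {
  return static_cast<double>(Results[I].SlideCycles) /
         Results[I - 1].SlideCycles;
}

double stackGrowth(unsigned I) {
  return static_cast<double>(Results[I].StackCycles) /
         Results[I - 1].StackCycles;
}

// Print one line per doubling of LMUL.
void printComparison() {
  for (unsigned I = 1; I < Results.size(); ++I)
    std::printf("m%u -> m%u: stack x%.2f, slide x%.2f\n", Results[I - 1].LMul,
                Results[I].LMul, stackGrowth(I), slideGrowth(I));
}
```

On these figures the stack lowering first wins at m4, and from m4 to m8 the slide sequence roughly quadruples while the stack version grows by about a third.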

There are other schemes we could use to cap the cost.  The next best is recursive decomposition of the vector into smaller LMULs.  That's still quadratic, but with a better constant.  However, the stack-based lowering seems to cost better at all LMULs, so we can just go with the simpler scheme.

Arguably, this patch fixes a regression introduced by my D149667 <https://reviews.llvm.org/D149667>: before that change, we always fell back to the stack, and thus didn't have the non-linearity.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D159332

Files:
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D159332.555244.patch
Type: text/x-patch
Size: 48922 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230901/df74f40b/attachment.bin>