[PATCH] D159375: [RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL

Fri Sep 1 12:49:35 PDT 2023

reames created this revision.
reames added reviewers: craig.topper, asb, luke.
Herald added subscribers: jobnoorman, sunshaoce, VincentWu, vkmr, frasercrmck, luismarques, apazos, sameer.abuasal, s.egerton, Jim, benna, psnobl, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, hiraditya, arichardson, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: wangpc, jplehr, alextsao1999, eopXD, sstefan1, MaskRay.
Herald added a reviewer: jdoerfert.
Herald added a project: LLVM.

This is the extract side of D159332 <https://reviews.llvm.org/D159332>.  The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars.  This is an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
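
To make the scaling argument concrete, here is a rough sketch of the arithmetic. The unit costs are assumptions picked for illustration only (one slide costs LMUL units, one whole-vector store costs LMUL units, one scalar load costs 1 unit); they are not taken from the sifive-x280 scheduling model, and the function names are hypothetical rather than anything in RISCVISelLowering.cpp.

```cpp
#include <cstdint>

// Number of elements in a fixed vector occupying `lmul` registers of
// `vlen` bits each, with `sew`-bit elements: VL = LMUL * VLEN / ETYPE.
constexpr std::uint64_t numElements(unsigned lmul, unsigned vlen, unsigned sew) {
  return static_cast<std::uint64_t>(lmul) * vlen / sew;
}

// Assumed cost of extracting every element with one slide per element,
// where each slide costs `lmul` units: VL * LMUL = LMUL^2 * VLEN / ETYPE.
constexpr std::uint64_t slideExtractCost(unsigned lmul, unsigned vlen, unsigned sew) {
  return numElements(lmul, vlen, sew) * lmul;
}

// Assumed cost of going through the stack: one whole-vector store
// (`lmul` units) plus one scalar load (1 unit) per element. This is
// linear in LMUL only if the stack slot is reused across extracts.
constexpr std::uint64_t stackExtractCost(unsigned lmul, unsigned vlen, unsigned sew) {
  return lmul + numElements(lmul, vlen, sew);
}

// 16 x i64 at VLEN=128 occupies LMUL8.
static_assert(numElements(8, 128, 64) == 16);
static_assert(slideExtractCost(8, 128, 64) == 128);
static_assert(stackExtractCost(8, 128, 64) == 24);
```

Under these assumed unit costs, doubling LMUL quadruples the slide-based cost but only doubles the stack-based one, which is the gap this patch is trying to avoid.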

There's a subtlety here.  For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract.  In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory.  I don't think that variant is fixable within SDAG.

It's tempting to think we can do better than going through the stack, but well, I haven't found it yet if it exists.  Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

  output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles:      20703
  output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles:      23903
  output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles:      21604
  output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles:      22804
  output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles:      15204
  output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles:      18404
  output/sifive-x280/stack_by_vreg.mca:Total Cycles:      12104
  output/sifive-x280/stack_element_by_element.mca:Total Cycles:      4304

I am deliberately excluding scalable vectors.  It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D159375

Files:
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
