[all-commits] [llvm/llvm-project] 299d71: [RISCV] Lower fixed vectors extract_vector_elt thr...

Mon Sep 11 10:49:36 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 299d710e3d83e1f1867ed208b1d959603fd2cb41
      https://github.com/llvm/llvm-project/commit/299d710e3d83e1f1867ed208b1d959603fd2cb41
  Author: Philip Reames <preames at rivosinc.com>
  Date:   2023-09-11 (Mon, 11 Sep 2023)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i-sat.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll

  Log Message:
  -----------
  [RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL

This is the extract side of D159332. The goal is to avoid non-linear cost on patterns where an entire vector is split back into scalars, an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) Since VL = LMUL * VLEN/ETYPE, doing VL unique extracts, each with a cost linear in LMUL, gives an overall cost of O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
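To make the growth rates concrete, here is a minimal toy cost model. It assumes a slide at a given LMUL costs LMUL units, and that a scalar move or scalar load costs 1 unit. These unit costs, and the function names, are hypothetical illustrations, not the sifive-x280 scheduling model. The model only shows quadratic versus linear growth; it does not capture fixed stack overheads, so it says nothing about where the crossover lies. The LMUL2 threshold in `use_stack` is taken from the commit message, not derived from this model.

```python
# Toy cost model with hypothetical unit costs (not a real scheduling model):
#   slide at LMUL = LMUL units, scalar move/load = 1 unit, whole-vector store = LMUL units.

def num_elements(lmul: int, vlen: int, sew: int) -> int:
    """VL for a fixed vector that fills LMUL registers: LMUL * VLEN / SEW."""
    return lmul * vlen // sew


def slide_explode_cost(lmul: int, vlen: int, sew: int) -> int:
    """Extract every element with its own slide plus a scalar move.
    For simplicity every index is charged a slide, even index 0."""
    n = num_elements(lmul, vlen, sew)
    return n * (lmul + 1)  # O(LMUL) per extract -> O(LMUL^2) * VLEN/SEW total


def stack_explode_cost(lmul: int, vlen: int, sew: int) -> int:
    """Store the whole vector once, then do one scalar load per element."""
    n = num_elements(lmul, vlen, sew)
    return lmul + n  # O(LMUL) total


def use_stack(lmul: int) -> bool:
    """Threshold from the commit message: go through the stack beyond LMUL2."""
    return lmul > 2


if __name__ == "__main__":
    vlen, sew = 512, 64  # assumed VLEN and element width for illustration
    for lmul in (1, 2, 4, 8):
        print(lmul,
              slide_explode_cost(lmul, vlen, sew),
              stack_explode_cost(lmul, vlen, sew),
              use_stack(lmul))
```

Doubling LMUL roughly quadruples the slide-based cost in this model while the stack-based cost only doubles, which is the degenerate case the patch avoids.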

There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.
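The dependence on stack slot reuse can be illustrated with a small model. This is a hypothetical sketch: `ToyLowering` and `explode` are illustrative names, not LegalizeDAG's data structures. It assumes the "already stored" state is only visible within one block, since SelectionDAG works one basic block at a time. Each extract needs the vector in memory; the first extract in a block pays for the O(LMUL) whole-vector store, and later extracts in that block only pay for a scalar load.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLowering:
    """Hypothetical model of stack-based extract lowering with per-block
    reuse of the stack slot (not LegalizeDAG's actual implementation)."""
    stores: int = 0  # whole-vector stores, each O(LMUL)
    loads: int = 0   # scalar element loads, each O(1)
    _stored_in_block: set = field(default_factory=set)

    def start_block(self) -> None:
        # Reuse is only visible within one block, so forget earlier stores.
        self._stored_in_block.clear()

    def extract(self, vec: str, idx: int) -> None:
        if vec not in self._stored_in_block:
            self.stores += 1  # first extract in this block stores the vector
            self._stored_in_block.add(vec)
        self.loads += 1  # load element idx from the stack slot


def explode(n: int, elements_per_block: int) -> ToyLowering:
    """Extract all n elements of one vector, starting a new block every
    elements_per_block extracts."""
    lowering = ToyLowering()
    for idx in range(n):
        if idx % elements_per_block == 0:
            lowering.start_block()
        lowering.extract("v", idx)
    return lowering


if __name__ == "__main__":
    same_block = explode(16, 16)
    split = explode(16, 1)
    print(same_block.stores, same_block.loads)
    print(split.stores, split.loads)
```

With all 16 extracts in one block there is a single store; with each extract in its own block there are 16 whole-vector stores, so the total is again O(LMUL) work per element, which is the quadratic case described above.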

It's tempting to think we can do better than going through the stack, but if such an approach exists, I haven't found it yet. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles:      20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles:      23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles:      21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles:      22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles:      15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles:      18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles:      12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles:      4304
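As a quick check on the numbers above, the sketch below ranks the llvm-mca "Total Cycles" values listed in the message and compares the fastest variant against the fastest variant that does not go through the stack. The dictionary simply transcribes those eight results; the variable names are illustrative.

```python
# llvm-mca "Total Cycles" for each variant, transcribed from the results above
# (sifive-x280, 16 x i64).
results = {
    "linear_decomp_with_slidedown": 20703,
    "linear_decomp_with_vrgather": 23903,
    "naive_linear_with_slidedown": 21604,
    "naive_linear_with_vrgather": 22804,
    "recursive_decomp_with_slidedown": 15204,
    "recursive_decomp_with_vrgather": 18404,
    "stack_by_vreg": 12104,
    "stack_element_by_element": 4304,
}

# Sort variants from fewest to most cycles.
ranked = sorted(results.items(), key=lambda kv: kv[1])
best_name, best_cycles = ranked[0]

# Fastest variant that does not go through the stack.
best_non_stack = min(
    cycles for name, cycles in results.items() if not name.startswith("stack")
)

speedup = best_non_stack / best_cycles

if __name__ == "__main__":
    for name, cycles in ranked:
        print(f"{name:35} {cycles:6}")
    print(f"{best_name} uses {speedup:.2f}x fewer cycles "
          f"than the best non-stack variant")
```

On these numbers, the element-by-element stack variant needs roughly a third as many cycles as the best slide-based decomposition.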

I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Differential Revision: https://reviews.llvm.org/D159375