[llvm] [RISCV] Decompose LMUL > 1 reverses into LMUL * M1 vrgather.vv (PR #104574)
Craig Topper via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 16 08:27:46 PDT 2024
topperc wrote:
> As far as I'm aware, vrgather.vv is quadratic in LMUL on most microarchitectures today due to each output register needing to read from each input register in the group.
SiFive p470 and p670 are quadratic in the worst case, but they skip reading any input register that isn't used.
Earlier versions of the x280 executed vrgather.vv at one element per cycle, but newer generations will improve on that.
For smaller VLEN and large EEW, it's impossible for a single output register to read from every source register at high LMUL. For example, VLEN=128 with SEW=64 has only 2 elements per register, so each output register can depend on at most 2 source registers in the worst case. Is known hardware still quadratic in LMUL for this case?
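
To make the arithmetic above concrete, here is a rough sketch in Python. The function names and the "register reads" cost model are assumptions made for illustration only; they are not taken from any hardware documentation or from the PR. The sketch only counts how many source registers one output register could possibly depend on.

```python
def elements_per_register(vlen: int, sew: int) -> int:
    """Number of SEW-bit elements that fit in one VLEN-bit vector register."""
    return vlen // sew


def max_sources_per_output(vlen: int, sew: int, lmul: int) -> int:
    """Upper bound on distinct source registers one output register can read.

    Each output element selects one source element, so an output register
    can touch at most one source register per element it holds, and never
    more than the LMUL registers in the source group.
    """
    return min(lmul, elements_per_register(vlen, sew))


def worst_case_register_reads(vlen: int, sew: int, lmul: int) -> int:
    """Hypothetical cost model: LMUL output registers times the bound above."""
    return lmul * max_sources_per_output(vlen, sew, lmul)


if __name__ == "__main__":
    # VLEN=128, SEW=64, LMUL=8: 2 elements per register, so at most
    # 2 source registers per output register.
    print(max_sources_per_output(128, 64, 8))
    print(worst_case_register_reads(128, 64, 8), "vs naive", 8 * 8)
```

Under this simplified model the bound falls below LMUL * LMUL only when a register holds fewer than LMUL elements. Whether real implementations exploit that is exactly the open question in the comment.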
https://github.com/llvm/llvm-project/pull/104574