[llvm] [RISCV] Fold vector shift of sext/zext to widening multiply (PR #121563)

Tue Jan 28 11:06:44 PST 2025

topperc wrote:

> > I've heard that this transform won't be profitable on other CPUs, so I added a commit that enables it on BPI's SpacemiT X60 only. Based on https://camel-cdr.github.io/rvv-bench-results/canmv_k230/, it may help on the K230 too, but I don't have one to confirm.
> > The sole presence of Zvbb doesn't preclude this transform, because `vwsll.vi` only does zero-extension, while widening multiply comes in the sign-extending variant as well.
> 
> It's probably profitable on SiFive x280. The vzext/vsext can produce DLEN bits per cycle where DLEN=VLEN/2. The latency until the first DLEN is ready is 4. The shift also produces DLEN bits per cycle. The latency until the first DLEN is ready is 8. The widening multiply produces DLEN*2 bits per cycle. The first 2 DLENs complete in 8 cycles.
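A minimal sketch of why the fold is sound. It assumes e8 source elements widened to e16. Shifting a sign- or zero-extended value left by `c` gives the same 2*SEW-bit lane as a widening multiply of the narrow value by the constant 2^c, as long as `c` is below SEW so the constant fits in a SEW-bit unsigned multiplier operand. For the sign-extending case that means a signed-by-unsigned multiply such as `vwmulsu`. Which instruction the PR actually selects is not shown here. The helper names (`ext`, `ext_then_shl`, `widening_mul_pow2`) are hypothetical and are not LLVM code.

```python
SEW = 8                                  # assumed narrow element width (e8 -> e16)
WIDE_MASK = (1 << (2 * SEW)) - 1         # bit mask of one 2*SEW-bit lane


def ext(x: int, signed: bool) -> int:
    """Value of a SEW-bit lane after vsext/vzext to 2*SEW bits."""
    x &= (1 << SEW) - 1
    if signed and x & (1 << (SEW - 1)):
        x -= 1 << SEW                    # sign-extend the top bit
    return x


def ext_then_shl(x: int, c: int, signed: bool) -> int:
    """Unfused sequence: extend first, then shift left inside the wide lane."""
    return (ext(x, signed) << c) & WIDE_MASK


def widening_mul_pow2(x: int, c: int, signed: bool) -> int:
    """Fused form: narrow value times the constant 2**c, 2*SEW-bit result."""
    product = ext(x, signed) * (1 << c)
    # For c < SEW the exact product always fits in 2*SEW bits, so nothing wraps.
    lo = -(1 << (2 * SEW - 1)) if signed else 0
    hi = (1 << (2 * SEW - 1)) - 1 if signed else (1 << (2 * SEW)) - 1
    assert lo <= product <= hi
    return product & WIDE_MASK


# Exhaustive check over every 8-bit input and every shift amount below SEW.
for signed in (True, False):
    for c in range(SEW):
        for x in range(1 << SEW):
            assert ext_then_shl(x, c, signed) == widening_mul_pow2(x, c, signed)
```

`vwsll.vi` from Zvbb covers only the zero-extending row of this check. The signed row is the case that still needs a widening multiply.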

I might have the latency wrong for shifts on x280. The scheduler model says 4, but I have other docs internally that say 8. I'm confirming.
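To make the x280 numbers above concrete, here is a rough cycle model. It is only a sketch, under these assumptions: DLEN = VLEN/2; vzext/vsext and the shift each produce one DLEN chunk per cycle; the widening multiply produces two chunks per cycle with its first pair ready after 8 cycles; later groups follow one per cycle. It also assumes the dependent shift either waits for the whole extend (`chained=False`) or starts on the first extended chunk (`chained=True`). The model does not claim which of these x280 actually does. Function names are hypothetical.

```python
import math


def last_ready(start: int, chunks: int, chunks_per_cycle: int, latency: int) -> int:
    """Cycle when the last DLEN chunk of an op's result is ready.

    Assumed model: the first group is ready `latency` cycles after `start`,
    then one more group of `chunks_per_cycle` chunks every cycle.
    """
    groups = math.ceil(chunks / chunks_per_cycle)
    return start + latency + groups - 1


def ext_then_shift(chunks: int, ext_lat: int, shl_lat: int, chained: bool) -> int:
    ext_first = ext_lat                             # first extended chunk ready
    ext_last = last_ready(0, chunks, 1, ext_lat)    # extend: 1 DLEN per cycle
    shl_start = ext_first if chained else ext_last  # chaining is an assumption
    return last_ready(shl_start, chunks, 1, shl_lat)  # shift: 1 DLEN per cycle


def widening_mul(chunks: int, mul_lat: int = 8) -> int:
    return last_ready(0, chunks, 2, mul_lat)        # multiply: 2 DLEN per cycle


# Example: e8 -> e16 with source LMUL=1. The 2*VLEN-bit result spans
# 4 DLEN chunks when DLEN = VLEN/2.
CHUNKS = 4
for shl_lat in (4, 8):                              # scheduler model vs. internal docs
    for chained in (False, True):
        print(f"shift latency {shl_lat}, chained={chained}: "
              f"ext+shift done at {ext_then_shift(CHUNKS, 4, shl_lat, chained)}, "
              f"widening mul done at {widening_mul(CHUNKS)}")
```

Under these assumptions, the extend plus shift also occupies the pipe for 4 + 4 = 8 issue cycles, versus 2 for the multiply. The latency margin shrinks when the shift latency is 4 rather than 8, which is why the shift latency question matters.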

https://github.com/llvm/llvm-project/pull/121563