[llvm] [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. (PR #106449)

Alexey Bataev via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 3 09:37:23 PDT 2024

alexey-bataev wrote:

> At a high level, why is this a profitable thing to do? The structure here seems to be assuming (via getRegUsageForType) that all fixed vector types are legalized by splitting to m1. This is not true. The actual lowering for your example here uses an m4 load. Why is it reasonable to cost as if we legalize by splitting when we don't?
> ```
> define i64 @test(ptr %p) {
> ; CHECK-LABEL: test:
> ; CHECK:       # %bb.0:
> ; CHECK-NEXT:    vsetivli zero, 6, e64, m4, ta, ma
> ; CHECK-NEXT:    vle64.v v8, (a0)
> ; CHECK-NEXT:    lui a0, %hi(.LCPI0_0)
> ; CHECK-NEXT:    addi a0, a0, %lo(.LCPI0_0)
> ; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
> ; CHECK-NEXT:    vle16.v v12, (a0)
> ; CHECK-NEXT:    vrgatherei16.vv v16, v8, v12
> ; CHECK-NEXT:    li a0, 42
> ; CHECK-NEXT:    vmul.vx v8, v16, a0
> ; CHECK-NEXT:    vmv.s.x v12, zero
> ; CHECK-NEXT:    vredsum.vs v8, v8, v12
> ; CHECK-NEXT:    vmv.x.s a0, v8
> ; CHECK-NEXT:    ret
>   %ld = load <6 x i64>, ptr %p, align 4
>   %shuffle = shufflevector <6 x i64> %ld, <6 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 0, i32 0>
>   %mul = mul <8 x i64> %shuffle, <i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42>
>   %sum = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %mul)
>   ret i64 %sum
> }
> ```
> Unless you have a clear answer to the above, and I'm just missing something obvious, I think this change should be reverted.

It looks like this needs a new TTI entry instead of the existing one, since the existing one does not work as expected for RISCV. I will add it.
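
For illustration only, the mismatch raised in the quoted comment can be sketched as plain arithmetic. This is not LLVM code: the helper names are hypothetical, and it assumes VLEN = 128 bits (the value consistent with the m4 load of 8 x i64 in the CHECK lines above) and ignores fractional LMUL.

```python
import math

VLEN_BITS = 128  # assumption: 128-bit vector registers, matching the m4 lowering above


def regs_if_split_to_m1(num_elts, elt_bits, vlen=VLEN_BITS):
    """Registers counted if the vector were split into independent m1 pieces."""
    return math.ceil(num_elts * elt_bits / vlen)


def lmul_for_single_group(num_elts, elt_bits, vlen=VLEN_BITS):
    """Smallest power-of-two LMUL (1, 2, 4, 8) whose register group holds the vector."""
    regs = regs_if_split_to_m1(num_elts, elt_bits, vlen)
    lmul = 1
    while lmul < regs:
        lmul *= 2
    if lmul > 8:
        raise ValueError("vector does not fit in a single register group")
    return lmul


# <6 x i64> is 384 bits: three m1 registers if split, but an m4 group if kept whole.
print(regs_if_split_to_m1(6, 64))    # 3
print(lmul_for_single_group(6, 64))  # 4
```

Under these assumptions, a split-to-m1 model sees <6 x i64> as exactly three whole registers, while a lowering that keeps the vector in one register group has to round up to m4, which is the gap the reviewer points out.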


