[llvm] [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. (PR #106449)

Tue Sep 3 09:13:48 PDT 2024

https://github.com/preames commented:

At a high level, why is this a profitable thing to do?  The structure here seems to assume (via getRegUsageForType) that all fixed vector types are legalized by splitting to m1.  This is not true.  The actual lowering for your example here uses an m4 load.  Why is it reasonable to cost as if we legalize by splitting when we don't?

```
define i64 @test(ptr %p) {
; CHECK-LABEL: test:
; CHECK:       # %bb.0:
; CHECK-NEXT:    vsetivli zero, 6, e64, m4, ta, ma
; CHECK-NEXT:    vle64.v v8, (a0)
; CHECK-NEXT:    lui a0, %hi(.LCPI0_0)
; CHECK-NEXT:    addi a0, a0, %lo(.LCPI0_0)
; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT:    vle16.v v12, (a0)
; CHECK-NEXT:    vrgatherei16.vv v16, v8, v12
; CHECK-NEXT:    li a0, 42
; CHECK-NEXT:    vmul.vx v8, v16, a0
; CHECK-NEXT:    vmv.s.x v12, zero
; CHECK-NEXT:    vredsum.vs v8, v8, v12
; CHECK-NEXT:    vmv.x.s a0, v8
; CHECK-NEXT:    ret
  %ld = load <6 x i64>, ptr %p, align 4
  %shuffle = shufflevector <6 x i64> %ld, <6 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 0, i32 0>
  %mul = mul <8 x i64> %shuffle, <i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42>
  %sum = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %mul)
  ret i64 %sum
}
```
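
To put rough numbers on the mismatch, here is a small sketch of the register arithmetic.  It assumes VLEN=128 (my reading of the m4 in the output above, not something stated in the patch) and ignores fractional LMUL.  The helper names are hypothetical and are not LLVM APIs; the rounding rule is a simplification of what the backend does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Number of m1 registers a fixed vector would occupy if it were legalized
// by splitting into VLEN-bit pieces (the assumption being questioned).
uint64_t regsIfSplitToM1(uint64_t NumElts, uint64_t EltBits, uint64_t VLen) {
  return (NumElts * EltBits + VLen - 1) / VLen; // ceiling division
}

// Number of registers occupied if the vector is instead kept in a single
// register group whose LMUL is rounded up to a power of two.
uint64_t regsIfSingleGroup(uint64_t NumElts, uint64_t EltBits, uint64_t VLen) {
  uint64_t Needed = regsIfSplitToM1(NumElts, EltBits, VLen);
  uint64_t LMul = 1;
  while (LMul < Needed)
    LMul *= 2;
  return LMul;
}

// The <6 x i64> load from the example, under the assumed VLEN=128.
void checkSixByI64Example() {
  assert(regsIfSplitToM1(6, 64, 128) == 3);   // costed as three m1 pieces
  assert(regsIfSingleGroup(6, 64, 128) == 4); // but held in one m4 group
}
```

Under these assumptions the two views agree for <8 x i64> (four registers either way) and disagree for <6 x i64>, which is the case the test exercises.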

Unless you have a clear answer to the above, and I'm just missing something obvious, I think this change should be reverted.  

https://github.com/llvm/llvm-project/pull/106449