[llvm] [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. (PR #106449)

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 3 09:13:48 PDT 2024


https://github.com/preames commented:

At a high level, why is this a profitable thing to do?  The structure here seems to assume (via getRegUsageForType) that all fixed vector types are legalized by splitting to m1.  This is not true.  The actual lowering for your example here uses an m4 load.  Why is it reasonable to cost as if we legalize by splitting when we don't?

```
define i64 @test(ptr %p) {
; CHECK-LABEL: test:
; CHECK:       # %bb.0:
; CHECK-NEXT:    vsetivli zero, 6, e64, m4, ta, ma
; CHECK-NEXT:    vle64.v v8, (a0)
; CHECK-NEXT:    lui a0, %hi(.LCPI0_0)
; CHECK-NEXT:    addi a0, a0, %lo(.LCPI0_0)
; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT:    vle16.v v12, (a0)
; CHECK-NEXT:    vrgatherei16.vv v16, v8, v12
; CHECK-NEXT:    li a0, 42
; CHECK-NEXT:    vmul.vx v8, v16, a0
; CHECK-NEXT:    vmv.s.x v12, zero
; CHECK-NEXT:    vredsum.vs v8, v8, v12
; CHECK-NEXT:    vmv.x.s a0, v8
; CHECK-NEXT:    ret
  %ld = load <6 x i64>, ptr %p, align 4
  %shuffle = shufflevector <6 x i64> %ld, <6 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 0, i32 0>
  %mul = mul <8 x i64> %shuffle, <i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42>
  %sum = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %mul)
  ret i64 %sum
}
```
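To make the mismatch concrete, here is a rough sketch of the arithmetic behind the two views of `<6 x i64>`. It is a simplified model, not LLVM's actual costing or lowering code: the helper names `regsIfSplitToM1` and `lmulForWholeGroup` and the constant `kAssumedVLen` are hypothetical, and VLEN = 128 is an assumption (fractional LMUL and the LMUL <= 8 limit are ignored).

```cpp
#include <cstdint>

// Assumption: 128 bits per vector register (VLEN). Real subtargets may differ.
constexpr uint64_t kAssumedVLen = 128;

// Hypothetical helper: register count if the fixed vector were legalized by
// splitting it into VLEN-sized (m1) pieces, i.e. ceil(total bits / VLEN).
constexpr uint64_t regsIfSplitToM1(uint64_t numElts, uint64_t eltBits,
                                   uint64_t vlen = kAssumedVLen) {
  return (numElts * eltBits + vlen - 1) / vlen;
}

// Hypothetical helper: smallest power-of-two LMUL (>= 1) whose register group
// holds the whole vector, which is the shape a single grouped load would use.
constexpr uint64_t lmulForWholeGroup(uint64_t numElts, uint64_t eltBits,
                                     uint64_t vlen = kAssumedVLen) {
  uint64_t lmul = 1;
  while (lmul * vlen < numElts * eltBits)
    lmul *= 2;
  return lmul;
}

// <6 x i64> is 384 bits: three m1 registers under a split-to-m1 model ...
static_assert(regsIfSplitToM1(6, 64) == 3, "split-to-m1 model");
// ... but one m4 group (four registers) when kept as a single operation.
static_assert(lmulForWholeGroup(6, 64) == 4, "whole-group model");
```

Under these assumptions the split-to-m1 model counts three registers, while the `m4` in the `vsetivli` above corresponds to a four-register group, so the two models disagree on exactly the type this example exercises.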

Unless you have a clear answer to the above, and I'm just missing something obvious, I think this change should be reverted.  

https://github.com/llvm/llvm-project/pull/106449

