[llvm] [LoopVectorize] Add cost of generating tail-folding mask to the loop (PR #90191)

John Brawn via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 2 10:01:11 PDT 2024


john-brawn-arm wrote:

> Anyway, for the purpose of testing your issue @john-brawn-arm you can still try out this patch - I've simply commented out the assert that the legacy cost == vplan cost for now.

Looking at store_const_fixed_trip_count in the tail-folding.ll test added by PR #109289, the estimated costs without this PR are:
```
LV: Found an estimated cost of 0 for VF 1 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 1 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 1 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 1 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %ec, label %exit, label %loop
LV: Scalar loop costs: 4.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 2 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 8 for VF 2 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 2 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 2 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %ec, label %exit, label %loop
LV: Vector loop of width 2 costs: 5.
LV: Found an estimated cost of 0 for VF 4 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 16 for VF 4 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 2 for VF 4 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %ec, label %exit, label %loop
LV: Vector loop of width 4 costs: 4.
LV: Found an estimated cost of 0 for VF 8 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 8 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 32 for VF 8 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 4 for VF 8 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 8 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 8 For instruction:   br i1 %ec, label %exit, label %loop
LV: Vector loop of width 8 costs: 4.
LV: Found an estimated cost of 0 for VF 16 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 16 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 64 for VF 16 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 8 for VF 16 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 16 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 16 For instruction:   br i1 %ec, label %exit, label %loop
LV: Vector loop of width 16 costs: 4.
LV: Selecting VF: 1.
```
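The totals in this log can be reproduced with a short Python sketch. It rests on two assumptions that the log does not state. First, the "Vector loop of width N costs" line is taken to be the summed per-instruction cost integer-divided by N. Second, because the trip count is the constant 7, VFs are assumed to be compared on the cost of one iteration multiplied by ceil(7 / VF), with ties keeping the smaller VF. Under these assumptions the sketch matches the per-width figures and "Selecting VF: 1". It models the log only and is not the LoopVectorize implementation.

```python
import math

# Per-instruction costs copied from the log above, in the order
# phi, gep, store, add, icmp, br. Keys are the candidate VFs.
COSTS = {
    1: [0, 0, 2, 1, 1, 0],
    2: [0, 0, 8, 1, 1, 0],
    4: [0, 0, 16, 2, 1, 0],
    8: [0, 0, 32, 4, 1, 0],
    16: [0, 0, 64, 8, 1, 0],
}
TRIP_COUNT = 7  # from `icmp eq i64 %iv.next, 7`


def loop_cost(vf: int) -> int:
    """Cost of one iteration of the loop at this VF (sum of instruction costs)."""
    return sum(COSTS[vf])


def per_lane_cost(vf: int) -> int:
    """Assumed meaning of 'Vector loop of width N costs': integer cost per lane."""
    return loop_cost(vf) // vf


def total_cost(vf: int) -> int:
    """Assumed whole-loop cost: a tail-folded loop runs ceil(TC / VF) iterations."""
    return loop_cost(vf) * math.ceil(TRIP_COUNT / vf)


def select_vf() -> int:
    """Pick the VF with the strictly lowest total cost; ties keep the smaller VF."""
    best = 1
    for vf in sorted(COSTS):
        if total_cost(vf) < total_cost(best):
            best = vf
    return best


print([per_lane_cost(vf) for vf in sorted(COSTS)])  # [4, 5, 4, 4, 4]
print(select_vf())  # 1
```

With these numbers the whole-loop totals are 28 (scalar), 40, 38, 37 and 73, so no vector VF beats the scalar loop.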
With this PR, the estimated costs are:
```
LV: Found an estimated cost of 0 for VF 1 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 1 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 1 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 1 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 1: 0
LV: Scalar loop costs: 4.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 2 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 2 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 2 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 2 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 2: 2
LV: Vector loop of width 2 costs: 3.
LV: Found an estimated cost of 0 for VF 4 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 4 for VF 4 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 2 for VF 4 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 4: 4
LV: Vector loop of width 4 costs: 2.
LV: Found an estimated cost of 0 for VF 8 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 8 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 8 for VF 8 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 4 for VF 8 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 8 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 8 For instruction:   br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 8: 8
LV: Vector loop of width 8 costs: 2.
LV: Found an estimated cost of 0 for VF 16 For instruction:   %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 16 For instruction:   %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 16 for VF 16 For instruction:   store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 8 for VF 16 For instruction:   %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 16 For instruction:   %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 16 For instruction:   br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 16: 16
LV: Vector loop of width 16 costs: 2.
LV: Selecting VF: 8.
```
Adding the cost of generating the tail-fold mask doesn't make up for the missing cost of the branch that each scalarized store requires, so VF 8 is still selected.
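The point above can be illustrated with a toy model of the second log. As with any reading of these numbers, it assumes that each per-width figure is (instruction costs + mask cost) integer-divided by the VF, and that VFs are compared on one iteration's cost multiplied by ceil(7 / VF). The `branch_cost` parameter is purely hypothetical: it stands in for the per-lane branch that the scalarized store would need and is not something the log reports. With `branch_cost=0` the sketch matches "Selecting VF: 8". In this toy model, a branch cost of 1 per lane already makes the scalar loop the cheapest again.

```python
import math

# Per-instruction costs from the log with this PR, in the order
# phi, gep, store, add, icmp, br, plus the logged tail-fold mask cost.
INSTR_COSTS = {
    1: [0, 0, 2, 1, 1, 0],
    2: [0, 0, 2, 1, 1, 0],
    4: [0, 0, 4, 2, 1, 0],
    8: [0, 0, 8, 4, 1, 0],
    16: [0, 0, 16, 8, 1, 0],
}
MASK_COSTS = {1: 0, 2: 2, 4: 4, 8: 8, 16: 16}
TRIP_COUNT = 7  # from `icmp eq i64 %iv.next, 7`


def loop_cost(vf: int, branch_cost: int = 0) -> int:
    """Cost of one loop iteration at `vf`.

    `branch_cost` is a HYPOTHETICAL extra cost per scalarized store lane.
    The scalar loop (vf == 1) has no predicated store, so it gets none.
    """
    extra = branch_cost * vf if vf > 1 else 0
    return sum(INSTR_COSTS[vf]) + MASK_COSTS[vf] + extra


def total_cost(vf: int, branch_cost: int = 0) -> int:
    """Assumed whole-loop cost: ceil(TC / VF) tail-folded iterations."""
    return loop_cost(vf, branch_cost) * math.ceil(TRIP_COUNT / vf)


def select_vf(branch_cost: int = 0) -> int:
    """Pick the VF with the strictly lowest total cost; ties keep the smaller VF."""
    best = 1
    for vf in sorted(INSTR_COSTS):
        if total_cost(vf, branch_cost) < total_cost(best, branch_cost):
            best = vf
    return best


print([loop_cost(vf) // vf for vf in sorted(INSTR_COSTS)])  # [4, 3, 2, 2, 2]
print(select_vf())               # 8
print(select_vf(branch_cost=1))  # 1
```

Without the branch cost the whole-loop totals are 28, 24, 22, 21 and 41, so VF 8 wins. With the hypothetical cost of 1 per lane they become 28, 32, 30, 29 and 57.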

https://github.com/llvm/llvm-project/pull/90191

