[llvm] [LoopVectorize] Add cost of generating tail-folding mask to the loop (PR #90191)
John Brawn via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 2 10:01:11 PDT 2024
john-brawn-arm wrote:
> Anyway, for the purpose of testing your issue @john-brawn-arm you can still try out this patch - I've simply commented out the assert that the legacy cost == vplan cost for now.
Looking at store_const_fixed_trip_count in the tail-folding.ll test added by PR #109289, it has the following estimated costs:
```
LV: Found an estimated cost of 0 for VF 1 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 1 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %ec, label %exit, label %loop
LV: Scalar loop costs: 4.
LV: Found an estimated cost of 0 for VF 2 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 8 for VF 2 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %ec, label %exit, label %loop
LV: Vector loop of width 2 costs: 5.
LV: Found an estimated cost of 0 for VF 4 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 16 for VF 4 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 2 for VF 4 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %ec, label %exit, label %loop
LV: Vector loop of width 4 costs: 4.
LV: Found an estimated cost of 0 for VF 8 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 8 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 32 for VF 8 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 4 for VF 8 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 8 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 8 For instruction: br i1 %ec, label %exit, label %loop
LV: Vector loop of width 8 costs: 4.
LV: Found an estimated cost of 0 for VF 16 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 16 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 64 for VF 16 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 8 for VF 16 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 16 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 16 For instruction: br i1 %ec, label %exit, label %loop
LV: Vector loop of width 16 costs: 4.
LV: Selecting VF: 1.
```
This PR gives the following estimated costs:
```
LV: Found an estimated cost of 0 for VF 1 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 1 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 1: 0
LV: Scalar loop costs: 4.
LV: Found an estimated cost of 0 for VF 2 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 2 for VF 2 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 2: 2
LV: Vector loop of width 2 costs: 3.
LV: Found an estimated cost of 0 for VF 4 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 4 for VF 4 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 2 for VF 4 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 4: 4
LV: Vector loop of width 4 costs: 2.
LV: Found an estimated cost of 0 for VF 8 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 8 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 8 for VF 8 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 4 for VF 8 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 8 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 8 For instruction: br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 8: 8
LV: Vector loop of width 8 costs: 2.
LV: Found an estimated cost of 0 for VF 16 For instruction: %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
LV: Found an estimated cost of 0 for VF 16 For instruction: %gep = getelementptr i8, ptr %dst, i64 %iv
LV: Found an estimated cost of 16 for VF 16 For instruction: store i8 1, ptr %gep, align 1
LV: Found an estimated cost of 8 for VF 16 For instruction: %iv.next = add i64 %iv, 1
LV: Found an estimated cost of 1 for VF 16 For instruction: %ec = icmp eq i64 %iv.next, 7
LV: Found an estimated cost of 0 for VF 16 For instruction: br i1 %ec, label %exit, label %loop
LV: Adding cost of generating tail-fold mask for VF 16: 16
LV: Vector loop of width 16 costs: 2.
LV: Selecting VF: 8.
```
Adding the cost of generating the tail-fold mask doesn't make up for the missing cost of the branch that each scalarized store requires, so VF 8 is still selected.
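To make the arithmetic behind the two dumps easier to check, here is a small Python sketch that recomputes the "loop costs" lines from the per-instruction costs. It assumes the printed loop cost is the sum of the instruction costs (plus the mask cost, where reported) divided by VF and truncated to an integer; that is inferred from the numbers in the logs, not taken from the LoopVectorize source. The names `BASELINE`, `THIS_PR`, `MASK_COST` and `per_lane_cost` are made up for illustration, and the sketch does not model how the vectorizer actually picks the VF.

```python
# Per-instruction costs copied from the two debug dumps, keyed by VF.
# Order within each list: phi, getelementptr, store, add, icmp, br.
BASELINE = {
    1: [0, 0, 2, 1, 1, 0],
    2: [0, 0, 8, 1, 1, 0],
    4: [0, 0, 16, 2, 1, 0],
    8: [0, 0, 32, 4, 1, 0],
    16: [0, 0, 64, 8, 1, 0],
}
THIS_PR = {
    1: [0, 0, 2, 1, 1, 0],
    2: [0, 0, 2, 1, 1, 0],
    4: [0, 0, 4, 2, 1, 0],
    8: [0, 0, 8, 4, 1, 0],
    16: [0, 0, 16, 8, 1, 0],
}
# "Adding cost of generating tail-fold mask" lines from the second dump.
MASK_COST = {1: 0, 2: 2, 4: 4, 8: 8, 16: 16}


def per_lane_cost(instr_costs, vf, mask_cost=0):
    """Assumed model: total cost divided by VF, truncated (hypothetical helper)."""
    return (sum(instr_costs) + mask_cost) // vf


baseline = {vf: per_lane_cost(costs, vf) for vf, costs in BASELINE.items()}
this_pr = {
    vf: per_lane_cost(costs, vf, MASK_COST[vf]) for vf, costs in THIS_PR.items()
}

print(baseline)  # {1: 4, 2: 5, 4: 4, 8: 4, 16: 4}
print(this_pr)   # {1: 4, 2: 3, 4: 2, 8: 2, 16: 2}
```

Under that assumption the recomputed values match the logged ones. In the second dump the store cost plus the mask cost comes to 2 * VF, against a store cost of 4 * VF in the first dump, so every vector width still comes out below the scalar cost of 4.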
https://github.com/llvm/llvm-project/pull/90191