[llvm] [LoopVectorize] Don't discount instructions scalarized due to tail folding (PR #109289)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 27 08:38:07 PDT 2024


================
@@ -21,7 +21,31 @@ define void @foo(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64 %N) {
 ; CHECK-NEXT:   vector.body:
 ; CHECK-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[CAN_INC:%.*]]>
 ; CHECK-NEXT:     WIDEN-INDUCTION %iv = phi 0, %iv.next, ir<1>, vp<[[VF]]>
+; CHECK-NEXT:     vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<1>
 ; CHECK-NEXT:     EMIT vp<[[CMP:%.+]]> = icmp ule ir<%iv>, vp<[[BTC]]>
+; CHECK-NEXT:   Successor(s): pred.load
----------------
david-arm wrote:

At first glance this looks worse than before, unless I'm missing something. It looks like previously we were reusing the same predicated blocks to perform both the load and store, i.e. something like

```
  %cmp = icmp ... <4 x i32>
  %lane0 = extractelement <4 x i32> %cmp, i32 0
  br i1 %lane0, label %block1.if, label %block1.continue

%block1.if:
  .. do load ..
  .. do store ..
...
```

whereas now we've essentially split out the loads and stores with duplicate control flow, i.e.

```
  %cmp = icmp ... <4 x i32>
  %lane0 = extractelement <4 x i32> %cmp, i32 0
  br i1 %lane0, label %block1.load.if, label %block1.load.continue

%block1.load.if:
  .. do load ..
...

%stores:
  %lane0.1 = extractelement <4 x i32> %cmp, i32 0
  br i1 %lane0.1, label %block1.store.if, label %block1.store.continue
...
```

I'd expect the extra control flow to hurt performance.
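
To make the concern concrete, here is a rough model of the two shapes, counting only the per-lane conditional branches. It assumes VF=4 and a hypothetical loop body `c[i] = a[i] + b[i]`; the real body of `@foo` is not shown in this hunk, and the function names are made up for illustration.

```python
def merged_blocks(mask, a, b, c):
    """Old shape: one predicated block per lane does both load and store."""
    branches = 0
    for lane, active in enumerate(mask):
        branches += 1                      # br i1 %laneN, %if, %continue
        if active:
            c[lane] = a[lane] + b[lane]    # load and store share the block
    return branches


def split_blocks(mask, a, b, c):
    """New shape: loads and stores each get their own predicated blocks."""
    branches = 0
    loaded = [None] * len(mask)
    for lane, active in enumerate(mask):   # pred.load region
        branches += 1
        if active:
            loaded[lane] = a[lane] + b[lane]
    for lane, active in enumerate(mask):   # pred.store region
        branches += 1                      # second extract + branch per lane
        if active:
            c[lane] = loaded[lane]
    return branches


# Hypothetical inputs: lane 2 is masked off by tail folding.
mask = [True, True, False, True]
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
c_merged, c_split = [0] * 4, [0] * 4
n_merged = merged_blocks(mask, a, b, c_merged)
n_split = split_blocks(mask, a, b, c_split)
```

Under these assumptions both shapes write the same values, but the split form executes twice as many conditional branches per vector iteration. This is only a branch count, not a measurement of actual performance.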

https://github.com/llvm/llvm-project/pull/109289

