[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager

Thu Apr 29 07:58:22 PDT 2021

Hi All,

I am trying to vectorize some loops. Let 's see a simple loop.

while() {
...
if ()
...
}

As you can see, there is if statement inside loop. As you know, LoopVectorizer tries to predicate the block of if statement with its condition and it asks target machines the cost of the instructions. If there are load or store instructions in the block to be predicated and target machine does not support masked load and store, LoopVectorizer assigns big number to the load and store instructions and it means the loop is not vectorized with loop vectorizer. In this case, it is important to remove the if statement inside loop before LoopVectorizer.

We need to consider two cases to remove the if statement inside loop as below.

1. if statement's condition with loop invariant variables
2. if statement's condition with induction variables

For the first one, UnSwitch/SimpleUnSwitch pass handles the case. The passes hoist the loop invariant condition outside loop and unswitch loop.
For the second one, there could be several transformations but I think the IRCE pass is general solution. The pass splits the iteration space following the condition with induction variable.

At this moment, the only SimpleUnSwitch pass is in pipeline of new pass manager but IRCE is not.

Let's see an IR function to see the impact of IRCE pass.

target triple = "aarch64"

define void @foo(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
entry:
  br label %bb 

bb:
  %if.cond = icmp slt i64 1, %n
  br i1 %if.cond, label %if.then, label %exit

if.then:
  %if.cond.2 = icmp slt i64 10, %n
  br i1 %if.cond.2, label %loop.ph, label %if.then.else

if.then.else:
  br label %loop.ph

loop.ph:
  br label %loop

loop:
  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
  %cmp = icmp slt i64 %iv, %a
  br i1 %cmp, label %if.then.2, label %for.inc

if.then.2:
  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv 
  %val = load i64, i64* %src.arrayidx
  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv 
  store i64 %val, i64* %dst.arrayidx
  br label %for.inc

for.inc:
  %inc = add nuw nsw i64 %iv, 1
  %cond = icmp eq i64 %inc, %n
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}

If we try to vectorize above code, we can see below debug output.

opt  -loop-vectorize -S ./test.ll  -debug-only=loop-vectorize

LV: Scalar loop costs: 5.
...
LV: Found an estimated cost of 3000000 for VF 2 For instruction:   %val = load i64, i64* %src.arrayidx, align 4
LV: Vector loop of width 2 costs: 1500004.
...
LV: Vectorization is possible but not beneficial

If we run IRCE pass ahead of loop vectorizer with SCEV changes https://reviews.llvm.org/D101409 and https://reviews.llvm.org/D100566, we can see the loop is vectorized with below debug output.

opt -irce -irce-skip-profitability-checks -simplifycfg -instcombine -loop-vectorize -S ./test.ll -debug-only=loop-vectorize

LV: Scalar loop costs: 6.
LV: Vector loop of width 2 costs: 2.

In order to vectorize more loops which has conditional branch inside, we need to enable IRCE pass or add something similar ahead of loop vectorizer in the pipeline of new pass manager.

I had a short discussion with Philip and Nikita on https://reviews.llvm.org/D101409 and it looks we need more effort for IRCE pass or something similar features. If you have experience or idea about this, please share it.

Thanks
JinGu Kang