[llvm-dev] Help: Question on Epilog Vectorization

Wed Oct 6 09:32:44 PDT 2021

Hi,

I wrote a small test case and tried to force epilog vectorization for the
following loop:

#include <math.h>

void foo(double * restrict a, double * restrict b, int N) {
    for (int i = 0; i < N; ++i)
        a[i] = sin(i);
}

clang -O3 -mavx2 -fveclib=libmvec sin.c -mllvm \
    -epilogue-vectorization-minimum-VF=4 -S -emit-llvm -fno-unroll-loops

However, epilog vectorization was rejected by the following check in the
function "isCandidateForEpilogueVectorization":

-- Snip llvm/lib/Transforms/Vectorize/LoopVectorize.cpp --

// Induction variables that are widened require special handling that is
// currently not supported.
if (any_of(Legal->getInductionVars(), [&](auto &Entry) {
      return !(this->isScalarAfterVectorization(Entry.first, VF) ||
               this->isProfitableToScalarize(Entry.first, VF));
    }))
  return false;
-- Snip --

I understand that loops whose induction variables are widened (as decided
by the VPlan) are not supported for epilog vectorization at the moment.

But can someone please explain what "special handling" is needed here?

If I remove the check from the source, epilog vectorization does take
place, but the generated LLVM IR looks wrong:

-- Snip --

12:                                               ; preds = %12, %10
  %13 = phi i64 [ 0, %10 ], [ %19, %12 ]
  %14 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %10 ], [ %20, %12 ]
  %15 = sitofp <4 x i32> %14 to <4 x double>
  %16 = call <4 x double> @_ZGVdN4v_sin(<4 x double> %15)
  %17 = getelementptr inbounds double, double* %0, i64 %13
  %18 = bitcast double* %17 to <4 x double>*
  store <4 x double> %16, <4 x double>* %18, align 8, !tbaa !3
  %19 = add nuw i64 %13, 4
  %20 = add <4 x i32> %14, <i32 4, i32 4, i32 4, i32 4>
  %21 = icmp eq i64 %19, %11
  br i1 %21, label %22, label %12, !llvm.loop !7

22:                                               ; preds = %12
  %23 = icmp eq i64 %11, %6
  br i1 %23, label %44, label %24

24:                                               ; preds = %22
  %25 = and i64 %6, 2
  %26 = icmp eq i64 %25, 0
  br i1 %26, label %42, label %27

27:                                               ; preds = %8, %24
  %28 = phi i64 [ %11, %24 ], [ 0, %8 ]
  %29 = and i64 %6, 4294967294
  br label %30

30:                                               ; preds = %30, %27
  %31 = phi i64 [ %28, %27 ], [ %37, %30 ]
  %32 = phi <2 x i32> [ <i32 0, i32 1>, %27 ], [ %38, %30 ]   ; <== resume value seems to be wrong
  %33 = sitofp <2 x i32> %32 to <2 x double>
  %34 = call <2 x double> @_ZGVbN2v_sin(<2 x double> %33)
  %35 = getelementptr inbounds double, double* %0, i64 %31
  %36 = bitcast double* %35 to <2 x double>*
  store <2 x double> %34, <2 x double>* %36, align 8, !tbaa !3
  %37 = add nuw i64 %31, 2
  %38 = add <2 x i32> %32, <i32 2, i32 2>
  %39 = icmp eq i64 %37, %29
  br i1 %39, label %40, label %30, !llvm.loop !11
-- Snip --

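The resume value for the widened phi node in the epilog loop is not set
correctly: %32 always starts at <i32 0, i32 1>, while the scalar index %31
resumes from %28, i.e. from %11 when the main vector loop has run. I would
expect the vector phi to start at <%11, %11 + 1> (truncated to i32) in that
case.
Are there any other issues here apart from handling the widened induction
variable's resume value?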

Regards,
Venkat.