[llvm-dev] Help: Question on Epilog Vectorization

Wed Oct 6 13:41:03 PDT 2021

The resume value for the widened induction is the only problem I'm aware
of.

The issue is that scalar induction resume values are normally
created/updated as part of skeleton creation. For widened inductions in the
epilogue loop, however, the corresponding recipes in the VPlan have not yet
been executed at the time of skeleton creation. We either have to find the
related phis after the fact and fix them up, or change the VPlan to update
the incoming values of the widened IVs before executing it. Florian
demonstrated the latter idea in https://reviews.llvm.org/D92132, so maybe he
has more details to share.
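
To make the resume-value problem concrete, here is a rough scalar model in C
of the three loops involved (main vector loop at VF=4, epilogue vector loop
at VF=2, scalar remainder). It is only a sketch, not what LoopVectorize
actually emits: the names model_foo, fix_resume, wide4 and wide2 are made up
for illustration, and the loop stores (double)i instead of sin(i) so that the
stored value exposes the induction variable directly.

```c
#include <assert.h>

enum { MAIN_VF = 4, EPILOGUE_VF = 2 };

/* Hypothetical scalar model of main vector loop + epilogue vector loop +
 * scalar remainder for "a[i] = (double)i".
 * fix_resume != 0: the widened IV of the epilogue starts at
 *                  <resume, resume + 1> (what we want).
 * fix_resume == 0: it starts at <0, 1> (the behaviour seen when the check
 *                  is simply removed). */
void model_foo(double *a, int n, int fix_resume) {
    assert(n >= 0);
    int main_end = n - n % MAIN_VF;     /* trip count covered by the main loop */
    int epi_end = n - n % EPILOGUE_VF;  /* trip count covered by the epilogue  */
    int i = 0;                          /* scalar (canonical) induction        */

    /* Main vector loop: widened IV starts at <0, 1, 2, 3>, step 4. */
    int wide4[MAIN_VF];
    for (int lane = 0; lane < MAIN_VF; ++lane)
        wide4[lane] = lane;
    for (; i < main_end; i += MAIN_VF) {
        for (int lane = 0; lane < MAIN_VF; ++lane) {
            a[i + lane] = (double)wide4[lane];
            wide4[lane] += MAIN_VF;
        }
    }

    /* Epilogue vector loop: the scalar IV resumes at i, so the widened IV
     * must resume at <i, i + 1>; starting again at <0, 1> is the bug. */
    int resume = fix_resume ? i : 0;
    int wide2[EPILOGUE_VF];
    for (int lane = 0; lane < EPILOGUE_VF; ++lane)
        wide2[lane] = resume + lane;
    for (; i < epi_end; i += EPILOGUE_VF) {
        for (int lane = 0; lane < EPILOGUE_VF; ++lane) {
            a[i + lane] = (double)wide2[lane];
            wide2[lane] += EPILOGUE_VF;
        }
    }

    /* Scalar remainder. */
    for (; i < n; ++i)
        a[i] = (double)i;
}
```

Calling model_foo(a, 7, 0) on a 7-element array leaves a[4] and a[5] holding
0 and 1 instead of 4 and 5, while model_foo(a, 7, 1) fills a[i] with i for
every element. The scalar IV is unaffected in both cases; only the widened IV
picks up the wrong start value.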

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab

From:	"Venkataramanan Kumar" <venkataramanan.kumar.llvm at gmail.com>
To:	"llvm-dev" <llvm-dev at lists.llvm.org>
Cc:	bmahjour at ca.ibm.com, "Florian Hahn" <florian_hahn at apple.com>
Date:	2021/10/06 12:33 PM
Subject:	[EXTERNAL] Help: Question on Epilog Vectorization

Hi,

I wrote a small test case and tried to force epilog vectorization for the
loop.

#include <math.h>

void foo(double * restrict a, double * restrict b, int N) {
  for (int i = 0; i < N; ++i)
    a[i] = sin(i);
}

clang -O3 -mavx2 -fveclib=libmvec sin.c -mllvm
-epilogue-vectorization-minimum-VF=4 -S  -emit-llvm -fno-unroll-loops

However, epilog vectorization is rejected by the following check in the
function "isCandidateForEpilogueVectorization".

-- Snip llvm/lib/Transforms/Vectorize/LoopVectorize.cpp --

// Induction variables that are widened require special handling that is
// currently not supported.
if (any_of(Legal->getInductionVars(), [&](auto &Entry) {
       return !(this->isScalarAfterVectorization(Entry.first, VF) ||
                  this->isProfitableToScalarize(Entry.first, VF));
-- Snip --

I understand that when induction variables are widened as per the VPlan,
we currently don't support such loops for epilog vectorization.

But can someone please explain the "special handling" we need to do here?

If I remove the check from the source, epilog vectorization happens, but
the generated LLVM IR seems to be wrong.

---Snip--

12:                                               ; preds = %12, %10
  %13 = phi i64 [ 0, %10 ], [ %19, %12 ]
  %14 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %10 ], [ %20, %12 ]
  %15 = sitofp <4 x i32> %14 to <4 x double>
  %16 = call <4 x double> @_ZGVdN4v_sin(<4 x double> %15)
  %17 = getelementptr inbounds double, double* %0, i64 %13
  %18 = bitcast double* %17 to <4 x double>*
  store <4 x double> %16, <4 x double>* %18, align 8, !tbaa !3
  %19 = add nuw i64 %13, 4
  %20 = add <4 x i32> %14, <i32 4, i32 4, i32 4, i32 4>
  %21 = icmp eq i64 %19, %11
  br i1 %21, label %22, label %12, !llvm.loop !7

22:                                               ; preds = %12
  %23 = icmp eq i64 %11, %6
  br i1 %23, label %44, label %24

24:                                               ; preds = %22
  %25 = and i64 %6, 2
  %26 = icmp eq i64 %25, 0
  br i1 %26, label %42, label %27

27:                                               ; preds = %8, %24
  %28 = phi i64 [ %11, %24 ], [ 0, %8 ]
  %29 = and i64 %6, 4294967294
  br label %30

30:                                               ; preds = %30, %27
  %31 = phi i64 [ %28, %27 ], [ %37, %30 ]
  %32 = phi <2 x i32> [ <i32 0, i32 1>, %27 ], [ %38, %30 ] ; <== Resume value seems to be wrong.
  %33 = sitofp <2 x i32> %32 to <2 x double>
  %34 = call <2 x double> @_ZGVbN2v_sin(<2 x double> %33)
  %35 = getelementptr inbounds double, double* %0, i64 %31
  %36 = bitcast double* %35 to <2 x double>*
  store <2 x double> %34, <2 x double>* %36, align 8, !tbaa !3
  %37 = add nuw i64 %31, 2
  %38 = add <2 x i32> %32, <i32 2, i32 2>
  %39 = icmp eq i64 %37, %29
  br i1 %39, label %40, label %30, !llvm.loop !11
--- Snip--

I see that the resume value for the widened phi node in the epilog loop is
not updated correctly.
Are there any other issues here apart from handling the widened induction
variable's resume value?

Regards,
Venkat.
