[PATCH] D67572: [VectorUtils] Introduce the Vector Function Database (VFDatabase).

Thu Dec 12 23:28:50 PST 2019

uabelho added a comment.

> @uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.

What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:

  const VecDesc VecIntrinsics[] = {
    {"llvm.phx.abs.i32", "", 4}
  };

  TLI.addVectorizableFunctions(VecIntrinsics);

where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.

I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.

Then if I run -loop-vectorize on the following input

  define i32 @f() {
    br label %bb1

  bb1:                                              ; preds = %bb1, %0
    %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
    %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
    %call = tail call i32 @llvm.phx.abs.i32(i32 0)
    %sum_next = add i32 %sum, %call
    %i_inc = add nuw nsw i16 %i, 1
    %exit = icmp eq i16 %i_inc, 100
    br i1 %exit, label %bb3, label %bb1

  bb3:                                              ; preds = %bb1
    ret i32 %sum_next
  }

  declare i32 @llvm.phx.abs.i32(i32)

I used to get a vectorized loop like

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
    %offset.idx = trunc i32 %index to i16
    %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
    %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
    %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
    %1 = add i16 %offset.idx, 0
    %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
    %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
    %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
    %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
    %6 = insertelement <4 x i32> undef, i32 %2, i32 0
    %7 = insertelement <4 x i32> %6, i32 %3, i32 1
    %8 = insertelement <4 x i32> %7, i32 %4, i32 2
    %9 = insertelement <4 x i32> %8, i32 %5, i32 3
    %10 = add <4 x i32> %vec.phi, %9
    %index.next = add i32 %index, 4
    %11 = icmp eq i32 %index.next, 100
    br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0

but with this patch LoopVectorizationLegality bails out with

  LV: Not vectorizing: Found a non-intrinsic callsite   %call = tail call i32 @llvm.phx.abs.i32(i32 0)

Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:

  @@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
         if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
             !isa<DbgInfoIntrinsic>(CI) &&
             !(CI->getCalledFunction() && TLI &&
  -            !VFDatabase::getMappings(*CI).empty())) {
  +            (!VFDatabase::getMappings(*CI).empty() ||
  +             // Hack: Allow vectorization even if we didn't provide
  +             // a vector version of the intrinsic.
  +             (CI->getParent()->getModule()->isTargetPhoenix() &&
  +              TLI->isFunctionVectorizable(CI->getCalledFunction()
  +                                          ->getName()))))) {

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67572/new/

https://reviews.llvm.org/D67572