[PATCH] D67572: [VectorUtils] Introduce the Vector Function Database (VFDatabase).

Francesco Petrogalli via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 13 13:37:44 PST 2019


fpetrogalli added a comment.

In D67572#1783041 <https://reviews.llvm.org/D67572#1783041>, @uabelho wrote:

> > @uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?
>
> I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.
>
> What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
>  So we have like:
>
>   const VecDesc VecIntrinsics[] = {
>     {"llvm.phx.abs.i32", "", 4}
>   };
>   
>   TLI.addVectorizableFunctions(VecIntrinsics);
>   
>
> where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.
>
> I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.


I cannot find anything like that in the current `TargetLibraryInfo.cpp`. Having said that, I think that modifying your mappings along the lines of the following example could allow you to use the VFDatabase:

  const VecDesc VecIntrinsics[] = {
    {"llvm.phx.abs.i32", "llvm.phx.abs.i32", 4}
  };
  TLI.addVectorizableFunctions(VecIntrinsics);

With this, the pass `InjectTLIMappings` would add the scalar-to-vector mapping `_ZGV_LLVM_N4v_llvm.phx.abs.i32(llvm.phx.abs.i32)`. This mangled name would tell to vectorize the intrinsic with itself, just using vectors instead of scalars. Admittedly, this than wiill fail in the cost model when scalarizing, because `getVectorizedFunction` will return a function and therefore the code will not be scalarized. As I said, I'd rather have scalarization happen after vectorizaion has completed, but given this might be a change too big we can try a different approach.

A different approach could go as follows.

Thge `_ZGV_LLVM_<mask><vlen><parameters>_<scalarname>{(<vectorname>)}` names have the requirement that the `<vectorname>` redirection cannot be

> Then if I run -loop-vectorize on the following input
> 
>   define i32 @f() {
>     br label %bb1
>   
>   bb1:                                              ; preds = %bb1, %0
>     %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
>     %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
>     %call = tail call i32 @llvm.phx.abs.i32(i32 0)
>     %sum_next = add i32 %sum, %call
>     %i_inc = add nuw nsw i16 %i, 1
>     %exit = icmp eq i16 %i_inc, 100
>     br i1 %exit, label %bb3, label %bb1
>   
>   bb3:                                              ; preds = %bb1
>     ret i32 %sum_next
>   }
>   
>   declare i32 @llvm.phx.abs.i32(i32)
> 
> 
> I used to get a vectorized loop like
> 
>   vector.body:                                      ; preds = %vector.body, %vector.ph
>     %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
>     %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
>     %offset.idx = trunc i32 %index to i16
>     %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
>     %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
>     %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
>     %1 = add i16 %offset.idx, 0
>     %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
>     %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
>     %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
>     %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
>     %6 = insertelement <4 x i32> undef, i32 %2, i32 0
>     %7 = insertelement <4 x i32> %6, i32 %3, i32 1
>     %8 = insertelement <4 x i32> %7, i32 %4, i32 2
>     %9 = insertelement <4 x i32> %8, i32 %5, i32 3
>     %10 = add <4 x i32> %vec.phi, %9
>     %index.next = add i32 %index, 4
>     %11 = icmp eq i32 %index.next, 100
>     br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0
> 
> 
> but with this patch LoopVectorizationLegality bails out with
> 
>   LV: Not vectorizing: Found a non-intrinsic callsite   %call = tail call i32 @llvm.phx.abs.i32(i32 0)
> 
> 
> Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:
> 
>   @@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
>          if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
>              !isa<DbgInfoIntrinsic>(CI) &&
>              !(CI->getCalledFunction() && TLI &&
>   -            !VFDatabase::getMappings(*CI).empty())) {
>   +            (!VFDatabase::getMappings(*CI).empty() ||
>   +             // Hack: Allow vectorization even if we didn't provide
>   +             // a vector version of the intrinsic.
>   +             (CI->getParent()->getModule()->isTargetPhoenix() &&
>   +              TLI->isFunctionVectorizable(CI->getCalledFunction()
>   +                                          ->getName()))))) {

Hi @uabelho and @fhahn ,

I have reverted the change to avoid disruption in your work.

@uabelho, the example you posted here is very useful, I will send you a modified version of the code for review, so that you can verify it works for you.

Kind regards,

Francesco


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67572/new/

https://reviews.llvm.org/D67572





More information about the llvm-commits mailing list