[PATCH] D67572: [VectorUtils] Introduce the Vector Function Database (VFDatabase).
Francesco Petrogalli via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 13 13:37:44 PST 2019
fpetrogalli added a comment.
In D67572#1783041 <https://reviews.llvm.org/D67572#1783041>, @uabelho wrote:
> > @uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?
>
> I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.
>
> What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
> So we have like:
>
> const VecDesc VecIntrinsics[] = {
> {"llvm.phx.abs.i32", "", 4}
> };
>
> TLI.addVectorizableFunctions(VecIntrinsics);
>
>
> where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.
>
> I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.
I cannot find anything like that in the current `TargetLibraryInfo.cpp`. Having said that, I think that modifying your mappings along the lines of the following example could allow you to use the VFDatabase:
const VecDesc VecIntrinsics[] = {
{"llvm.phx.abs.i32", "llvm.phx.abs.i32", 4}
};
TLI.addVectorizableFunctions(VecIntrinsics);
With this, the pass `InjectTLIMappings` would add the scalar-to-vector mapping `_ZGV_LLVM_N4v_llvm.phx.abs.i32(llvm.phx.abs.i32)`. This mangled name would tell to vectorize the intrinsic with itself, just using vectors instead of scalars. Admittedly, this than wiill fail in the cost model when scalarizing, because `getVectorizedFunction` will return a function and therefore the code will not be scalarized. As I said, I'd rather have scalarization happen after vectorizaion has completed, but given this might be a change too big we can try a different approach.
A different approach could go as follows.
Thge `_ZGV_LLVM_<mask><vlen><parameters>_<scalarname>{(<vectorname>)}` names have the requirement that the `<vectorname>` redirection cannot be
> Then if I run -loop-vectorize on the following input
>
> define i32 @f() {
> br label %bb1
>
> bb1: ; preds = %bb1, %0
> %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
> %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
> %call = tail call i32 @llvm.phx.abs.i32(i32 0)
> %sum_next = add i32 %sum, %call
> %i_inc = add nuw nsw i16 %i, 1
> %exit = icmp eq i16 %i_inc, 100
> br i1 %exit, label %bb3, label %bb1
>
> bb3: ; preds = %bb1
> ret i32 %sum_next
> }
>
> declare i32 @llvm.phx.abs.i32(i32)
>
>
> I used to get a vectorized loop like
>
> vector.body: ; preds = %vector.body, %vector.ph
> %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
> %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
> %offset.idx = trunc i32 %index to i16
> %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
> %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
> %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
> %1 = add i16 %offset.idx, 0
> %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
> %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
> %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
> %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
> %6 = insertelement <4 x i32> undef, i32 %2, i32 0
> %7 = insertelement <4 x i32> %6, i32 %3, i32 1
> %8 = insertelement <4 x i32> %7, i32 %4, i32 2
> %9 = insertelement <4 x i32> %8, i32 %5, i32 3
> %10 = add <4 x i32> %vec.phi, %9
> %index.next = add i32 %index, 4
> %11 = icmp eq i32 %index.next, 100
> br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0
>
>
> but with this patch LoopVectorizationLegality bails out with
>
> LV: Not vectorizing: Found a non-intrinsic callsite %call = tail call i32 @llvm.phx.abs.i32(i32 0)
>
>
> Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:
>
> @@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
> if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
> !isa<DbgInfoIntrinsic>(CI) &&
> !(CI->getCalledFunction() && TLI &&
> - !VFDatabase::getMappings(*CI).empty())) {
> + (!VFDatabase::getMappings(*CI).empty() ||
> + // Hack: Allow vectorization even if we didn't provide
> + // a vector version of the intrinsic.
> + (CI->getParent()->getModule()->isTargetPhoenix() &&
> + TLI->isFunctionVectorizable(CI->getCalledFunction()
> + ->getName()))))) {
Hi @uabelho and @fhahn ,
I have reverted the change to avoid disruption in your work.
@uabelho, the example you posted here is very useful, I will send you a modified version of the code for review, so that you can verify it works for you.
Kind regards,
Francesco
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67572/new/
https://reviews.llvm.org/D67572
More information about the llvm-commits
mailing list