[llvm] r200576 - [SLPV] Recognize vectorizable intrinsics during SLP vectorization and

Mon Feb 3 08:51:38 PST 2014

Thank you Reid. Have you reverted the change?

I think the alternatives are to disable vectorization of bswap on 32-bit or
teach llvm how to do the expansion.
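
To make the second alternative concrete: semantically, expanding a vector
bswap only means byte-swapping each lane independently. Below is a minimal
standalone sketch in plain C++. It is not the legalizer code, and the names
`V2I64`, `bswap64` and `bswapV2I64` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar byte swap of one 64-bit lane: reverses the order of its eight bytes.
inline std::uint64_t bswap64(std::uint64_t x) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (x & 0xFF); // move the lowest byte of x to the end of r
    x >>= 8;
  }
  return r;
}

// Hypothetical stand-in for a <2 x i64> value.
using V2I64 = std::array<std::uint64_t, 2>;

// "Expansion" by unrolling: take each lane, swap it as a scalar, and put the
// result back in the same lane position.
inline V2I64 bswapV2I64(const V2I64 &v) {
  V2I64 out{};
  for (std::size_t lane = 0; lane < v.size(); ++lane)
    out[lane] = bswap64(v[lane]);
  return out;
}
```

A real expansion would have to produce the equivalent extract/swap/insert
sequence on DAG nodes; the sketch only pins down the expected result per lane.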

I'm using the same code as the Loop vectorizer to decide what to vectorize,
which makes me wonder if the same issue could be hit with loop
vectorization. I'll investigate a bit further and figure out what to do
next.
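
If the decision really is shared by both vectorizers, the first alternative
could be a single predicate that both consult. A rough sketch follows, assuming
only 32-bit targets are affected. `IntrinsicKind`, `TargetInfo` and
`mayVectorizeIntrinsic` are hypothetical simplifications, not LLVM types or
functions.

```cpp
// Hypothetical, simplified stand-ins; these are not LLVM's real types.
enum class IntrinsicKind { Fabs, Copysign, Bswap, Other };

struct TargetInfo {
  unsigned PointerBits; // 32 for i686, 64 for x86_64
};

// Returns true if calls to this intrinsic may be bundled into a vector call.
// Bswap is refused on 32-bit targets, the case that hit the unreachable.
inline bool mayVectorizeIntrinsic(IntrinsicKind K, const TargetInfo &T) {
  switch (K) {
  case IntrinsicKind::Fabs:
  case IntrinsicKind::Copysign:
    return true;
  case IntrinsicKind::Bswap:
    return T.PointerBits >= 64; // assumption: only 32-bit is affected
  case IntrinsicKind::Other:
    return false;
  }
  return false;
}
```

This only hides the problem from the vectorizers; the backend would still
claim it can expand a vector bswap that it cannot.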

On Fri, Jan 31, 2014 at 5:40 PM, Reid Kleckner <rnk at google.com> wrote:

> Hey Raul,
>
> This patch broke the 32-bit self-host build.  The x86 backend claims that
> the legalizer knows how to expand bswap intrinsics on vector types, but
> this is not the case.  Running llc on the test case I gave produces:
> Unhandled Expand type in BSWAP!
> UNREACHABLE executed at ..\lib\CodeGen\SelectionDAG\LegalizeDAG.cpp:2537!
>
> I'm going to revert this for now, but what should happen is that we should
> learn how to expand bswap on vectors.
>
> Reid
>
>
>
> On Fri, Jan 31, 2014 at 5:17 PM, Reid Kleckner <rnk at google.com> wrote:
>
>> This change caused us to vectorize bswap, which we then try to expand on
>> i686, which hits an unreachable.  Running llc on this repros:
>>
>> declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) #8
>> define <2 x i64> @foo(<2 x i64> %v) {
>>   %s = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %v)
>>   ret <2 x i64> %s
>> }
>> attributes #8 = { nounwind readnone }
>>
>> I don't have a reduced test case of input to the SLP vectorizer yet
>> because I'm new to reducing LLVM IR.
>>
>>
>> On Fri, Jan 31, 2014 at 1:14 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>
>>> Author: chandlerc
>>> Date: Fri Jan 31 15:14:40 2014
>>> New Revision: 200576
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=200576&view=rev
>>> Log:
>>> [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
>>> transform accordingly. Based on similar code from Loop vectorization.
>>> Subsequent commits will include vectorization of function calls to
>>> vector intrinsics and form function calls to vector library calls.
>>>
>>> Patch by Raul Silvera! (Much delayed due to my not running dcommit)
>>>
>>> Added:
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/intrinsic.ll
>>> Modified:
>>>     llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>
>>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=200576&r1=200575&r2=200576&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Jan 31 15:14:40 2014
>>> @@ -947,6 +947,39 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>>>        buildTree_rec(Operands, Depth + 1);
>>>        return;
>>>      }
>>> +    case Instruction::Call: {
>>> +      // Check if the calls are all to the same vectorizable intrinsic.
>>> +      IntrinsicInst *II = dyn_cast<IntrinsicInst>(VL[0]);
>>> +      if (II==NULL) {
>>> +        newTreeEntry(VL, false);
>>> +        DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
>>> +        return;
>>> +      }
>>> +
>>> +      Intrinsic::ID ID = II->getIntrinsicID();
>>> +
>>> +      for (unsigned i = 1, e = VL.size(); i != e; ++i) {
>>> +        IntrinsicInst *II2 = dyn_cast<IntrinsicInst>(VL[i]);
>>> +        if (!II2 || II2->getIntrinsicID() != ID) {
>>> +          newTreeEntry(VL, false);
>>> +          DEBUG(dbgs() << "SLP: mismatched calls:" << *II << "!=" << *VL[i]
>>> +                       << "\n");
>>> +          return;
>>> +        }
>>> +      }
>>> +
>>> +      newTreeEntry(VL, true);
>>> +      for (unsigned i = 0, e = II->getNumArgOperands(); i != e; ++i) {
>>> +        ValueList Operands;
>>> +        // Prepare the operand vector.
>>> +        for (unsigned j = 0; j < VL.size(); ++j) {
>>> +          IntrinsicInst *II2 = dyn_cast<IntrinsicInst>(VL[j]);
>>> +          Operands.push_back(II2->getArgOperand(i));
>>> +        }
>>> +        buildTree_rec(Operands, Depth + 1);
>>> +      }
>>> +      return;
>>> +    }
>>>      default:
>>>        newTreeEntry(VL, false);
>>>        DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
>>> @@ -1072,6 +1105,30 @@ int BoUpSLP::getEntryCost(TreeEntry *E)
>>>        int VecStCost = TTI->getMemoryOpCost(Instruction::Store, VecTy, 1, 0);
>>>        return VecStCost - ScalarStCost;
>>>      }
>>> +    case Instruction::Call: {
>>> +      CallInst *CI = cast<CallInst>(VL0);
>>> +      IntrinsicInst *II = cast<IntrinsicInst>(CI);
>>> +      Intrinsic::ID ID = II->getIntrinsicID();
>>> +
>>> +      // Calculate the cost of the scalar and vector calls.
>>> +      SmallVector<Type*, 4> ScalarTys, VecTys;
>>> +      for (unsigned op = 0, opc = II->getNumArgOperands(); op!= opc; ++op) {
>>> +        ScalarTys.push_back(CI->getArgOperand(op)->getType());
>>> +        VecTys.push_back(VectorType::get(CI->getArgOperand(op)->getType(),
>>> +                                         VecTy->getNumElements()));
>>> +      }
>>> +
>>> +      int ScalarCallCost = VecTy->getNumElements() *
>>> +          TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys);
>>> +
>>> +      int VecCallCost = TTI->getIntrinsicInstrCost(ID, VecTy, VecTys);
>>> +
>>> +      DEBUG(dbgs() << "SLP: Call cost "<< VecCallCost - ScalarCallCost
>>> +            << " (" << VecCallCost  << "-" <<  ScalarCallCost << ")"
>>> +            << " for " << *II << "\n");
>>> +
>>> +      return VecCallCost - ScalarCallCost;
>>> +    }
>>>      default:
>>>        llvm_unreachable("Unknown instruction");
>>>    }
>>> @@ -1086,10 +1143,10 @@ bool BoUpSLP::isFullyVectorizableTinyTre
>>>      return false;
>>>
>>>    // Gathering cost would be too much for tiny trees.
>>> -  if (VectorizableTree[0].NeedToGather || VectorizableTree[1].NeedToGather)
>>> -    return false;
>>> +  if (VectorizableTree[0].NeedToGather || VectorizableTree[1].NeedToGather)
>>> +    return false;
>>>
>>> -  return true;
>>> +  return true;
>>>  }
>>>
>>>  int BoUpSLP::getTreeCost() {
>>> @@ -1555,6 +1612,32 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
>>>        E->VectorizedValue = S;
>>>        return propagateMetadata(S, E->Scalars);
>>>      }
>>> +    case Instruction::Call: {
>>> +      CallInst *CI = cast<CallInst>(VL0);
>>> +
>>> +      setInsertPointAfterBundle(E->Scalars);
>>> +      std::vector<Value *> OpVecs;
>>> +      for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {
>>> +        ValueList OpVL;
>>> +        for (int i = 0, e = E->Scalars.size(); i < e; ++i) {
>>> +          CallInst *CEI = cast<CallInst>(E->Scalars[i]);
>>> +          OpVL.push_back(CEI->getArgOperand(j));
>>> +        }
>>> +
>>> +        Value *OpVec = vectorizeTree(OpVL);
>>> +        DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
>>> +        OpVecs.push_back(OpVec);
>>> +      }
>>> +
>>> +      Module *M = F->getParent();
>>> +      IntrinsicInst *II = cast<IntrinsicInst>(CI);
>>> +      Intrinsic::ID ID = II->getIntrinsicID();
>>> +      Type *Tys[] = { VectorType::get(CI->getType(), E->Scalars.size()) };
>>> +      Function *CF = Intrinsic::getDeclaration(M, ID, Tys);
>>> +      Value *V = Builder.CreateCall(CF, OpVecs);
>>> +      E->VectorizedValue = V;
>>> +      return V;
>>> +    }
>>>      default:
>>>      llvm_unreachable("unknown inst");
>>>    }
>>>
>>> Added: llvm/trunk/test/Transforms/SLPVectorizer/X86/intrinsic.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/intrinsic.ll?rev=200576&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/intrinsic.ll (added)
>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/intrinsic.ll Fri Jan 31 15:14:40 2014
>>> @@ -0,0 +1,75 @@
>>> +; RUN: opt < %s -basicaa -slp-vectorizer -slp-threshold=-999 -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s
>>> +
>>> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>> +target triple = "x86_64-apple-macosx10.8.0"
>>> +
>>> +declare double @llvm.fabs.f64(double) nounwind readnone
>>> +
>>> +;CHECK-LABEL: @vec_fabs_f64(
>>> +;CHECK: load <2 x double>
>>> +;CHECK: load <2 x double>
>>> +;CHECK: call <2 x double> @llvm.fabs.v2f64
>>> +;CHECK: store <2 x double>
>>> +;CHECK: ret
>>> +define void @vec_fabs_f64(double* %a, double* %b, double* %c) {
>>> +entry:
>>> +  %i0 = load double* %a, align 8
>>> +  %i1 = load double* %b, align 8
>>> +  %mul = fmul double %i0, %i1
>>> +  %call = tail call double @llvm.fabs.f64(double %mul) nounwind readnone
>>> +  %arrayidx3 = getelementptr inbounds double* %a, i64 1
>>> +  %i3 = load double* %arrayidx3, align 8
>>> +  %arrayidx4 = getelementptr inbounds double* %b, i64 1
>>> +  %i4 = load double* %arrayidx4, align 8
>>> +  %mul5 = fmul double %i3, %i4
>>> +  %call5 = tail call double @llvm.fabs.f64(double %mul5) nounwind readnone
>>> +  store double %call, double* %c, align 8
>>> +  %arrayidx5 = getelementptr inbounds double* %c, i64 1
>>> +  store double %call5, double* %arrayidx5, align 8
>>> +  ret void
>>> +}
>>> +
>>> +declare float @llvm.copysign.f32(float, float) nounwind readnone
>>> +
>>> +;CHECK-LABEL: @vec_copysign_f32(
>>> +;CHECK: load <4 x float>
>>> +;CHECK: load <4 x float>
>>> +;CHECK: call <4 x float> @llvm.copysign.v4f32
>>> +;CHECK: store <4 x float>
>>> +;CHECK: ret
>>> +define void @vec_copysign_f32(float* %a, float* %b, float* noalias %c) {
>>> +entry:
>>> +  %0 = load float* %a, align 4
>>> +  %1 = load float* %b, align 4
>>> +  %call0 = tail call float @llvm.copysign.f32(float %0, float %1) nounwind readnone
>>> +  store float %call0, float* %c, align 4
>>> +
>>> +  %ix2 = getelementptr inbounds float* %a, i64 1
>>> +  %2 = load float* %ix2, align 4
>>> +  %ix3 = getelementptr inbounds float* %b, i64 1
>>> +  %3 = load float* %ix3, align 4
>>> +  %call1 = tail call float @llvm.copysign.f32(float %2, float %3) nounwind readnone
>>> +  %c1 = getelementptr inbounds float* %c, i64 1
>>> +  store float %call1, float* %c1, align 4
>>> +
>>> +  %ix4 = getelementptr inbounds float* %a, i64 2
>>> +  %4 = load float* %ix4, align 4
>>> +  %ix5 = getelementptr inbounds float* %b, i64 2
>>> +  %5 = load float* %ix5, align 4
>>> +  %call2 = tail call float @llvm.copysign.f32(float %4, float %5) nounwind readnone
>>> +  %c2 = getelementptr inbounds float* %c, i64 2
>>> +  store float %call2, float* %c2, align 4
>>> +
>>> +  %ix6 = getelementptr inbounds float* %a, i64 3
>>> +  %6 = load float* %ix6, align 4
>>> +  %ix7 = getelementptr inbounds float* %b, i64 3
>>> +  %7 = load float* %ix7, align 4
>>> +  %call3 = tail call float @llvm.copysign.f32(float %6, float %7) nounwind readnone
>>> +  %c3 = getelementptr inbounds float* %c, i64 3
>>> +  store float %call3, float* %c3, align 4
>>> +
>>> +  ret void
>>> +}
>>> +
>>> +
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>>
>
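
For reference, the cost comparison in the getEntryCost hunk quoted above boils
down to the arithmetic below. This is a standalone sketch: `CallCosts` and
`callCostDelta` are hypothetical names, and the inputs stand in for whatever
TTI->getIntrinsicInstrCost returns for the scalar and vector types.

```cpp
// Hypothetical per-call costs; in the patch these come from
// TTI->getIntrinsicInstrCost for the scalar and the vector type.
struct CallCosts {
  int ScalarCall; // cost of one scalar intrinsic call
  int VectorCall; // cost of one vector intrinsic call
};

// Same arithmetic as the Call case in getEntryCost: one vector call replaces
// NumElts scalar calls, so a negative result means vectorizing looks cheaper.
inline int callCostDelta(const CallCosts &C, int NumElts) {
  int ScalarCallCost = NumElts * C.ScalarCall;
  return C.VectorCall - ScalarCallCost;
}
```

Note that the delta is only as good as the reported vector cost: it says
nothing about whether the backend can actually lower the vector call.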

-- 
 Raúl E. Silvera