[LLVMdev] Crash in SLP for vector data type as function argument.
Suyog Kamal Sarda
suyog.sarda at samsung.com
Fri Jan 9 02:37:04 PST 2015
committed in r225517. Thanks.
------- Original Message -------
Sender : Arnold Schwaighofer<aschwaighofer at apple.com>
Date : Jan 09, 2015 00:31 (GMT+09:00)
Title : Re: Re: [LLVMdev] Crash in SLP for vector data type as function argument.
I am fine with this going in without a test case. This code is currently not enabled. Just mention in the commit message that the assumption that "VectorizedValue" is always an instruction is not correct because we could get a vectorized tree consisting of just a Constant or an Argument vector.
Thanks
On 01/08/15, Suyog Kamal Sarda wrote:
> Hi Arnold, Shahid,
>
> Thanks for reply and suggestions.
>
> I would go with Arnold's suggestion not to cast VectorizedValue to Instruction and use it as it is.
>
> I can commit patch on this (trivial change in assignment to TempVec).
>
> However, this issue hits only when we have http://reviews.llvm.org/D6818 properly committed.
> It will take some time for me to improve that patch.
>
> This crash issue doesn't have anything to do with above patch code, so I am not getting how to put a test case for this.
>
> Can I simply commit it without test case? If yes, should I go ahead with the commit after check-all without putting it for approval?
> Please suggest a proper way to do this.
>
> Regards,
> Suyog
>
> ------- Original Message -------
> Sender : Arnold Schwaighofer
> Date : Jan 08, 2015 02:05 (GMT+09:00)
> Title : Re: [LLVMdev] Crash in SLP for vector data type as function argument.
>
> The code in emitReduction has to be fixed. As your example shows it is not safe to assume we will always have an instruction as a result of vectorizeTree(). It seems to me that we can just remove the line that performs the cast. All subsequent uses of the value ‘ValToReduce’ actually are uses of “Value *TmpVec”. The IRBuilder in the variable “Builder” carries the insertion point for all operations in this function (inserting after the instruction “ValToReduce” would be a reason why we need an “Instruction”).
>
> /// \brief Emit a horizontal reduction of the vectorized value.
> Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder) {
> assert(VectorizedValue && "Need to have a vectorized tree node");
> Instruction *ValToReduce = dyn_cast(VectorizedValue);
> assert(isPowerOf2_32(ReduxWidth) &&
> "We only handle power-of-two reductions for now");
>
> Value *TmpVec = ValToReduce;
> for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
> if (IsPairwiseReduction) {
> Value *LeftMask =
> createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
> Value *RightMask =
> createRdxShuffleMask(ReduxWidth, i, true, false, Builder);
>
> Value *LeftShuf = Builder.CreateShuffleVector(
> TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
> Value *RightShuf = Builder.CreateShuffleVector(
> TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
> "rdx.shuf.r");
> TmpVec = createBinOp(Builder, ReductionOpcode, LeftShuf, RightShuf,
> "bin.rdx");
> } else {
> Value *UpperHalf =
> createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
> Value *Shuf = Builder.CreateShuffleVector(
> TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
> TmpVec = createBinOp(Builder, ReductionOpcode, TmpVec, Shuf, "bin.rdx");
> }
> }
>
> Thanks,
> Arnold
>
> > On Jan 7, 2015, at 3:34 AM, Suyog Kamal Sarda wrote:
> >
> > Hi Shahid,
> >
> > Thanks for the reply.
> >
> > Actually, yes, the emitreduction() takes vectorizedvalue which is leaf of the tree. '
> > I got confused by the name of the argument passed while calling emitReduction().
> >
> > Value *ReducedSubTree = emitReduction(VectorizedRoot, Builder)
> >
> > Anyways, that should hardly matter.
> >
> > I had mentioned the test case :
> >
> > int foo(uint32x4_t a) {
> > return a[0] + a[1] + a[2] + a[3];
> > }
> >
> > LLVM IR :
> >
> > define i32 @hadd(<4 x i32> %a) {
> > entry:
> > %vecext = extractelement <4 x i32> %a, i32 0
> > %vecext1 = extractelement <4 x i32> %a, i32 1
> > %add = add i32 %vecext, %vecext1
> > %vecext2 = extractelement <4 x i32> %a, i32 2
> > %add3 = add i32 %add, %vecext2
> > %vecext4 = extractelement <4 x i32> %a, i32 3
> > %add5 = add i32 %add3, %vecext4
> > ret i32 %add5
> > }
> >
> > Now, when leaf %vecext is reached, the vectorizeTree() function call sets the VectorizedValue to 0th operand of extractelement instruction.
> >
> > case Instruction::ExtractElelement: {
> > if(CanReuseExtract(E->Scalars)) {
> > Value *V = VL0->getOperand(0);
> > E->VectorizedValue = V;
> > return V;
> > }
> > return Gather(E->Scalars, VecTy);
> > }
> >
> > Now in emitReduction(), the VectorizedValue is dyn_cast to Instruction.
> > In above IR, %a is not an instruction (function argument), hence while referring the casted value which is null,
> > crash occurs.
> >
> > Instruction *ValToReduce = dyn_cast(VectorizedValue);
> >
> > Note : The above test case won't crash with current svn version, since code for parsing the tree for above IR is yet
> > to be included in svn. Initial patch was submitted in http://reviews.llvm.org/D6818.
> > I am working on refining it, however, the above code flow is not disturbed at all in my patch of parsing.
> > You can try to reproduce the problem by importing above patch in local code.
> >
> > When the vector data type 'a' is in global scope, a 'load' instruction is generated in basic block of the function:
> >
> > test case 2:
> >
> > unint32x4_t a;
> > int foo() {
> > return a[0] + a[1] + a[2] + a[3];
> > }
> >
> > IR for above test case :
> >
> > @a = common global <4 x i32> zeroinitializer, align 16
> >
> > define i32 @hadd() #0 {
> > entry:
> > %0 = load <4 x i32>* @a, align 16, !tbaa !1
> > %vecext = extractelement <4 x i32> %0, i32 0
> > %vecext1 = extractelement <4 x i32> %0, i32 1
> > %add = add i32 %vecext, %vecext1
> > %vecext2 = extractelement <4 x i32> %0, i32 2
> > %add3 = add i32 %add, %vecext2
> > %vecext4 = extractelement <4 x i32> %0, i32 3
> > %add5 = add i32 %add3, %vecext4
> > ret i32 %add5
> > }
> >
> > Now, since here, 0th operand of leaf %vecext is a load instruction,
> > the dyn_casting into an instruction will succeed here and reduction will be emitted properly.
> >
> > How can we solve this problem? What type of casting should a function argument belong to?
> >
> > Regards,
> > Suyog
> >
> >
> >
> > ------- Original Message -------
> > Sender : Shahid, Asghar-ahmad
> > Date : Jan 07, 2015 20:05 (GMT+09:00)
> > Title : RE: [LLVMdev] Crash in SLP for vector data type as function argument.
> >
> > Hi Suyog,
> >
> > IMO emitReduction() takes a vectorized value which is the leafs of the matched pattern/tree.
> > So what you are thinking as root is actually the leaf of the tree.
> > Root should actually be the value which is being feed to the "return" statement.
> >
> > It would be of great help if you could, share the sample test?
> >
> > Regards,
> > Shahid
> >
> >> -----Original Message-----
> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu](javascript:main.compose()
> >> On Behalf Of Suyog Kamal Sarda
> >> Sent: Monday, January 05, 2015 5:40 PM
> >> To: nrotem at apple.com; aschwaighofer at apple.com;
> >> mzolotukhin at apple.com; james.molloy at arm.com
> >> Cc: llvmdev at cs.uiuc.edu
> >> Subject: [LLVMdev] Crash in SLP for vector data type as function argument.
> >>
> >> Hi all,
> >>
> >> Came across a crash in SLP vectorization while testing following code for
> >> AArch64 :
> >>
> >> int foo(uint32x4_t a) {
> >> return a[0] + a[1] + a[2] + a[3];
> >> }
> >>
> >> The LLVM IR for above code will be:
> >>
> >> define i32 @hadd(<4 x i32> %a) {
> >> entry:
> >> %vecext = extractelement <4 x i32> %a, i32 0
> >> %vecext1 = extractelement <4 x i32> %a, i32 1
> >> %add = add i32 %vecext, %vecext1
> >> %vecext2 = extractelement <4 x i32> %a, i32 2
> >> %add3 = add i32 %add, %vecext2
> >> %vecext4 = extractelement <4 x i32> %a, i32 3
> >> %add5 = add i32 %add3, %vecext4
> >> ret i32 %add5
> >> }
> >>
> >> I somehow try to recognize this pattern and try to vectorize it using existing
> >> code for horizontal reductions (I just recognize the pattern and fill up the
> >> data, rest is done by already existing code.
> >> I do pattern matching very badly though, but that's a different story).
> >>
> >>
> >> Please note that whatever follows is with existing code, I haven't modified
> >> any bit of it.
> >>
> >> Now, once the pattern is recognized, we call "trytoReduce()" where we try
> >> to vectorize tree by function call "vectorizeTree()" which returns root of the
> >> vectorized tree. Then we emit the reduction using call "emitRedcution()"
> >> which takes the root of the vector tree as argument. Inside
> >> "emitReduction()", we cast root of the tree into an instruction.
> >>
> >> Now, for above case, while setting the root of the vectorized tree,
> >> extractelement instruction is encountered, and its 0th operand is set as the
> >> root of the tree, which in above case is "%a". However, this is not an
> >> instruction and hence, when we cast it into an instruction in
> >> "emitReduction()" code, it returns nullptr which causes a crash ahead when
> >> referencing it.
> >>
> >> Take a second case where the vector data type is in global scope.
> >>
> >> unint32x4_t a;
> >> int foo() {
> >> return a[0] + a[1] + a[2] + a[3];
> >> }
> >>
> >> The IR for above code is:
> >>
> >> @a = common global <4 x i32> zeroinitializer, align 16
> >>
> >> define i32 @hadd() #0 {
> >> entry:
> >> %0 = load <4 x i32>* @a, align 16, !tbaa !1
> >> %vecext = extractelement <4 x i32> %0, i32 0
> >> %vecext1 = extractelement <4 x i32> %0, i32 1
> >> %add = add i32 %vecext, %vecext1
> >> %vecext2 = extractelement <4 x i32> %0, i32 2
> >> %add3 = add i32 %add, %vecext2
> >> %vecext4 = extractelement <4 x i32> %0, i32 3
> >> %add5 = add i32 %add3, %vecext4
> >> ret i32 %add5
> >> }
> >>
> >> Now in above case, 0th operand of extractelement %0 is a load instruction,
> >> and hence it doesn't crash while casting into an instruction and runs smoothly
> >> further.
> >>
> >> Can someone please suggest how to resolve this? Is there something I am
> >> missing or is it a basic problem with IR itself ?
> >>
> >> Regards,
> >> Suyog
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
More information about the llvm-dev
mailing list