[LLVMdev] Crash in SLP for vector data type as function argument.

Thu Jan 8 07:31:17 PST 2015

I am fine with this going in without a test case. This code is currently not enabled. Just mention in the commit message that the assumption that "VectorizedValue" is always an instruction is not correct because we could get a vectorized tree consisting of just a Constant or an Argument vector.

Thanks

On 01/08/15, Suyog Kamal Sarda   wrote:
> Hi Arnold, Shahid,
> 
> Thanks for reply and suggestions.
> 
> I would go with Arnold's suggestion not to cast VectorizedValue to Instruction and use it as it is.
> 
> I can commit patch on this (trivial change in assignment to TempVec).
> 
> However, this issue hits only when we have  http://reviews.llvm.org/D6818 properly committed.
> It will take some time for me to improve that patch. 
> 
> This crash issue doesn't have anything to do with above patch code, so I am not getting how to put a test case for this.
> 
> Can I simply commit it without test case? If yes, should I go ahead with the commit after check-all without putting it for approval?
> Please suggest a proper way to do this. 
> 
> Regards,
> Suyog 
> 
> ------- Original Message -------
> Sender : Arnold Schwaighofer<aschwaighofer at apple.com>
> Date : Jan 08, 2015 02:05 (GMT+09:00)
> Title : Re: [LLVMdev] Crash in SLP for vector data type as function argument.
> 
> The code in emitReduction has to be fixed. As your example shows it is not safe to assume we will always have an instruction as a result of vectorizeTree(). It seems to me that we can just remove the line that performs the cast. All subsequent uses of the value ‘ValToReduce’ actually are uses of “Value *TmpVec”. The IRBuilder in the variable “Builder” carries the insertion point for all operations in this function (inserting after the instruction “ValToReduce” would be a reason why we need an “Instruction”).
> 
>   /// \brief Emit a horizontal reduction of the vectorized value.
>   Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder) {
>     assert(VectorizedValue && "Need to have a vectorized tree node");
>     Instruction *ValToReduce = dyn_cast(VectorizedValue);
>     assert(isPowerOf2_32(ReduxWidth) &&
>            "We only handle power-of-two reductions for now");
> 
>     Value *TmpVec = ValToReduce;
>     for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
>       if (IsPairwiseReduction) {
>         Value *LeftMask =
>           createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
>         Value *RightMask =
>           createRdxShuffleMask(ReduxWidth, i, true, false, Builder);
> 
>         Value *LeftShuf = Builder.CreateShuffleVector(
>           TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
>         Value *RightShuf = Builder.CreateShuffleVector(
>           TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
>           "rdx.shuf.r");
>         TmpVec = createBinOp(Builder, ReductionOpcode, LeftShuf, RightShuf,
>                              "bin.rdx");
>       } else {
>         Value *UpperHalf =
>           createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
>         Value *Shuf = Builder.CreateShuffleVector(
>           TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
>         TmpVec = createBinOp(Builder, ReductionOpcode, TmpVec, Shuf, "bin.rdx");
>       }
>     }
> 
> Thanks,
> Arnold
> 
> > On Jan 7, 2015, at 3:34 AM, Suyog Kamal Sarda wrote:
> > 
> > Hi Shahid,
> > 
> > Thanks for the reply.
> > 
> > Actually, yes, the emitreduction() takes vectorizedvalue which is leaf of the tree. '
> > I got confused by the name of the argument passed while calling emitReduction().
> > 
> > Value *ReducedSubTree = emitReduction(VectorizedRoot, Builder)
> > 
> > Anyways, that should hardly matter.
> > 
> > I had mentioned the test case :
> > 
> > int foo(uint32x4_t a) {
> >  return a[0] + a[1] + a[2] + a[3];
> > }
> > 
> > LLVM IR :
> > 
> > define i32 @hadd(<4 x i32> %a) {
> > entry:
> >   %vecext = extractelement <4 x i32> %a, i32 0
> >   %vecext1 = extractelement <4 x i32> %a, i32 1
> >  %add = add i32 %vecext, %vecext1
> >   %vecext2 = extractelement <4 x i32> %a, i32 2
> >   %add3 = add i32 %add, %vecext2
> >   %vecext4 = extractelement <4 x i32> %a, i32 3
> >   %add5 = add i32 %add3, %vecext4
> >   ret i32 %add5
> > }
> > 
> > Now, when leaf %vecext is reached, the vectorizeTree() function call sets the VectorizedValue to 0th operand of extractelement instruction.
> > 
> > case Instruction::ExtractElelement: {
> >  if(CanReuseExtract(E->Scalars)) {
> >       Value *V = VL0->getOperand(0);
> >        E->VectorizedValue = V;
> >        return V;
> >     }
> >    return Gather(E->Scalars, VecTy);
> > }
> > 
> > Now in emitReduction(), the VectorizedValue is dyn_cast to Instruction.
> > In above IR, %a is not an instruction (function argument), hence while referring the casted value which is null,
> > crash occurs. 
> > 
> > Instruction *ValToReduce = dyn_cast(VectorizedValue);
> > 
> > Note : The above test case won't crash with current svn version, since code for parsing the tree for above IR is yet
> > to be included in svn. Initial patch was submitted in http://reviews.llvm.org/D6818.
> > I am working on refining it, however, the above code flow is not disturbed at all in my patch of parsing. 
> > You can try to reproduce the problem by importing above patch in local code.
> > 
> > When the vector data type 'a' is in global scope, a 'load' instruction is generated in basic block of the function:
> > 
> > test case 2:
> > 
> > unint32x4_t a;
> > int foo() {
> >  return a[0] + a[1] + a[2] + a[3];
> > }
> > 
> > IR for above test case :
> > 
> > @a = common global <4 x i32> zeroinitializer, align 16
> > 
> > define i32 @hadd() #0 {
> > entry:
> >   %0 = load <4 x i32>* @a, align 16, !tbaa !1
> >   %vecext = extractelement <4 x i32> %0, i32 0
> >   %vecext1 = extractelement <4 x i32> %0, i32 1
> >   %add = add i32 %vecext, %vecext1
> >  %vecext2 = extractelement <4 x i32> %0, i32 2
> >   %add3 = add i32 %add, %vecext2
> >   %vecext4 = extractelement <4 x i32> %0, i32 3
> >   %add5 = add i32 %add3, %vecext4
> >   ret i32 %add5
> > }
> > 
> > Now, since here, 0th operand of leaf %vecext is a load instruction, 
> > the dyn_casting into an instruction will succeed here and reduction will be emitted properly.
> > 
> > How can we solve this problem? What type of casting should a function argument belong to?
> > 
> > Regards,
> > Suyog
> > 
> > 
> > 
> > ------- Original Message -------
> > Sender : Shahid, Asghar-ahmad
> > Date : Jan 07, 2015 20:05 (GMT+09:00)
> > Title : RE: [LLVMdev] Crash in SLP for vector data type as function argument.
> > 
> > Hi Suyog,
> > 
> > IMO emitReduction() takes a vectorized value which is the leafs of the matched pattern/tree.
> > So what you are thinking as root is actually the leaf of the tree.
> > Root should actually be the value which is being feed to the "return" statement.
> > 
> > It would be of great help if you could, share the sample test?
> > 
> > Regards,
> > Shahid
> > 
> >> -----Original Message-----
> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu](javascript:main.compose()
> >> On Behalf Of Suyog Kamal Sarda
> >> Sent: Monday, January 05, 2015 5:40 PM
> >> To: nrotem at apple.com; aschwaighofer at apple.com;
> >> mzolotukhin at apple.com; james.molloy at arm.com
> >> Cc: llvmdev at cs.uiuc.edu
> >> Subject: [LLVMdev] Crash in SLP for vector data type as function argument.
> >> 
> >> Hi all,
> >> 
> >> Came across a crash in SLP vectorization while testing following code for
> >> AArch64 :
> >> 
> >> int foo(uint32x4_t a) {
> >> return a[0] + a[1] + a[2] + a[3];
> >> }
> >> 
> >> The LLVM IR for above code will be:
> >> 
> >> define i32 @hadd(<4 x i32> %a) {
> >> entry:
> >>  %vecext = extractelement <4 x i32> %a, i32 0
> >>  %vecext1 = extractelement <4 x i32> %a, i32 1
> >>  %add = add i32 %vecext, %vecext1
> >>  %vecext2 = extractelement <4 x i32> %a, i32 2
> >>  %add3 = add i32 %add, %vecext2
> >>  %vecext4 = extractelement <4 x i32> %a, i32 3
> >>  %add5 = add i32 %add3, %vecext4
> >>  ret i32 %add5
> >> }
> >> 
> >> I somehow try to recognize this pattern and try to vectorize it using existing
> >> code for horizontal reductions (I just recognize the pattern and fill up the
> >> data, rest is done by already existing code.
> >> I do pattern matching very badly though, but that's a different story).
> >> 
> >> 
> >> Please note that whatever follows is with existing code, I haven't modified
> >> any bit of it.
> >> 
> >> Now, once the pattern is recognized, we call "trytoReduce()" where we try
> >> to vectorize tree by function call "vectorizeTree()" which returns root of the
> >> vectorized tree. Then we emit the reduction using call "emitRedcution()"
> >> which takes the root of the vector tree as argument. Inside
> >> "emitReduction()", we cast root of the tree into an instruction.
> >> 
> >> Now, for above case, while setting the root of the vectorized tree,
> >> extractelement instruction is encountered, and its 0th operand is set as the
> >> root of the tree, which in above case is "%a". However, this is not an
> >> instruction and hence, when we cast it into an instruction in
> >> "emitReduction()" code, it returns nullptr which causes a crash ahead when
> >> referencing it.
> >> 
> >> Take a second case where the vector data type is in global scope.
> >> 
> >> unint32x4_t a;
> >> int foo() {
> >> return a[0] + a[1] + a[2] + a[3];
> >> }
> >> 
> >> The IR for above code is:
> >> 
> >> @a = common global <4 x i32> zeroinitializer, align 16
> >> 
> >> define i32 @hadd() #0 {
> >> entry:
> >>  %0 = load <4 x i32>* @a, align 16, !tbaa !1
> >>  %vecext = extractelement <4 x i32> %0, i32 0
> >>  %vecext1 = extractelement <4 x i32> %0, i32 1
> >>  %add = add i32 %vecext, %vecext1
> >>  %vecext2 = extractelement <4 x i32> %0, i32 2
> >>  %add3 = add i32 %add, %vecext2
> >>  %vecext4 = extractelement <4 x i32> %0, i32 3
> >>  %add5 = add i32 %add3, %vecext4
> >>  ret i32 %add5
> >> }
> >> 
> >> Now in above case, 0th operand of extractelement %0 is a load instruction,
> >> and hence it doesn't crash while casting into an instruction and runs smoothly
> >> further.
> >> 
> >> Can someone please suggest how to resolve this? Is there something I am
> >> missing or is it a basic problem with IR itself ?
> >> 
> >> Regards,
> >> Suyog
> >> 
> >> 
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev