[LLVMdev] Crash in SLP for vector data type as function argument.

Wed Jan 7 09:05:08 PST 2015

The code in emitReduction has to be fixed. As your example shows, it is not safe to assume that vectorizeTree() will always return an instruction. It seems to me that we can just remove the line that performs the cast: all subsequent uses of the value ‘ValToReduce’ are really uses of “Value *TmpVec”. The IRBuilder in the variable “Builder” carries the insertion point for all operations in this function (we would only need an “Instruction” if we had to insert after “ValToReduce”).

  /// \brief Emit a horizontal reduction of the vectorized value.
  Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder) {
    assert(VectorizedValue && "Need to have a vectorized tree node");
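    // Unsafe: VectorizedValue may be a function Argument rather than an
    // Instruction, in which case this dyn_cast yields nullptr.
    Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);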
    assert(isPowerOf2_32(ReduxWidth) &&
           "We only handle power-of-two reductions for now");

    Value *TmpVec = ValToReduce;
    for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
      if (IsPairwiseReduction) {
        Value *LeftMask =
          createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
        Value *RightMask =
          createRdxShuffleMask(ReduxWidth, i, true, false, Builder);

        Value *LeftShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
        Value *RightShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), RightMask,
          "rdx.shuf.r");
        TmpVec = createBinOp(Builder, ReductionOpcode, LeftShuf, RightShuf,
                             "bin.rdx");
      } else {
        Value *UpperHalf =
          createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
        Value *Shuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
        TmpVec = createBinOp(Builder, ReductionOpcode, TmpVec, Shuf, "bin.rdx");
      }
    }
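The loop above only ever uses TmpVec as a value: it feeds it to shufflevector and to the binary operator, and never needs the instruction that defined it. The standalone C++ sketch below models both branches of that loop on plain arrays, to illustrate that the halving reduction works the same no matter what produced the input vector. It is not LLVM code: Vec, shuffle, add, reduceSplit and reducePairwise are hypothetical names, addition stands in for ReductionOpcode, undef lanes are modeled as 0, and the masks follow our reading of createRdxShuffleMask (upper half moved down for the splitting form, even/odd lanes for the pairwise form).

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: a vector value is just an array of W integer lanes.
template <std::size_t W> using Vec = std::array<int, W>;

// Models shufflevector(Src, undef, Mask): lane k of the result is
// Src[Mask[k]], and a negative mask entry (an undef lane) is modeled as 0.
template <std::size_t W>
Vec<W> shuffle(const Vec<W> &Src, const std::array<int, W> &Mask) {
  Vec<W> R{};
  for (std::size_t k = 0; k < W; ++k)
    R[k] = Mask[k] < 0 ? 0 : Src[static_cast<std::size_t>(Mask[k])];
  return R;
}

// Lane-wise addition, standing in for createBinOp(ReductionOpcode, ...).
template <std::size_t W> Vec<W> add(const Vec<W> &A, const Vec<W> &B) {
  Vec<W> R{};
  for (std::size_t k = 0; k < W; ++k)
    R[k] = A[k] + B[k];
  return R;
}

// Non-pairwise branch: at step i, shuffle lanes [i, 2i) down to [0, i)
// and add them to the current value. Lane 0 ends up holding the total.
template <std::size_t W> int reduceSplit(Vec<W> TmpVec) {
  static_assert(W != 0 && (W & (W - 1)) == 0, "power-of-two widths only");
  for (std::size_t i = W / 2; i != 0; i >>= 1) {
    std::array<int, W> UpperHalf{};
    for (std::size_t k = 0; k < W; ++k)
      UpperHalf[k] = k < i ? static_cast<int>(i + k) : -1;
    TmpVec = add(TmpVec, shuffle(TmpVec, UpperHalf));
  }
  return TmpVec[0];
}

// Pairwise branch: at step i, add the even lanes (left mask) to the odd
// lanes (right mask) of the first 2i live lanes.
template <std::size_t W> int reducePairwise(Vec<W> TmpVec) {
  static_assert(W != 0 && (W & (W - 1)) == 0, "power-of-two widths only");
  for (std::size_t i = W / 2; i != 0; i >>= 1) {
    std::array<int, W> LeftMask{}, RightMask{};
    for (std::size_t k = 0; k < W; ++k) {
      LeftMask[k] = k < i ? static_cast<int>(2 * k) : -1;
      RightMask[k] = k < i ? static_cast<int>(2 * k + 1) : -1;
    }
    TmpVec = add(shuffle(TmpVec, LeftMask), shuffle(TmpVec, RightMask));
  }
  return TmpVec[0];
}
```

For example, reduceSplit(Vec<4>{1, 2, 3, 4}) and reducePairwise(Vec<4>{1, 2, 3, 4}) both yield 10, matching a[0] + a[1] + a[2] + a[3]. Nothing in either loop cares whether the input came from an argument or from an instruction.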

Thanks,
Arnold

> On Jan 7, 2015, at 3:34 AM, Suyog Kamal Sarda <suyog.sarda at samsung.com> wrote:
> 
> Hi Shahid,
> 
> Thanks for the reply.
> 
> Actually, yes, emitReduction() takes a VectorizedValue which is a leaf of the tree.
> I got confused by the name of the argument passed when calling emitReduction():
> 
> Value *ReducedSubTree = emitReduction(VectorizedRoot, Builder);
> 
> Anyway, that hardly matters.
> 
> I had mentioned the test case:
> 
> int foo(uint32x4_t a) {
>  return a[0] + a[1] + a[2] + a[3];
> }
> 
> LLVM IR:
> 
> define i32 @hadd(<4 x i32> %a) {
> entry:
>   %vecext = extractelement <4 x i32> %a, i32 0
>   %vecext1 = extractelement <4 x i32> %a, i32 1
>   %add = add i32 %vecext, %vecext1
>   %vecext2 = extractelement <4 x i32> %a, i32 2
>   %add3 = add i32 %add, %vecext2
>   %vecext4 = extractelement <4 x i32> %a, i32 3
>   %add5 = add i32 %add3, %vecext4
>   ret i32 %add5
> }
> 
> Now, when the leaf %vecext is reached, vectorizeTree() sets VectorizedValue to the 0th operand of the extractelement instruction:
> 
> case Instruction::ExtractElement: {
>   if (CanReuseExtract(E->Scalars)) {
>     Value *V = VL0->getOperand(0);
>     E->VectorizedValue = V;
>     return V;
>   }
>   return Gather(E->Scalars, VecTy);
> }
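The CanReuseExtract() branch is what makes %a itself the vectorized value. Below is a minimal sketch of that check, under the assumption (based on our reading of the SLP vectorizer of that period, not verified against this exact revision) that reuse requires every scalar to extract lane i, in order, from one source vector whose width equals the number of scalars. Extract and canReuseExtract are hypothetical stand-ins for the real ExtractElementInst-based code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one scalar extractelement: which source vector it
// reads (an id standing in for the Value*) and which constant lane index.
struct Extract {
  int SourceVec;
  unsigned Index;
};

// Sketch of the reuse check as we understand it: the scalars can be replaced
// by the source vector itself only if they extract lanes 0..N-1, in order,
// from a single vector whose width is exactly N.
bool canReuseExtract(const std::vector<Extract> &Scalars,
                     unsigned SourceWidth) {
  if (Scalars.empty() || SourceWidth != Scalars.size())
    return false;
  const int Vec = Scalars[0].SourceVec;
  for (std::size_t i = 0; i < Scalars.size(); ++i)
    if (Scalars[i].SourceVec != Vec || Scalars[i].Index != i)
      return false;
  return true;
}
```

For the four extracts in @hadd this returns true, so the source vector (%a in the first test case, %0 in the second) is returned as-is instead of being gathered, and its kind of Value is whatever the IR happens to provide.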
> 
> Then, in emitReduction(), VectorizedValue is dyn_cast to Instruction.
> In the IR above, %a is a function argument, not an instruction, so the dyn_cast returns null, and the
> crash occurs when that null value is used:
> 
> Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);
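The difference between %a and %0 comes down to the class hierarchy: in LLVM both llvm::Argument and llvm::Instruction derive from llvm::Value, and dyn_cast<Instruction> returns null for any Value that is not an Instruction. The sketch below models that with a kind tag and a classof() test, the idea LLVM's isa/dyn_cast machinery is built on. Value, Argument, Instruction, dyn_cast and castSucceeds here are simplified hypothetical stand-ins, not the LLVM classes.

```cpp
// Hypothetical, simplified stand-ins for llvm::Value, llvm::Argument and
// llvm::Instruction. Each object records its concrete kind, and classof()
// tests that kind, which is the idea LLVM's own isa/dyn_cast builds on.
class Value {
public:
  enum Kind { ArgumentKind, InstructionKind };
  explicit Value(Kind K) : TheKind(K) {}
  virtual ~Value() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

// Models a function parameter such as %a in @hadd(<4 x i32> %a).
class Argument : public Value {
public:
  Argument() : Value(ArgumentKind) {}
  static bool classof(const Value *V) { return V->getKind() == ArgumentKind; }
};

// Models an instruction such as %0 = load <4 x i32>* @a.
class Instruction : public Value {
public:
  Instruction() : Value(InstructionKind) {}
  static bool classof(const Value *V) {
    return V->getKind() == InstructionKind;
  }
};

// Simplified dyn_cast: the downcast pointer if V is a To, else nullptr.
template <typename To> To *dyn_cast(Value *V) {
  return To::classof(V) ? static_cast<To *>(V) : nullptr;
}

// The cast from emitReduction(). In the real code a null result from this
// cast is what ends up being dereferenced through TmpVec.
bool castSucceeds(Value *VectorizedValue) {
  return dyn_cast<Instruction>(VectorizedValue) != nullptr;
}
```

Passing an Argument (like %a) makes castSucceeds return false, while an Instruction (like the load %0) makes it return true, which is why only the first test case crashes.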
> 
> Note: the above test case won't crash with the current svn version, since the code for parsing the tree for the above IR
> has not been committed to svn yet. The initial patch was submitted at http://reviews.llvm.org/D6818.
> I am working on refining it; however, my parsing patch does not change the above code flow at all.
> You can reproduce the problem by applying that patch to your local code.
> 
> When the vector variable 'a' is in global scope, a 'load' instruction is generated in the basic block of the function:
> 
> Test case 2:
> 
> uint32x4_t a;
> int foo() {
>  return a[0] + a[1] + a[2] + a[3];
> }
> 
> IR for the above test case:
> 
> @a = common global <4 x i32> zeroinitializer, align 16
> 
> define i32 @hadd() #0 {
> entry:
>   %0 = load <4 x i32>* @a, align 16, !tbaa !1
>   %vecext = extractelement <4 x i32> %0, i32 0
>   %vecext1 = extractelement <4 x i32> %0, i32 1
>   %add = add i32 %vecext, %vecext1
>   %vecext2 = extractelement <4 x i32> %0, i32 2
>   %add3 = add i32 %add, %vecext2
>   %vecext4 = extractelement <4 x i32> %0, i32 3
>   %add5 = add i32 %add3, %vecext4
>   ret i32 %add5
> }
> 
> Here, since the 0th operand of the leaf %vecext is a load instruction,
> the dyn_cast to Instruction succeeds and the reduction is emitted properly.
> 
> How can we solve this problem? What type should a function argument be cast to?
> 
> Regards,
> Suyog
> 
> 
> 
> ------- Original Message -------
> Sender : Shahid, Asghar-ahmad<Asghar-ahmad.Shahid at amd.com>
> Date : Jan 07, 2015 20:05 (GMT+09:00)
> Title : RE: [LLVMdev] Crash in SLP for vector data type as function argument.
> 
> Hi Suyog,
> 
> IMO emitReduction() takes a vectorized value which is a leaf of the matched pattern/tree.
> So what you are thinking of as the root is actually a leaf of the tree.
> The root should actually be the value which is being fed to the "return" statement.
> 
> It would be of great help if you could share the sample test.
> 
> Regards,
> Shahid
> 
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Suyog Kamal Sarda
>> Sent: Monday, January 05, 2015 5:40 PM
>> To: nrotem at apple.com; aschwaighofer at apple.com;
>> mzolotukhin at apple.com; james.molloy at arm.com
>> Cc: llvmdev at cs.uiuc.edu
>> Subject: [LLVMdev] Crash in SLP for vector data type as function argument.
>> 
>> Hi all,
>> 
>> I came across a crash in SLP vectorization while testing the following code for
>> AArch64:
>> 
>> int foo(uint32x4_t a) {
>>  return a[0] + a[1] + a[2] + a[3];
>> }
>> 
>> The LLVM IR for the above code is:
>> 
>> define i32 @hadd(<4 x i32> %a) {
>> entry:
>>  %vecext = extractelement <4 x i32> %a, i32 0
>>  %vecext1 = extractelement <4 x i32> %a, i32 1
>>  %add = add i32 %vecext, %vecext1
>>  %vecext2 = extractelement <4 x i32> %a, i32 2
>>  %add3 = add i32 %add, %vecext2
>>  %vecext4 = extractelement <4 x i32> %a, i32 3
>>  %add5 = add i32 %add3, %vecext4
>>  ret i32 %add5
>> }
>> 
>> I try to recognize this pattern and vectorize it using the existing
>> code for horizontal reductions (I just recognize the pattern and fill in the
>> data; the rest is done by the existing code.
>> I do the pattern matching rather badly, but that's a different story.)
>> 
>> 
>> Please note that everything that follows concerns the existing code; I haven't modified
>> any bit of it.
>> 
>> Now, once the pattern is recognized, we call "tryToReduce()", where we try
>> to vectorize the tree by calling "vectorizeTree()", which returns the root of the
>> vectorized tree. Then we emit the reduction by calling "emitReduction()",
>> which takes the root of the vector tree as its argument. Inside
>> "emitReduction()", we cast the root of the tree to an instruction.
>> 
>> Now, in the above case, while setting the root of the vectorized tree, an
>> extractelement instruction is encountered, and its 0th operand is set as the
>> root of the tree, which in this case is "%a". However, "%a" is not an
>> instruction, so when we cast it to an instruction in
>> "emitReduction()", the cast returns nullptr, which causes a crash later when
>> it is dereferenced.
>> 
>> Take a second case, where the vector variable is in global scope.
>> 
>> uint32x4_t a;
>> int foo() {
>>  return a[0] + a[1] + a[2] + a[3];
>> }
>> 
>> The IR for the above code is:
>> 
>> @a = common global <4 x i32> zeroinitializer, align 16
>> 
>> define i32 @hadd() #0 {
>> entry:
>>  %0 = load <4 x i32>* @a, align 16, !tbaa !1
>>  %vecext = extractelement <4 x i32> %0, i32 0
>>  %vecext1 = extractelement <4 x i32> %0, i32 1
>>  %add = add i32 %vecext, %vecext1
>>  %vecext2 = extractelement <4 x i32> %0, i32 2
>>  %add3 = add i32 %add, %vecext2
>>  %vecext4 = extractelement <4 x i32> %0, i32 3
>>  %add5 = add i32 %add3, %vecext4
>>  ret i32 %add5
>> }
>> 
>> Now, in the above case, the 0th operand of the extractelement instructions, %0, is a load instruction,
>> so casting it to an instruction doesn't crash, and everything runs smoothly
>> from there.
>> 
>> Can someone please suggest how to resolve this? Is there something I am
>> missing, or is it a basic problem with the IR itself?
>> 
>> Regards,
>> Suyog
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev