[LLVMdev] Crash in SLP for vector data type as function argument.
Arnold Schwaighofer
aschwaighofer at apple.com
Wed Jan 7 09:05:08 PST 2015
The code in emitReduction has to be fixed. As your example shows, it is not safe to assume that vectorizeTree() always returns an instruction. It seems to me that we can simply remove the line that performs the cast: the only use of ‘ValToReduce’ is to initialize “Value *TmpVec”, and all subsequent uses go through TmpVec. The IRBuilder in the variable “Builder” carries the insertion point for all operations in this function (having to insert after the instruction “ValToReduce” would be a reason to need an “Instruction”, but we never do that here).
/// \brief Emit a horizontal reduction of the vectorized value.
Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder) {
  assert(VectorizedValue && "Need to have a vectorized tree node");
  // This cast is the problem: dyn_cast returns nullptr when VectorizedValue
  // is not an Instruction (e.g. a function Argument such as %a).
  Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);
  assert(isPowerOf2_32(ReduxWidth) &&
         "We only handle power-of-two reductions for now");
  // ValToReduce is only used here; without the cast this can simply be
  // Value *TmpVec = VectorizedValue;
  Value *TmpVec = ValToReduce;
  for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
    if (IsPairwiseReduction) {
      // Pairwise: combine adjacent lanes (0,1), (2,3), ...
      Value *LeftMask =
          createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
      Value *RightMask =
          createRdxShuffleMask(ReduxWidth, i, true, false, Builder);
      Value *LeftShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
      Value *RightShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), RightMask,
          "rdx.shuf.r");
      TmpVec = createBinOp(Builder, ReductionOpcode, LeftShuf, RightShuf,
                           "bin.rdx");
    } else {
      // Splitting: combine the upper half of the vector with the lower half.
      Value *UpperHalf =
          createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
      Value *Shuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
      TmpVec = createBinOp(Builder, ReductionOpcode, TmpVec, Shuf, "bin.rdx");
    }
  }
  // ... (remainder of emitReduction not quoted)
}
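A standalone sketch of what the loop above computes, assuming plain `int` lanes and addition as the reduction opcode; `reduceLanes` is a hypothetical helper, not LLVM code. Each pass halves the number of live lanes using the same lane selections as the shuffle masks: the splitting form adds lane `i + j` onto lane `j`, and the pairwise form combines lanes `2j` and `2j + 1`. Lanes past the live range stand in for the undef lanes of the real shuffles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reduces the lanes of `v` with integer addition using
// the same halving scheme as the emitReduction() loop. Lanes at or past the
// live range stand in for the undef lanes of the real shuffles.
int reduceLanes(std::vector<int> v, bool pairwise) {
  std::size_t width = v.size();
  // Like emitReduction(), only power-of-two widths are handled.
  assert(width != 0 && (width & (width - 1)) == 0);
  for (std::size_t i = width / 2; i != 0; i >>= 1) {
    std::vector<int> next(width, 0);
    for (std::size_t j = 0; j != i; ++j) {
      if (pairwise)
        next[j] = v[2 * j] + v[2 * j + 1]; // even lanes + odd lanes
      else
        next[j] = v[j] + v[i + j]; // upper half <i, i+1, ...> onto lower half
    }
    v = next;
  }
  return v[0]; // after the last pass the whole sum sits in lane 0
}
```

For example, `reduceLanes({1, 2, 3, 4}, false)` and `reduceLanes({1, 2, 3, 4}, true)` both return 10, matching `a[0] + a[1] + a[2] + a[3]` from the test case quoted in this thread.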
Thanks,
Arnold
> On Jan 7, 2015, at 3:34 AM, Suyog Kamal Sarda <suyog.sarda at samsung.com> wrote:
>
> Hi Shahid,
>
> Thanks for the reply.
>
> Actually, yes, emitReduction() takes VectorizedValue, which is the leaf of the tree.
> I got confused by the name of the argument passed in the call to emitReduction().
>
> Value *ReducedSubTree = emitReduction(VectorizedRoot, Builder);
>
> Anyway, that hardly matters.
>
> I had mentioned the test case:
>
> int foo(uint32x4_t a) {
> return a[0] + a[1] + a[2] + a[3];
> }
>
> LLVM IR:
>
> define i32 @hadd(<4 x i32> %a) {
> entry:
> %vecext = extractelement <4 x i32> %a, i32 0
> %vecext1 = extractelement <4 x i32> %a, i32 1
> %add = add i32 %vecext, %vecext1
> %vecext2 = extractelement <4 x i32> %a, i32 2
> %add3 = add i32 %add, %vecext2
> %vecext4 = extractelement <4 x i32> %a, i32 3
> %add5 = add i32 %add3, %vecext4
> ret i32 %add5
> }
>
> When the leaf %vecext is reached, vectorizeTree() sets VectorizedValue to operand 0 of the extractelement instruction:
>
> case Instruction::ExtractElement: {
>   if (CanReuseExtract(E->Scalars)) {
>     Value *V = VL0->getOperand(0);
>     E->VectorizedValue = V;
>     return V;
>   }
>   return Gather(E->Scalars, VecTy);
> }
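A standalone sketch of the kind of condition under which the extracts are "reused", i.e. replaced by their source vector itself. It is modeled loosely on the idea behind CanReuseExtract(); the real function's checks are not quoted in this thread and may differ, and `ExtractLane` and `canReuseExtract` are hypothetical names. When the condition holds, the vectorized value is the source operand (`%a` in the first test case), which need not be an instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one `extractelement <N x i32> %vec, i32 idx`:
// `vec` identifies the source vector and `index` is the constant lane.
struct ExtractLane {
  const void *vec;
  unsigned index;
};

// Sketch of a reuse condition: the scalars can be replaced by their source
// vector only if all of them read from the same vector, scalar i reads
// lane i, and together they cover every lane of that vector.
bool canReuseExtract(const std::vector<ExtractLane> &scalars,
                     std::size_t numElts) {
  if (scalars.empty() || scalars.size() != numElts)
    return false;
  for (std::size_t i = 0; i != scalars.size(); ++i)
    if (scalars[i].vec != scalars[0].vec || scalars[i].index != i)
      return false;
  return true;
}
```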
>
> In emitReduction(), VectorizedValue is then dyn_cast to Instruction.
> In the IR above, %a is a function argument, not an instruction, so the cast returns null,
> and the crash occurs when that null value is dereferenced:
>
> Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);
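A self-contained sketch of why the cast fails, using a hypothetical miniature of LLVM-style RTTI: a kind tag plus a `classof` predicate, which is the mechanism `llvm::dyn_cast` relies on. `Value`, `Argument`, `Instruction`, `LoadInst` and `dynCast` here are illustrative stand-ins, not LLVM's classes. An argument is a Value but not an Instruction, so the narrowing cast yields null, whereas a load (as in test case 2) passes. Keeping the operand as a plain Value*, as suggested in the reply at the top, avoids the null entirely.

```cpp
#include <cassert>

// Hypothetical miniature of LLVM-style RTTI: each object carries a kind tag
// and each class answers "is this value one of mine?" via classof().
struct Value {
  enum Kind { ArgumentKind, LoadKind };
  explicit Value(Kind k) : kind(k) {}
  Kind kind;
};

struct Argument : Value {
  Argument() : Value(ArgumentKind) {}
  static bool classof(const Value *v) { return v->kind == ArgumentKind; }
};

struct Instruction : Value {
  explicit Instruction(Kind k) : Value(k) {}
  // In this toy model every kind from LoadKind onwards is an instruction.
  static bool classof(const Value *v) { return v->kind >= LoadKind; }
};

struct LoadInst : Instruction {
  LoadInst() : Instruction(LoadKind) {}
  static bool classof(const Value *v) { return v->kind == LoadKind; }
};

// Like llvm::dyn_cast: a checked downcast that yields nullptr on mismatch.
template <typename To> To *dynCast(Value *v) {
  return To::classof(v) ? static_cast<To *>(v) : nullptr;
}

// Current code path: narrowing to Instruction first, so an Argument becomes
// nullptr and the later TmpVec->getType() call would dereference it.
Value *reductionInputWithCast(Value *vectorized) {
  Instruction *valToReduce = dynCast<Instruction>(vectorized);
  return valToReduce;
}

// Suggested fix: keep the plain Value*; the reduction only needs a vector
// value, and the IRBuilder already carries the insertion point.
Value *reductionInputWithoutCast(Value *vectorized) { return vectorized; }
```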
>
> Note: the test case above won't crash with the current svn version, since the code that recognizes the tree for the above IR
> is not yet in svn. The initial patch was submitted at http://reviews.llvm.org/D6818.
> I am still refining it; however, my patch does not touch the code flow above at all.
> You can reproduce the problem by applying that patch to your local checkout.
>
> When the vector 'a' is a global variable, a 'load' instruction is generated in the function's basic block:
>
> test case 2:
>
> uint32x4_t a;
> int foo() {
> return a[0] + a[1] + a[2] + a[3];
> }
>
> IR for the above test case:
>
> @a = common global <4 x i32> zeroinitializer, align 16
>
> define i32 @hadd() #0 {
> entry:
> %0 = load <4 x i32>* @a, align 16, !tbaa !1
> %vecext = extractelement <4 x i32> %0, i32 0
> %vecext1 = extractelement <4 x i32> %0, i32 1
> %add = add i32 %vecext, %vecext1
> %vecext2 = extractelement <4 x i32> %0, i32 2
> %add3 = add i32 %add, %vecext2
> %vecext4 = extractelement <4 x i32> %0, i32 3
> %add5 = add i32 %add3, %vecext4
> ret i32 %add5
> }
>
> Here, operand 0 of the leaf %vecext is a load instruction,
> so the dyn_cast to Instruction succeeds and the reduction is emitted correctly.
>
> How can we solve this problem? What type should a function argument be cast to?
>
> Regards,
> Suyog
>
>
>
> ------- Original Message -------
> Sender : Shahid, Asghar-ahmad<Asghar-ahmad.Shahid at amd.com>
> Date : Jan 07, 2015 20:05 (GMT+09:00)
> Title : RE: [LLVMdev] Crash in SLP for vector data type as function argument.
>
> Hi Suyog,
>
> IMO emitReduction() takes a vectorized value, which is a leaf of the matched pattern/tree.
> So what you think of as the root is actually a leaf of the tree.
> The root should be the value that is fed to the "return" statement.
>
> It would be of great help if you could share the sample test.
>
> Regards,
> Shahid
>
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Suyog Kamal Sarda
>> Sent: Monday, January 05, 2015 5:40 PM
>> To: nrotem at apple.com; aschwaighofer at apple.com;
>> mzolotukhin at apple.com; james.molloy at arm.com
>> Cc: llvmdev at cs.uiuc.edu
>> Subject: [LLVMdev] Crash in SLP for vector data type as function argument.
>>
>> Hi all,
>>
>> I came across a crash in SLP vectorization while testing the following code for
>> AArch64:
>>
>> int foo(uint32x4_t a) {
>> return a[0] + a[1] + a[2] + a[3];
>> }
>>
>> The LLVM IR for the above code is:
>>
>> define i32 @hadd(<4 x i32> %a) {
>> entry:
>> %vecext = extractelement <4 x i32> %a, i32 0
>> %vecext1 = extractelement <4 x i32> %a, i32 1
>> %add = add i32 %vecext, %vecext1
>> %vecext2 = extractelement <4 x i32> %a, i32 2
>> %add3 = add i32 %add, %vecext2
>> %vecext4 = extractelement <4 x i32> %a, i32 3
>> %add5 = add i32 %add3, %vecext4
>> ret i32 %add5
>> }
>>
>> I recognize this pattern and try to vectorize it using the existing
>> code for horizontal reductions (I just recognize the pattern and fill in the
>> data; the rest is done by the existing code. My pattern matching is quite
>> crude, but that's a different story).
>>
>>
>> Please note that everything that follows uses the existing code; I haven't
>> modified any of it.
>>
>> Once the pattern is recognized, we call "tryToReduce()", which tries
>> to vectorize the tree by calling "vectorizeTree()", which returns the root of the
>> vectorized tree. Then we emit the reduction by calling "emitReduction()",
>> which takes the root of the vector tree as its argument. Inside
>> "emitReduction()", we cast the root of the tree to an instruction.
>>
>> For the above case, while setting the root of the vectorized tree, the
>> extractelement instruction is encountered and its operand 0, here "%a",
>> is set as the root of the tree. However, "%a" is not an
>> instruction, so when we cast it to an instruction in
>> "emitReduction()", the cast returns nullptr, which causes a crash later when
>> it is dereferenced.
>>
>> Take a second case, where the vector is a global variable.
>>
>> uint32x4_t a;
>> int foo() {
>> return a[0] + a[1] + a[2] + a[3];
>> }
>>
>> The IR for the above code is:
>>
>> @a = common global <4 x i32> zeroinitializer, align 16
>>
>> define i32 @hadd() #0 {
>> entry:
>> %0 = load <4 x i32>* @a, align 16, !tbaa !1
>> %vecext = extractelement <4 x i32> %0, i32 0
>> %vecext1 = extractelement <4 x i32> %0, i32 1
>> %add = add i32 %vecext, %vecext1
>> %vecext2 = extractelement <4 x i32> %0, i32 2
>> %add3 = add i32 %add, %vecext2
>> %vecext4 = extractelement <4 x i32> %0, i32 3
>> %add5 = add i32 %add3, %vecext4
>> ret i32 %add5
>> }
>>
>> In this case, operand 0 of the extractelement is %0, a load instruction,
>> so the cast to an instruction succeeds and compilation proceeds
>> normally.
>>
>> Can someone please suggest how to resolve this? Is there something I am
>> missing, or is it a basic problem with the IR itself?
>>
>> Regards,
>> Suyog
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev