[llvm-dev] VBROADCAST Implementation Issues

Mon Aug 7 10:37:12 PDT 2017

You need this line from AVX512 code to tell the register allocation system
that $src1/$dst and $mask/$mask_wb to use the same register. And the early
clobber tells it that $dst and $src2 cannot use the same register.

let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb"

~Craig

On Mon, Aug 7, 2017 at 10:19 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote:

> Thank You. Still getting errors.I have modified my instructions as you
> said as follows:
>
>
> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
> (ins VR_2048:$src1, VK64WM:$mask,  i2048mem:$src2),
>                     "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
> {${mask}}, $src2}",
>                     [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
> (masked_gather  (VR_2048:$src1), VK64WM:$mask,
>                      addr:$src2)))],
>                     IIC_MOV_MEM>, TA;
>
> def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
> (VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
> addr:$src2)>;
>
>
> Now getting this error:
>
> llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
> llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
> Assertion `numPhysicalOperands >= 2 + additionalOperands &&
> numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
> operands for MRMSrcMemFrm"' failed.
>
>
>
>
>
>
>
>
> On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> masked_gather takes 3 inputs. not just an address. See the AVX512 pattern
>> is pasted earlier
>>
>> ~Craig
>>
>> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> Changed it to;
>>>
>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>>> (ins i2048mem:$src),
>>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst}
>>> {${mask}}, $src}",
>>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>>> (masked_gather addr:$src)))],
>>>                     IIC_MOV_MEM>, TA;
>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>> Now getting following error:
>>>
>>> Unhandled memory encoding VK64
>>> Unhandled memory encoding
>>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>>
>>> What to do?
>>>
>>>
>>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> i am getting this error
>>>> error: Variable not defined: '_'
>>>> for _.KRCWM
>>>> what to do?
>>>>
>>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> I did as you said,
>>>>>
>>>>> Please tell me whether the following correct now??
>>>>>
>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>>>>> {${mask}}, $src2}"),
>>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>>                      VR_2048:$src2))],
>>>>>                     IIC_MOV_MEM>, TA;
>>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>>
>>>>> Thank You
>>>>>
>>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> masked_gather returns two results. The data and the modified mask.
>>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>>
>>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>>> X86VectorVTInfo _,
>>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask =
>>>>>> $mask_wb",
>>>>>>       ExeDomain = _.ExeDomain in
>>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst,
>>>>>> _.KRCWM:$mask_wb),
>>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>>> }
>>>>>>
>>>>>> ~Craig
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>>
>>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>> i2048mem:$src),
>>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>>> addr:$src)))],
>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)),
>>>>>>> (GATHER_256B addr:$src)>;
>>>>>>>
>>>>>>> Also i wrote this line in isellowering.h
>>>>>>>
>>>>>>>               setOperationAction(ISD::MGATHER,
>>>>>>> MVT::v64i32, Legal);
>>>>>>>
>>>>>>> But I am getting following error:
>>>>>>>
>>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>>> Unhandled"' failed.
>>>>>>>
>>>>>>> What is my mistake?
>>>>>>>
>>>>>>> Please help me.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> I am trying to implement vector shuffle for v64i32. Is the
>>>>>>>> following correct?
>>>>>>>>
>>>>>>>>
>>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>>
>>>>>>>> Please help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <
>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>>>>>>>
>>>>>>>>> this was implemented already.
>>>>>>>>>
>>>>>>>>> now will try define other vectors like VT.is4096BitVector()).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although
>>>>>>>>>> while doing broadcast from memory to register i was getting no such error
>>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>>>>>>>> implemented its 2 versions single and double.
>>>>>>>>>>
>>>>>>>>>> Actually, i am trying to compile matrix multiplication code for
>>>>>>>>>> greater size vector. There i need to include many new instructions in my
>>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>>> vector type"' failed.
>>>>>>>>>>
>>>>>>>>>>  i tried including is2048Bit Vector() and others. also in
>>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile
>>>>>>>>>> backend and getting errors.
>>>>>>>>>>
>>>>>>>>>> Please help.
>>>>>>>>>>
>>>>>>>>>> Thank You
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <
>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>>> to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>>> Now i want to implement instruction to broadcast scalar
>>>>>>>>>>>> register content to vector.
>>>>>>>>>>>>
>>>>>>>>>>>> like this;
>>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>>
>>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>  GR64:$src)))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for this
>>>>>>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B
>>>>>>>>>>>> (the one that broadcast memory scalar to vector) and just define new
>>>>>>>>>>>> pattern?
>>>>>>>>>>>>
>>>>>>>>>>>> Please help.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>>> initially I didnt include load in both the main instruction
>>>>>>>>>>>>> and pattern so i included in both as follows:
>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>> (loadi32 addr:$src))))],
>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <
>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <
>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>>> $src}",
>>>>>>>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 =
>>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65,
>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in
>>>>>>>>>>>>>>>>>>>>>> isellowering.cpp. now getting the following error.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32,
>>>>>>>>>>>>>>>>>>>>>>> Custom);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but
>>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 =
>>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>>    .................
>>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>        for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>            for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>         b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> vector of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>> ~Craig
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170807/3fd36f0a/attachment.html>