[llvm-dev] VBROADCAST Implementation Issues

Mon Aug 7 10:19:29 PDT 2017

Thank You. Still getting errors.I have modified my instructions as you said
as follows:

def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
(ins VR_2048:$src1, VK64WM:$mask,  i2048mem:$src2),
                    "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
{${mask}}, $src2}",
                    [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
(masked_gather  (VR_2048:$src1), VK64WM:$mask,
                     addr:$src2)))],
                    IIC_MOV_MEM>, TA;

def: Pat<(v64f32 (masked_gather (VR_2048:$src1),
(VK64WM:$mask),(addr:$src2))), (GATHER_256B VR_2048:$src1, VK64WM:$mask,
addr:$src2)>;

Now getting this error:

llvm-tblgen: /utils/TableGen/X86RecognizableInstr.cpp:687: void
llvm::X86Disassembler::RecognizableInstr::emitInstructionSpecifier():
Assertion `numPhysicalOperands >= 2 + additionalOperands &&
numPhysicalOperands <= 4 + additionalOperands && "Unexpected number of
operands for MRMSrcMemFrm"' failed.

On Mon, Aug 7, 2017 at 8:23 PM, Craig Topper <craig.topper at gmail.com> wrote:

> masked_gather takes 3 inputs. not just an address. See the AVX512 pattern
> is pasted earlier
>
> ~Craig
>
> On Mon, Aug 7, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> Changed it to;
>>
>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64:$mask),
>> (ins i2048mem:$src),
>>                     "GATHER_256B\t{$src, {$dst}{${mask}}|${dst}
>> {${mask}}, $src}",
>>                     [(set VR_2048:$dst, VK64:$mask, (v64i32
>> (masked_gather addr:$src)))],
>>                     IIC_MOV_MEM>, TA;
>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>> Now getting following error:
>>
>> Unhandled memory encoding VK64
>> Unhandled memory encoding
>> UNREACHABLE executed at /utils/TableGen/X86RecognizableInstr.cpp:1347!
>>
>> What to do?
>>
>>
>> On Mon, Aug 7, 2017 at 1:20 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i am getting this error
>>> error: Variable not defined: '_'
>>> for _.KRCWM
>>> what to do?
>>>
>>> On Mon, Aug 7, 2017 at 1:13 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> I did as you said,
>>>>
>>>> Please tell me whether the following correct now??
>>>>
>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst,
>>>> _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
>>>>                     "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst}
>>>> {${mask}}, $src2}"),
>>>>                     [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
>>>> (GatherNode  (VR_2048:$src1), _.KRCWM:$mask,
>>>>                      VR_2048:$src2))],
>>>>                     IIC_MOV_MEM>, TA;
>>>> def: Pat<(v64f32 (GatherNode addr:$src2)), (GATHER_256B addr:$src2)>;
>>>>
>>>> Thank You
>>>>
>>>> On Mon, Aug 7, 2017 at 2:57 AM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> masked_gather returns two results. The data and the modified mask.
>>>>> Note the $dst and the $mask_wb in the pattern below.
>>>>>
>>>>> multiclass avx512_gather<bits<8> opc, string OpcodeStr,
>>>>> X86VectorVTInfo _,
>>>>>                          X86MemOperand memop, PatFrag GatherNode> {
>>>>>   let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask =
>>>>> $mask_wb",
>>>>>       ExeDomain = _.ExeDomain in
>>>>>   def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst,
>>>>> _.KRCWM:$mask_wb),
>>>>>             (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
>>>>>             !strconcat(OpcodeStr#_.Suffix,
>>>>>             "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
>>>>>             [(set _.RC:$dst, _.KRCWM:$mask_wb,
>>>>>               (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
>>>>>                      vectoraddr:$src2))]>, EVEX, EVEX_K,
>>>>>              EVEX_CD8<_.EltSize, CD8VT1>;
>>>>> }
>>>>>
>>>>> ~Craig
>>>>>
>>>>> On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> i want to implement gather for v64i32. i wrote following code.
>>>>>>
>>>>>> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>> i2048mem:$src),
>>>>>>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>>>>>>                     [(set VR_2048:$dst, (v64i32 (masked_gather
>>>>>> addr:$src)))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>>>>>>
>>>>>> Also i wrote this line in isellowering.h
>>>>>>
>>>>>>               setOperationAction(ISD::MGATHER,
>>>>>> MVT::v64i32, Legal);
>>>>>>
>>>>>> But I am getting following error:
>>>>>>
>>>>>> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
>>>>>> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init
>>>>>> *, llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME:
>>>>>> Unhandled"' failed.
>>>>>>
>>>>>> What is my mistake?
>>>>>>
>>>>>> Please help me.
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am trying to implement vector shuffle for v64i32. Is the following
>>>>>>> correct?
>>>>>>>
>>>>>>>
>>>>>>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>>>>>>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1,
>>>>>>> $src2, $dst|$dst, $src1, $src2}",
>>>>>>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>>>>>>> VR_2048:$src2)))]>, TA;
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>>>>>>
>>>>>>>> this was implemented already.
>>>>>>>>
>>>>>>>> now will try define other vectors like VT.is4096BitVector()).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <
>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>>>>>>> implemented two instructions now one broadcastS other broadcastD. Although
>>>>>>>>> while doing broadcast from memory to register i was getting no such error
>>>>>>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>>>>>>> implemented its 2 versions single and double.
>>>>>>>>>
>>>>>>>>> Actually, i am trying to compile matrix multiplication code for
>>>>>>>>> greater size vector. There i need to include many new instructions in my
>>>>>>>>> backend like shuffle, gather etc. For now i am getting the following error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>>>>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>>>>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>>>>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>>>>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>>>>>>> vector type"' failed.
>>>>>>>>>
>>>>>>>>>  i tried including is2048Bit Vector() and others. also in
>>>>>>>>> vectortype.h i included these types for EVT but was unable to compile
>>>>>>>>> backend and getting errors.
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <
>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> You need a new instruction. And your scalar register size needs
>>>>>>>>>> to match your vector element size. So GR32 instead of GR64
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <
>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disturb,
>>>>>>>>>>> Now i want to implement instruction to broadcast scalar register
>>>>>>>>>>> content to vector.
>>>>>>>>>>>
>>>>>>>>>>> like this;
>>>>>>>>>>> vpbroadcastq zmm0, rsi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I tried implementing it as follows;
>>>>>>>>>>>
>>>>>>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst),
>>>>>>>>>>> (ins GR64:$src),
>>>>>>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>  GR64:$src)))],
>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is it fine? Also do i need to define a new instruction for this
>>>>>>>>>>> like BROADCASTR_256B? can i use the previous instruction BROADCAST_256B
>>>>>>>>>>> (the one that broadcast memory scalar to vector) and just define new
>>>>>>>>>>> pattern?
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thank You
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <
>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank You so much.
>>>>>>>>>>>>
>>>>>>>>>>>> Wao you are simply genius.
>>>>>>>>>>>> initially I didnt include load in both the main instruction and
>>>>>>>>>>>> pattern so i included in both as follows:
>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (
>>>>>>>>>>>> loadi32 addr:$src))))],
>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>
>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>> And it worked perfectly.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You again.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <
>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Your pattern needs to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i am getting error.
>>>>>>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> in x86 it is;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> mine is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32
>>>>>>>>>>>>>>>> VR512:$src), sub_xmm))>;
>>>>>>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>> VR_2048:$dst), (ins i2048mem:$src),
>>>>>>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst,
>>>>>>>>>>>>>>>>>> $src}",
>>>>>>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32
>>>>>>>>>>>>>>>>>> (X86VBroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 =
>>>>>>>>>>>>>>>>>>> X86ISD::VBROADCAST t62
>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65,
>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp.
>>>>>>>>>>>>>>>>>>>>> now getting the following error.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32,
>>>>>>>>>>>>>>>>>>>>>> Custom);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR
>>>>>>>>>>>>>>>>>>>>>>>> is not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included
>>>>>>>>>>>>>>>>>>>>>>>>> broadcast instruction in instructioninfo.td. but
>>>>>>>>>>>>>>>>>>>>>>>>> i made no changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 =
>>>>>>>>>>>>>>>>>>>>>>>>> BUILD_VECTOR t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>>>>>>    .................
>>>>>>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with
>>>>>>>>>>>>>>>>>>>>>>>>>>> constant something like this;
>>>>>>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>>>>>>    for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>        for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>            for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>>>>>>         b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>  %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0.200000003
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector
>>>>>>>>>>>>>>>>>>>>>>>>>>> of 64 elements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> ~Craig
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170807/a4a6e843/attachment-0001.html>