[llvm-dev] VBROADCAST Implementation Issues

Sun Aug 6 14:57:43 PDT 2017

masked_gather returns two results. The data and the modified mask. Note the
$dst and the $mask_wb in the pattern below.

multiclass avx512_gather<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
                         X86MemOperand memop, PatFrag GatherNode> {
  let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb",
      ExeDomain = _.ExeDomain in
  def rm  : AVX5128I<opc, MRMSrcMem, (outs _.RC:$dst, _.KRCWM:$mask_wb),
            (ins _.RC:$src1, _.KRCWM:$mask, memop:$src2),
            !strconcat(OpcodeStr#_.Suffix,
            "\t{$src2, ${dst} {${mask}}|${dst} {${mask}}, $src2}"),
            [(set _.RC:$dst, _.KRCWM:$mask_wb,
              (GatherNode  (_.VT _.RC:$src1), _.KRCWM:$mask,
                     vectoraddr:$src2))]>, EVEX, EVEX_K,
             EVEX_CD8<_.EltSize, CD8VT1>;
}

~Craig

On Sun, Aug 6, 2017 at 2:21 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:

> i want to implement gather for v64i32. i wrote following code.
>
> def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
> i2048mem:$src),
>                     "GATHER_256B\t{$src, $dst|$dst, $src}",
>                     [(set VR_2048:$dst, (v64i32 (masked_gather
> addr:$src)))],
>                     IIC_MOV_MEM>, TA;
> def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B addr:$src)>;
>
> Also i wrote this line in isellowering.h
>
>               setOperationAction(ISD::MGATHER,             MVT::v64i32,
> Legal);
>
> But I am getting following error:
>
> llvm-tblgen: /utils/TableGen/CodeGenDAGPatterns.cpp:2134:
> llvm::TreePatternNode *llvm::TreePattern::ParseTreePattern(llvm::Init *,
> llvm::StringRef): Assertion `New->getNumTypes() == 1 && "FIXME: Unhandled"'
> failed.
>
> What is my mistake?
>
> Please help me.
>
>
> On Mon, Aug 7, 2017 at 12:03 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
>
>> I am trying to implement vector shuffle for v64i32. Is the following
>> correct?
>>
>>
>> def VSHUFFLE_256B  : I<0xE8, MRMDestReg, (outs VR_2048:$dst),
>> (ins VR_2048:$src1, VRPIM_2048:$src2),"VSHUFFLE_256B\t{$src1, $src2,
>> $dst|$dst, $src1, $src2}",
>> [(set VR_2048:$dst, (shufflevector (v64i32 VR_2048:$src1), (v64i32
>> VR_2048:$src2)))]>, TA;
>>
>> Please help.
>>
>>
>>
>>
>> On Sun, Aug 6, 2017 at 11:48 PM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> i managed to get rid of above error for VT.is2048BitVector()).
>>>
>>> this was implemented already.
>>>
>>> now will try define other vectors like VT.is4096BitVector()).
>>>
>>>
>>>
>>> On Sun, Aug 6, 2017 at 11:11 PM, hameeza ahmed <hahmed2305 at gmail.com>
>>> wrote:
>>>
>>>> Thank you. actually i have to implement both i32 and i64. so i
>>>> implemented two instructions now one broadcastS other broadcastD. Although
>>>> while doing broadcast from memory to register i was getting no such error
>>>> with 1 instruction and other patterns i64, i32 etc. but then also i
>>>> implemented its 2 versions single and double.
>>>>
>>>> Actually, i am trying to compile matrix multiplication code for greater
>>>> size vector. There i need to include many new instructions in my backend
>>>> like shuffle, gather etc. For now i am getting the following error.
>>>>
>>>>
>>>> Legalizing: t208: v64i32 = BUILD_VECTOR Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>,
>>>> Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:5525: llvm::SDValue
>>>> getOnesVector(llvm::EVT, const llvm::X86Subtarget &, llvm::SelectionDAG &,
>>>> const llvm::SDLoc &): Assertion `(VT.is128BitVector() ||
>>>> VT.is256BitVector() || VT.is512BitVector()) && "Expected a 128/256/512-bit
>>>> vector type"' failed.
>>>>
>>>>  i tried including is2048Bit Vector() and others. also in vectortype.h
>>>> i included these types for EVT but was unable to compile backend and
>>>> getting errors.
>>>>
>>>> Please help.
>>>>
>>>> Thank You
>>>>
>>>>
>>>> On Sun, Aug 6, 2017 at 8:42 PM, Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>>
>>>>> You need a new instruction. And your scalar register size needs to
>>>>> match your vector element size. So GR32 instead of GR64
>>>>>
>>>>> On Sun, Aug 6, 2017 at 5:44 AM hameeza ahmed <hahmed2305 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry to disturb,
>>>>>> Now i want to implement instruction to broadcast scalar register
>>>>>> content to vector.
>>>>>>
>>>>>> like this;
>>>>>> vpbroadcastq zmm0, rsi
>>>>>>
>>>>>>
>>>>>> I tried implementing it as follows;
>>>>>>
>>>>>> def BROADCASTR_256B : I<0x21, MRMSrcReg, (outs VR_2048:$dst), (ins
>>>>>> GR64:$src),
>>>>>>                     "BROADCASTR_256B\t{$src, $dst|$dst, $src}",
>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>  GR64:$src)))],
>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>
>>>>>>
>>>>>>
>>>>>> def: Pat<(v64f32 (X86VBroadcast GR64:$src)),
>>>>>> (BROADCASTR_256B GR64:$src)>;
>>>>>>
>>>>>>
>>>>>> Is it fine? Also do i need to define a new instruction for this like
>>>>>> BROADCASTR_256B? can i use the previous instruction BROADCAST_256B (the one
>>>>>> that broadcast memory scalar to vector) and just define new pattern?
>>>>>>
>>>>>> Please help.
>>>>>>
>>>>>> Thank You
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 5:10 AM, hameeza ahmed <hahmed2305 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank You so much.
>>>>>>>
>>>>>>> Wao you are simply genius.
>>>>>>> initially I didnt include load in both the main instruction and
>>>>>>> pattern so i included in both as follows:
>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst), (ins
>>>>>>> i2048mem:$src),
>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast (
>>>>>>> loadi32 addr:$src))))],
>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>
>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>> And it worked perfectly.
>>>>>>>
>>>>>>> Thank You again.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 6, 2017 at 4:28 AM, Craig Topper <craig.topper at gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Your pattern needs to be
>>>>>>>>
>>>>>>>> def: Pat<(v64f32 (X86VBroadcast (loadf32 addr:$src))),
>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>
>>>>>>>> ~Craig
>>>>>>>>
>>>>>>>> On Sat, Aug 5, 2017 at 2:47 PM, hameeza ahmed <hahmed2305 at gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> it runs fine with v64i32. but with the following pattern
>>>>>>>>>
>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>
>>>>>>>>> i am getting error.
>>>>>>>>> What is wrong with this pattern?
>>>>>>>>>
>>>>>>>>> On Sun, Aug 6, 2017 at 2:01 AM, hameeza ahmed <
>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> in x86 it is;
>>>>>>>>>>
>>>>>>>>>> def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src),
>>>>>>>>>>           (VBROADCASTSSZm addr:$src)>;
>>>>>>>>>>
>>>>>>>>>> mine is
>>>>>>>>>>
>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 6, 2017 at 1:59 AM, hameeza ahmed <
>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> for v16f32 it is defined as;
>>>>>>>>>>> : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))),
>>>>>>>>>>>           (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src),
>>>>>>>>>>> sub_xmm))>;
>>>>>>>>>>> which is similar to mine.
>>>>>>>>>>> Why its not working then?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:45 AM, Craig Topper <
>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You need a pattern for v64f32 too.
>>>>>>>>>>>>
>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:37 PM, hameeza ahmed <
>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> as you said; these are instructions that i defined in
>>>>>>>>>>>>> instrinfo.td
>>>>>>>>>>>>>
>>>>>>>>>>>>> def BROADCAST_256B : I<0x31, MRMSrcMem, (outs VR_2048:$dst),
>>>>>>>>>>>>> (ins i2048mem:$src),
>>>>>>>>>>>>>                     "BROADCAST_256B\t{$src, $dst|$dst, $src}",
>>>>>>>>>>>>>                     [(set VR_2048:$dst, (v64i32 (X86VBroadcast
>>>>>>>>>>>>> addr:$src)))],
>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> def: Pat<(v64f32 (X86VBroadcast addr:$src)),
>>>>>>>>>>>>> (BROADCAST_256B addr:$src)>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:28 AM, hameeza ahmed <
>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did as you said;
>>>>>>>>>>>>>> now getting this error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t63: v64f32 = X86ISD::VBROADCAST
>>>>>>>>>>>>>> t62
>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t65, undef:i64
>>>>>>>>>>>>>>     t65: i64 = X86ISD::Wrapper TargetConstantPool:i64<float
>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>       t64: i64 = TargetConstantPool<float 0x3FC99999A0000000>
>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 1:14 AM, Craig Topper <
>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Add VT.is2048BitVector() to the assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 1:11 PM, hameeza ahmed <
>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> added the setoperationaction line in isellowering.cpp. now
>>>>>>>>>>>>>>>> getting the following error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> llc: /lib/Target/X86/X86ISelLowering.cpp:6801:
>>>>>>>>>>>>>>>> llvm::SDValue LowerVectorBroadcast(llvm::BuildVectorSDNode
>>>>>>>>>>>>>>>> *, const llvm::X86Subtarget &, llvm::SelectionDAG &): Assertion
>>>>>>>>>>>>>>>> `(VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) &&
>>>>>>>>>>>>>>>> "Unsupported vector type for broadcast."' failed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What should I do?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:36 AM, Craig Topper <
>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well first have you done this for your type
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> setOperationAction(ISD::BUILD_VECTOR, v64i32, Custom);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:29 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How to do this task??
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Aug 6, 2017 at 12:24 AM, Craig Topper <
>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It looks like X86TargetLowering::LowerBUILD_VECTOR is
>>>>>>>>>>>>>>>>>>> not creating a broadcast node for your wider vector type.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank You.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I made your mentioned changes and included broadcast
>>>>>>>>>>>>>>>>>>>> instruction in instructioninfo.td. but i made no
>>>>>>>>>>>>>>>>>>>> changes in isellowering.cpp file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Still getting the following error.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
>>>>>>>>>>>>>>>>>>>> t62, t62, t62, t62
>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>     t8: i64 = undef
>>>>>>>>>>>>>>>>>>>>   t62: f32,ch = load<LD4[ConstantPool]> t0, t64,
>>>>>>>>>>>>>>>>>>>> undef:i64
>>>>>>>>>>>>>>>>>>>>     t64: i64 = X86ISD::Wrapper
>>>>>>>>>>>>>>>>>>>> TargetConstantPool:i64<float 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>       t63: i64 = TargetConstantPool<float
>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000> 0
>>>>>>>>>>>>>>>>>>>>    .................
>>>>>>>>>>>>>>>>>>>> In function: stencil
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How to resolve this?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please help..
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <
>>>>>>>>>>>>>>>>>>>> craig.topper at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You need to use X86VBroadcast not "vbroadcast"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ~Craig
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <
>>>>>>>>>>>>>>>>>>>>> hahmed2305 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i have a c code which multiplies vector with constant
>>>>>>>>>>>>>>>>>>>>>> something like this;
>>>>>>>>>>>>>>>>>>>>>> float con=0.2;
>>>>>>>>>>>>>>>>>>>>>>    for (k = 0; k < N; k++) {
>>>>>>>>>>>>>>>>>>>>>>        for (i = 1; i <= N-2; i++)
>>>>>>>>>>>>>>>>>>>>>>            for (j = 1; j <= N-2; j++)
>>>>>>>>>>>>>>>>>>>>>>         b[i][j] = con * (a[i][j] + a[i-1][j] +
>>>>>>>>>>>>>>>>>>>>>> a[i+1][j] + a[i][j-1] + a[i][j+1]);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> now in LLVM IR I m getting;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  %22 = fmul <64 x float> %21, <float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>>>>>>>>>>>>>>>>>>>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>>>>>>>>>>>>>>>>>>>>> float 0x3FC99999A0000000>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> but its assembly in x86 gives;
>>>>>>>>>>>>>>>>>>>>>> .LCPI0_0:
>>>>>>>>>>>>>>>>>>>>>> .long 1045220557              # float 0.200000003
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> vmulps zmm2, zmm2, zmm1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> how does it lowered the above IR code into
>>>>>>>>>>>>>>>>>>>>>> vbroadcastss?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What would be the pattern here to match?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I want to implement similar broadcast for vector of
>>>>>>>>>>>>>>>>>>>>>> 64 elements.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i tried the following code;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs
>>>>>>>>>>>>>>>>>>>>>> VREGG:$dst), (ins immem:$src),
>>>>>>>>>>>>>>>>>>>>>>                     "BROADCAST_DWORD\t{$src,
>>>>>>>>>>>>>>>>>>>>>> $dst|$dst, $src}",
>>>>>>>>>>>>>>>>>>>>>>                     [(set VREGG:$dst, (v64i32
>>>>>>>>>>>>>>>>>>>>>> (vbroadcast addr:$src)))],
>>>>>>>>>>>>>>>>>>>>>>                     IIC_MOV_MEM>, TA;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please help me. I am stuck at this point.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank You
>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>> ~Craig
>>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170806/912e0a88/attachment-0001.html>