[llvm-dev] VBROADCAST Implementation Issues
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Sat Aug 5 12:24:27 PDT 2017
It looks like X86TargetLowering::LowerBUILD_VECTOR is not creating a
broadcast node for your wider vector type.
~Craig
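
To illustrate the point above: a BUILD_VECTOR whose operands are all the same value (t62 repeated 64 times in the dump below) is a splat, and the lowering code normally has to recognize that and replace it with a broadcast node before instruction selection runs. The following is only a toy model of that check, not LLVM code. ToyNode, getSplatOperand and lowerBuildVector are hypothetical names invented for this sketch. In a real backend the equivalent logic would live in the target's LowerBUILD_VECTOR and would create an X86ISD::VBROADCAST (or target-specific) node for the wider vector type.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a SelectionDAG node. This is NOT LLVM's SDNode/SDValue;
// it only carries an opcode name and the ids of its operands.
struct ToyNode {
    std::string opcode;         // e.g. "BUILD_VECTOR" or "VBROADCAST"
    std::vector<int> operands;  // operand node ids, e.g. 62 for t62
};

// Hypothetical helper: if every operand of a BUILD_VECTOR is the same
// node, return that node's id; otherwise return nothing.
std::optional<int> getSplatOperand(const ToyNode &n) {
    if (n.opcode != "BUILD_VECTOR" || n.operands.empty())
        return std::nullopt;
    for (int op : n.operands)
        if (op != n.operands.front())
            return std::nullopt;
    return n.operands.front();
}

// Hypothetical lowering hook: rewrite a splat BUILD_VECTOR into a single
// broadcast of the repeated scalar; leave every other node untouched.
ToyNode lowerBuildVector(const ToyNode &n) {
    if (std::optional<int> splat = getSplatOperand(n))
        return ToyNode{"VBROADCAST", {*splat}};
    return n;
}
```

Calling lowerBuildVector on a node with 64 copies of operand 62 yields a VBROADCAST node with the single operand 62, which is the shape an instruction pattern written against the broadcast node can then match. A BUILD_VECTOR with mixed operands is returned unchanged.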
On Sat, Aug 5, 2017 at 12:19 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> Thank you.
>
> I made the changes you mentioned and added the broadcast instruction to
> instructioninfo.td, but I made no changes to the isellowering.cpp file.
>
> I am still getting the following error:
>
> LLVM ERROR: Cannot select: t29: v64f32 = BUILD_VECTOR t62, t62, t62, t62,
> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62,
> t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62, t62
> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float
> 0x3FC99999A0000000> 0
> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
> t8: i64 = undef
> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float
> 0x3FC99999A0000000> 0
> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
> t8: i64 = undef
> t62: f32,ch = load<LD4[ConstantPool]> t0, t64, undef:i64
> t64: i64 = X86ISD::Wrapper TargetConstantPool:i64<float
> 0x3FC99999A0000000> 0
> t63: i64 = TargetConstantPool<float 0x3FC99999A0000000> 0
> .................
> In function: stencil
>
> How can I resolve this?
>
> Please help.
>
> On Sat, Aug 5, 2017 at 11:19 PM, Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> You need to use X86VBroadcast, not "vbroadcast".
>>
>> ~Craig
>>
>> On Sat, Aug 5, 2017 at 10:50 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have C code that multiplies a vector by a constant, something like
>>> this:
>>> float con = 0.2;
>>> for (k = 0; k < N; k++) {
>>>   for (i = 1; i <= N-2; i++)
>>>     for (j = 1; j <= N-2; j++)
>>>       b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
>>>                        a[i][j+1]);
>>> }
>>>
>>>
>>> Now, in the LLVM IR, I get:
>>>
>>> %22 = fmul <64 x float> %21, <float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000,
>>> float 0x3FC99999A0000000, float 0x3FC99999A0000000, float
>>> 0x3FC99999A0000000, float 0x3FC99999A0000000, float 0x3FC99999A0000000>
>>>
>>> but the x86 assembly generated for it is:
>>> .LCPI0_0:
>>> .long 1045220557 # float 0.200000003
>>>
>>> vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
>>>
>>> vmulps zmm2, zmm2, zmm1
>>>
>>> How did it lower the above IR into vbroadcastss?
>>>
>>> What pattern would match here?
>>>
>>> I want to implement a similar broadcast for a vector of 64 elements.
>>>
>>> I tried the following code:
>>>
>>> def BROADCAST_DWORD : I<0x60, MRMSrcMem, (outs VREGG:$dst), (ins
>>> immem:$src),
>>> "BROADCAST_DWORD\t{$src, $dst|$dst, $src}",
>>> [(set VREGG:$dst, (v64i32 (vbroadcast addr:$src)))],
>>> IIC_MOV_MEM>, TA;
>>>
>>> Please help me. I am stuck at this point.
>>>
>>> Thank You
>>> Regards
>>>
>>>
>>
>