[PATCH][X86] AVX512: Add vbroadcasti*
Adam Nemet
anemet at apple.com
Thu Jun 26 17:53:33 PDT 2014
OK, I poked around more and I think I know what I was missing. Surprisingly (at least to me), X86VBroadcast is defined not only for scalar but also for vector input types. E.g. for AVX:
// scalar:
def : Pat<(v4f64 (X86VBroadcast FR64:$src)),
(VBROADCASTSDYrr (COPY_TO_REGCLASS FR64:$src, VR128))>;
// vector:
def : Pat<(v4f64 (X86VBroadcast (v2f64 VR128:$src))),
(VBROADCASTSDYrr VR128:$src)>;
I would have expected that its input would have to be a scalar type (occupying a vector register, naturally).
Confusingly, for memory variants the input is of course scalar since there the input type also determines the memory access type. E.g.:
def VBROADCASTSDYrm {
…
list<dag> Pattern = [(set VR256:$dst, (v4f64 (X86VBroadcast (loadf64 addr:$src))))];
Ideally, we would want to differentiate scalar vs. sub-vector/tuple broadcasts (VBROADCAST[IF]*) by disambiguating on the input type. We can probably still do it, because sub-vector broadcast is only supported on memory operands, but it feels less clean. E.g.:
(v8i64 (X86VBroadcast (v4i64 (load addr:$src)))) is a sub-vector broadcast vs.
(v8i64 (X86VBroadcast (v4i64 VR256:$src))) is a scalar broadcast of the first element
(v8i64 (X86VBroadcast (i64 (load addr:$src)))) is a scalar broadcast as well with a memory operand
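To make the distinction concrete, here is a minimal C++ sketch of the two semantics. It uses plain arrays rather than any LLVM code, and the names broadcastScalar, broadcastSubVector and exampleUsage are made up for illustration; it only models what the result lanes would contain, assuming the sub-vector form tiles the whole tuple and the scalar form replicates element 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar broadcast (hypothetical helper): replicate one element into
// every lane of the N-element result.
template <typename T, std::size_t N>
std::array<T, N> broadcastScalar(T value) {
  std::array<T, N> result{};
  result.fill(value);
  return result;
}

// Sub-vector broadcast (hypothetical helper): tile an M-element tuple
// across the N-element result.
template <typename T, std::size_t N, std::size_t M>
std::array<T, N> broadcastSubVector(const std::array<T, M> &sub) {
  static_assert(N % M == 0, "result must hold a whole number of sub-vectors");
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = sub[i % M];
  return result;
}

// Same v4i64-shaped input, two different v8i64-shaped results.
inline void exampleUsage() {
  const std::array<std::uint64_t, 4> sub = {10, 20, 30, 40};

  // Sub-vector broadcast: lanes are 10 20 30 40 10 20 30 40.
  const auto tiled = broadcastSubVector<std::uint64_t, 8>(sub);
  assert(tiled[0] == 10 && tiled[3] == 40);
  assert(tiled[4] == 10 && tiled[7] == 40);

  // Scalar broadcast of the first element: every lane is 10.
  const auto splat = broadcastScalar<std::uint64_t, 8>(sub[0]);
  for (std::uint64_t lane : splat)
    assert(lane == 10);
}
```

The point of the sketch is that both helpers start from the same four-element input, so the input type alone does not say which result is intended; something else (here the function name, in the patterns above the load vs. register operand) has to carry that information.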
Nadav, Jim, do you guys have any opinion on this?
To keep things moving while this discussion gets going, I checked in the version Elena suggested. It’s r211828.
Thanks,
Adam
On Jun 26, 2014, at 6:38 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
> I just suggest separating the broadcast of one element from the broadcast of a sub-vector.
> If you are not masking, the same broadcast instruction may be used for multiple types:
>
> (v8i64 (X86SubVectorBroadcast (v4i64 (load ..)))) - VBROADCASTI64X4
> (v16i32 (X86SubVectorBroadcast (v8i32 (load ..)))) - VBROADCASTI64X4
>
>
> multiclass avx512_int_subvec_broadcast_rm<bits<8> opc, string OpcodeStr,
> X86MemOperand x86memop, PatFrag ld_frag,
> RegisterClass DstRC, ValueType OpVT, ValueType SrcVT,
> RegisterClass KRC> {
> let mayLoad = 1 in {
> def rm : AVX5128I<opc, MRMSrcMem, (outs DstRC:$dst), (ins x86memop:$src),
> !strconcat(OpcodeStr, " \t{$src, $dst|$dst, $src}"),
> [(set DstRC:$dst,
> (OpVT (X86SubVectorBroadcast (ld_frag addr:$src))))]>, EVEX;
> def krm : AVX5128I<opc, MRMSrcMem, (outs DstRC:$dst), (ins KRC:$mask,
> x86memop:$src),
> !strconcat(OpcodeStr,
> " \t{$src, ${dst} {${mask}} {z}|${dst} {${mask}} {z}, $src}"),
> [(set DstRC:$dst, (OpVT (X86SubVectorBroadcast KRC:$mask,
> (ld_frag addr:$src))))]>, EVEX, EVEX_KZ;
> }
> }
>
> - Elena
>
>
> -----Original Message-----
> From: Adam Nemet [mailto:anemet at apple.com]
> Sent: Wednesday, June 25, 2014 21:40
> To: Demikhovsky, Elena
> Cc: llvm-commits
> Subject: Re: [PATCH][X86] AVX512: Add vbroadcasti*
>
> Hi Elena,
>
> On Jun 25, 2014, at 12:35 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
>> Hi Adam,
>>
>> I don't think that we should mix instructions like VBROADCASTI32X4 with the regular, one-element broadcast.
>>
>> I suggest writing one more template specifically for these 2 instructions. It may be used for inserting a sub-vector, where the sub-vector is loaded from memory.
>> My point is that the lowering model of these 2 instructions is different. You can leave the substitution pattern empty.
>
> Can you please elaborate? In this patch I just applied my conclusions from my AVX2 broadcast work (http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140526/106526.html and the first patch in http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140526/219238.html).
>
> To summarize, I was suggesting that we use the same lowering for all broadcasts there, without using LLVM intrinsics. At least that is the plan I outlined there for VBROADCAST[IF]128, which correspond to these instructions. The main benefit is that exposing the memory loads rather than hiding them in an intrinsic opens them up to optimizations (e.g. LICM).
>
> The idea is to use generic nodes to implement these front-end builtins, then combine them into X86VBroadcast SDNodes. In the case of vbroadcasti64x4 this would amount to: (v8i64 (X86VBroadcast (v4i64 (load ...)))).
>
> Do you disagree with this direction for codegen?
>
> I can of course keep them separate for now until I have prototyped either AVX2's vbroadcasti128 or the AVX512 counterparts. What I want to make sure of at this point is that we're on the same page as we move toward more of the codegen work.
>
> Thanks,
> Adam
>
>> - Elena
>>
>> -----Original Message-----
>> From: Adam Nemet [mailto:anemet at apple.com]
>> Sent: Wednesday, June 25, 2014 00:07
>> To: Demikhovsky, Elena
>> Cc: llvm-commits
>> Subject: [PATCH][X86] AVX512: Add vbroadcasti*
>>
>> Hi Elena,
>>
>> I have modified the avx512_int_broadcast_rm multiclass to fit these instructions as well.
>>
>> The individual patches provide a bit more information. Please let me know if it looks good to you.
>>
>> Thanks,
>> Adam
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
>