[PATCH] [AVX512] Enable intrinsics for vexp2{ps/pd}

Wed Nov 5 23:11:26 PST 2014

> On Nov 5, 2014, at 12:12 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> 
> In my opinion, SDNode is convenient when simple.
> Let's consider VADDPS instruction. We can specify rounding mode in intrinsic and it will require an additional SDNode because it should be specified with the exact number of operands.
> In this case we have double amount of SDNodes for all FP instructions.
> And now we add masks. And we immediately have (NumOfNodes * 2 * 2). And only one form can be combined/converted.

No, the AVX512 single-precision float vector-add SDNode should have all the bells and whistles of vaddps: rounding mode, mask (assuming it’s not valid to formulate it with vselect).

The intrinsic which has the same operands would be lowered to either a vector fadd (if current rounding mode is used and all-one or undefined mask) or the AVX512 SDNode.

This would allow a simple vector fadd to get optimized by the backend.

> So I'd prefer to stay with intrinsics in td. We'll try to shorten the patterns using more templates.

It’s not about shortening.  The problem is that it’s not scalable to define the same simplification rules over and over again with every instruction:

1. if mask is all-true use non-masking format
2. if preserved operand is all-zero, use zero-masking

We managed to push to the first into ISelLowering/DAGCombine and the second into the AVX512_maskable class and we should not take a step back with this patch now.

Adam

> 
> -  Elena
> 
> 
> -----Original Message-----
> From: Adam Nemet [mailto:anemet at apple.com] 
> Sent: Wednesday, November 05, 2014 09:48
> To: Demikhovsky, Elena
> Cc: reviews+D6109+public+2236bef4a30bfa93 at reviews.llvm.org; Badouh, Asaf; rob.khasanov at gmail.com
> Subject: Re: [PATCH] [AVX512] Enable intrinsics for vexp2{ps/pd}
> 
> Hi Elena,
> 
>> On Nov 4, 2014, at 11:23 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>> 
>> Hi Adam,
>> 
>>> Lower the intrinsics to target-specific SDNodes in 
>>> LowerINTRINSIC_WO_CHAIN
>> We don't need LowerINTRINSIC_WO_CHAIN in this case, because we don't want to invent SDNode for something that reachable via intrinsics only and can't be optimized by LLVM on any stage.
>> So I suppose that direct mapping from intrinsic to instruction is the right way in this case.
> 
> My preference would be to add a new SDNode even in these cases.  I think that adding an enum is cheap and hopefully the only other thing that is needed is the SDNode def in TD.
> 
> With that we have a uniform way of adding intrinsics: (1) add the lowering logic (2) derive from AVX512_maskable.
> 
> I am trying to move away from the three Pat<>s per instruction to match the different masking variants.
> 
> What do you think?
> 
>>> Add masking support to your instruction by deriving from 
>>> AVX512_maskable
>> Asaf is doing ramp-up now and we planned to start with a small simple patch and then extend it to masks and to SAE/CURRENT_DIRECTION forms.
>> 
>>> Use X86VectorVTInfo to improve readability
>> Unlike other instructions, these 3 ERI instructions don't have 128 and 256 bit forms, so X86VectorVTInfo will not give too much, but we can check whether the code looks better with it.
> 
> Thanks.
> 
> Adam
> 
>> 
>> Thank you.
>> -  Elena
>> 
>> 
>> -----Original Message-----
>> From: Adam Nemet [mailto:anemet at apple.com]
>> Sent: Tuesday, November 04, 2014 19:59
>> To: Badouh, Asaf; Demikhovsky, Elena; rob.khasanov at gmail.com; 
>> anemet at apple.com
>> Subject: Re: [PATCH] [AVX512] Enable intrinsics for vexp2{ps/pd}
>> 
>> Hi Asaf,
>> 
>> Please always use full context for the diffs (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface).
>> 
>> I know that you only followed rsqrt and rcp precedence but that is not how we add AVX512 intrinsic support these days.  You probably want to look a few of my recent patches for details but this is the basic idea:
>> 
>> 1. Lower the intrinsics to target-specific SDNodes in 
>> LowerINTRINSIC_WO_CHAIN 2. Add masking support to your instruction by 
>> deriving from AVX512_maskable
>> 
>> Also preferably:
>> 3. Create a multiclass to hide the PS/PD duplication 4. Use 
>> X86VectorVTInfo to improve readability
>> 
>> Try to work with smaller patches and separate those that don't have functional changes in order to minimize the size of patches that do have functional changes.
>> 
>> Hopefully this is clear.  Let me know if you need an sample patchset.
>> 
>> Thanks,
>> Adam
>> 
>> http://reviews.llvm.org/D6109
>> 
>> 
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>> 
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
> 
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>