[AVX512] Add masking variants for vextract*x4

Wed Oct 8 10:20:58 PDT 2014

You are not right about stores, they have only merge-masking version, not
zero-masking.

As I understand you correctly:
* maskable_custom_mask will have only merge-masking version
* maskable_custom_maskz will have only zero-masking version
* maskable_custom derives from maskable_custom_mask, and maskable_custom_maskz
and include also non-masking version.

In this plan I will need also add class that will derives from
maskable_custom_mask
and also have non-masking version.. I don't think that it's good
hierarchy..
Since there is no instructions w/zero-masking, I think deriving maskz from
mask is more reasonable (as in my version).

Also I want reduce the namings: not repeat mask in multiclass naming:
maskz_custom
instead of maskable_custom_maskz, mask_custom instead of
maskable_custom_mask

2014-10-08 21:05 GMT+04:00 Adam Nemet <anemet at apple.com>:

>
> On Oct 8, 2014, at 5:51 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
>
>
>
> 2014-10-08 11:12 GMT+04:00 Adam Nemet <anemet at apple.com>:
>
>>
>> On Oct 3, 2014, at 11:34 AM, Robert Khasanov <rob.khasanov at gmail.com>
>> wrote:
>>
>>
>>
>> 2014-10-01 21:38 GMT+04:00 Adam Nemet <anemet at apple.com>:
>>
>>>
>>> On Oct 1, 2014, at 9:30 AM, Robert Khasanov <rob.khasanov at gmail.com>
>>> wrote:
>>>
>>> Hi Adam,
>>>
>>> About maskings in vextract*x4 instructions. I think we can include VK4
>>> register class and v4i1 type for AVX512F subset as it is needed by
>>> instruction semantics. Elena, do you agree? It would be better keep all
>>> intrinsics lowering in one place.
>>>
>>>
>>> Hi Robert,
>>>
>>> I don’t think that a single intrinsic is worth the complication at this
>>> point.  I agree that this should be the long term direction but for now
>>> it’s probably easier to keep the current simple model in mind that AVX512f
>>> only requires v8i1 and above.  So all the strangeness about v4i1 e.g. that
>>> it cannot be casted directly from a scalar type, etc. are kept to AVX512vl.
>>>
>>> Once we figure all that out for AVX512vl we will know if it’s worth
>>> bringing it into the AVX512f world (for the sake of a single intrinsic or
>>> perhaps for some other reason I don’t yet know).
>>>
>>> So my preference would keep the asm-only masking ops for now and
>>> experiment with v4i1 in AVX512f context at a later point.   Is that
>>> acceptable?
>>>
>>>
>>  I think yes. We can do it later.
>>
>>> Patches:
>>>
>>> 0001: LGTM, anyway. It should be committed even though we decided to
>>> include VK4 register class for AVX512F subset, because it would be helpful
>>> for other instructions (e.g. CMP that will override masking patterns)
>>>
>>>
>>> Yeah, I was hoping you’d notice that ;).
>>>
>>> 0004: I don't understand why you use "AVX512_masking_asm_only" naming.
>>> It's not obvious from this name that only non-masking pattern is used.
>>>
>>>
>>> Do you have a better suggestion?  What I was trying to express that for
>>> the masking variants we only provide asm support and no code-gen.  Is your
>>> problem that the name is too masking focused, i.e. that for the non-masking
>>> we do provide code-gen?
>>>
>>>
>> I was confused that AVX512_masking_asm_only was derived from
>> AVX512_masking_asm multiclass, and it is not obvious what does "only"
>> difference mean.. In implementation I see that you provided only
>> non-masking pattern.
>>
>>
>> There should have been a comment as well :(.
>>
>> May be it would be better to provide AVX512_masking_manual (or another
>> name) derived from AVX512_masking_asm that will have default values
>> (list<dag> Pattern = [], list<dag> MaskingPattern = [], list<dag>
>> ZeroMaskingPattern= []). What do you think?
>>
>>
>> I don’t think that’s a good idea.  The input and output operands should
>> be specified along with the patterns.  So for an instruction with a custom
>> masking and zero-masking pattern (e.g. cmpm), you want to use
>> AVX512_masking_asm (i.e. the class that lets you pass Ins, Outs, Pattern,
>> Masking pattern and Zero-masking pattern.
>>
>> So perhaps the naming should be:
>>
>> 1. masking->maskable  (I think calling this multiclass masking was a
>> mistake since it’s also provides the non-masking variant)
>> 2. _asm -> _custom
>> 3. _asm_only -> _in_asm
>>
>> So the hierarchy would be like this:
>>
>>         maskable_custom
>>                     /       \
>>                    /         \
>>           maskable_common   maskable_in_asm
>>             /         \
>>            /            \
>>       maskable        maskable_3src
>>
>> Let me know if that sounds more reasonable.
>>
>>
> The hierarchy looks like reasonable. I suggest to change masking to
> mask/maskz (similar to intel intrinsics name). So, AVX512_mask_custom
> includes non-masking and masking versions, and AVX512_maskz_custom is
> derived from AVX512_mask_custom and includes zero-masking version. Other
> mask/maskz multiclasses are derived from corresponding mask/maskz parent
> multiclasses.
> AVX512_mask_custom is useful for instructions with only non-masking and
> masking versions available (stores, cmp etc.)
>
>
> If I understand you correctly, I rather not create such a bloated
> hierarchy without existing users.  We can further split the classes if
> necessary.  For example, I think for cmpm (which I think only has a masking
> variant — actually implementing zero-masking semantics) we may want to
> derive from maskable_custom_mask and for stores which only have
> zero-masking variant we want maskable_custom_maskz.  Then maskable_custom
> would just derive from both maskable_custom_mask and maskable_custom_maskz.
>
> Adam
>
> Adam
>>
>> Perhaps I should rename these classes to AVX512_maskable or something and
>>> call this guy AVX512_maskable_asm_masking?
>>>
>>> Adam
>>>
>>> 2014-10-01 17:47 GMT+04:00 Demikhovsky, Elena <
>>> elena.demikhovsky at intel.com>:
>>>
>>>> 0003 : LGTM
>>>>
>>>> -  Elena
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Adam Nemet [mailto:anemet at apple.com]
>>>> Sent: Wednesday, October 01, 2014 08:57
>>>> To: Demikhovsky, Elena; Robert Khasanov
>>>> Cc: LLVM Commits
>>>> Subject: [AVX512] Add masking variants for vextract*x4
>>>>
>>>> Hi Elena and Robert,
>>>>
>>>> First I tried to implement this with the usual intrinsic lowering but
>>>> unfortunately because vextract*x4 produces v4* vectors I would have needed
>>>> VK4 and v4i1 for masking in AVX512f which only come with AVX512vl.  I don't
>>>> think these intrinsics are worth the hassle making that work so instead I
>>>> went for asm-only instructions.  Then I have the intrinsics Pat<>s to call
>>>> the corresponding instruction.
>>>>
>>>> This approach gave me a chance to incorporate asm-only masking support
>>>> into the AVX512_masking hierarchy.  See patch 1 and 4.
>>>>
>>>> Also the masking operand was not supported with the MRMDestReg format,
>>>> so I've added that too.  See patch 2.
>>>>
>>>> Please let me know if it looks good to you.
>>>>
>>>> Thanks,
>>>> Adam
>>>>
>>>> ---------------------------------------------------------------------
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material for
>>>> the sole use of the intended recipient(s). Any review or distribution
>>>> by others is strictly prohibited. If you are not the intended
>>>> recipient, please contact the sender and delete all copies.
>>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141008/03e0675f/attachment.html>