[AVX512] Add masking variants for vextract*x4

Wed Oct 8 12:10:41 PDT 2014

среда, 8 октября 2014 г. пользователь Robert Khasanov написал:

>
>
> 2014-10-08 21:45 GMT+04:00 Adam Nemet <anemet at apple.com
> <javascript:_e(%7B%7D,'cvml','anemet at apple.com');>>:
>
>>
>> On Oct 8, 2014, at 10:20 AM, Robert Khasanov <rob.khasanov at gmail.com
>> <javascript:_e(%7B%7D,'cvml','rob.khasanov at gmail.com');>> wrote:
>>
>> You are not right about stores, they have only merge-masking version, not
>> zero-masking.
>>
>>
>> Great, there is typo in the SDM:
>>
>> VMOVDQA32 zmm2/m512 {k1}{z}, zmm1
>>   Could you perhaps let the appropriate people know at Intel?
>>
>>
> VMOVDQA32 also have register-register version and it has zero-masking.
> Another form VMOVDQA32 zmm1 {k1}{z}, zmm2/m512 also have register-register
> version but it has another opcode.
> In section 4.4 of SDM: "Instructions which support “zeroing-masking" also
> allow merging-masking".
> So. there is no typo.
>

Sorry, I wrote non-relevant item from this section. You can find in the
same section that stores have merge-masking and not have zero-masking.

> As I understand you correctly:
>> * maskable_custom_mask will have only merge-masking version
>> * maskable_custom_maskz will have only zero-masking version
>> * maskable_custom derives from maskable_custom_mask, and maskable_custom_maskz
>> and include also non-masking version.
>>
>> In this plan I will need also add class that will derives from maskable_custom_mask
>> and also have non-masking version.. I don't think that it's good
>> hierarchy..
>> Since there is no instructions w/zero-masking, I think deriving maskz
>> from mask is more reasonable (as in my version).
>>
>>
>> Yeah given that there does not seem to be an instruction with only
>> zero-masking variant, you’re right.
>>
>> Also I want reduce the namings: not repeat mask in multiclass naming: maskz_custom
>> instead of maskable_custom_maskz, mask_custom instead of
>> maskable_custom_mask
>>
>>
>> It’s confusing not to use something like maskable since this also defines
>> the non-masking version.  That is why I don’t like the current name masking
>> or mask.
>>
>> How about:
>>
>>                  merge_maskable_custom
>>                         |
>>                         |
>>         maskable_custom
>>                     /       \
>>                    /         \
>>           maskable_common   maskable_in_asm
>>             /         \
>>            /            \
>>       maskable        maskable_3src
>>
>> where merge_maskable_custom only has non-masking and merge-masking and
>> maskable_custom also adds zero-masking?
>>
>>
> That is good hierarchy :-) I will try to use merge_masking_custom for CMP
> soon.
>
>
>
>> Adam
>>
>> 2014-10-08 21:05 GMT+04:00 Adam Nemet <anemet at apple.com
>> <javascript:_e(%7B%7D,'cvml','anemet at apple.com');>>:
>>
>>>
>>> On Oct 8, 2014, at 5:51 AM, Robert Khasanov <rob.khasanov at gmail.com
>>> <javascript:_e(%7B%7D,'cvml','rob.khasanov at gmail.com');>> wrote:
>>>
>>>
>>>
>>> 2014-10-08 11:12 GMT+04:00 Adam Nemet <anemet at apple.com
>>> <javascript:_e(%7B%7D,'cvml','anemet at apple.com');>>:
>>>
>>>>
>>>> On Oct 3, 2014, at 11:34 AM, Robert Khasanov <rob.khasanov at gmail.com
>>>> <javascript:_e(%7B%7D,'cvml','rob.khasanov at gmail.com');>> wrote:
>>>>
>>>>
>>>>
>>>> 2014-10-01 21:38 GMT+04:00 Adam Nemet <anemet at apple.com
>>>> <javascript:_e(%7B%7D,'cvml','anemet at apple.com');>>:
>>>>
>>>>>
>>>>> On Oct 1, 2014, at 9:30 AM, Robert Khasanov <rob.khasanov at gmail.com
>>>>> <javascript:_e(%7B%7D,'cvml','rob.khasanov at gmail.com');>> wrote:
>>>>>
>>>>> Hi Adam,
>>>>>
>>>>> About maskings in vextract*x4 instructions. I think we can include
>>>>> VK4 register class and v4i1 type for AVX512F subset as it is needed by
>>>>> instruction semantics. Elena, do you agree? It would be better keep all
>>>>> intrinsics lowering in one place.
>>>>>
>>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I don’t think that a single intrinsic is worth the complication at
>>>>> this point.  I agree that this should be the long term direction but for
>>>>> now it’s probably easier to keep the current simple model in mind that
>>>>> AVX512f only requires v8i1 and above.  So all the strangeness about v4i1
>>>>> e.g. that it cannot be casted directly from a scalar type, etc. are kept to
>>>>> AVX512vl.
>>>>>
>>>>> Once we figure all that out for AVX512vl we will know if it’s worth
>>>>> bringing it into the AVX512f world (for the sake of a single intrinsic or
>>>>> perhaps for some other reason I don’t yet know).
>>>>>
>>>>> So my preference would keep the asm-only masking ops for now and
>>>>> experiment with v4i1 in AVX512f context at a later point.   Is that
>>>>> acceptable?
>>>>>
>>>>>
>>>>  I think yes. We can do it later.
>>>>
>>>>> Patches:
>>>>>
>>>>> 0001: LGTM, anyway. It should be committed even though we decided to
>>>>> include VK4 register class for AVX512F subset, because it would be helpful
>>>>> for other instructions (e.g. CMP that will override masking patterns)
>>>>>
>>>>>
>>>>> Yeah, I was hoping you’d notice that ;).
>>>>>
>>>>> 0004: I don't understand why you use "AVX512_masking_asm_only" naming.
>>>>> It's not obvious from this name that only non-masking pattern is used.
>>>>>
>>>>>
>>>>> Do you have a better suggestion?  What I was trying to express that
>>>>> for the masking variants we only provide asm support and no code-gen.  Is
>>>>> your problem that the name is too masking focused, i.e. that for the
>>>>> non-masking we do provide code-gen?
>>>>>
>>>>>
>>>> I was confused that AVX512_masking_asm_only was derived from
>>>> AVX512_masking_asm multiclass, and it is not obvious what does "only"
>>>> difference mean.. In implementation I see that you provided only
>>>> non-masking pattern.
>>>>
>>>>
>>>> There should have been a comment as well :(.
>>>>
>>>> May be it would be better to provide AVX512_masking_manual (or another
>>>> name) derived from AVX512_masking_asm that will have default values
>>>> (list<dag> Pattern = [], list<dag> MaskingPattern = [], list<dag>
>>>> ZeroMaskingPattern= []). What do you think?
>>>>
>>>>
>>>> I don’t think that’s a good idea.  The input and output operands should
>>>> be specified along with the patterns.  So for an instruction with a custom
>>>> masking and zero-masking pattern (e.g. cmpm), you want to use
>>>> AVX512_masking_asm (i.e. the class that lets you pass Ins, Outs, Pattern,
>>>> Masking pattern and Zero-masking pattern.
>>>>
>>>> So perhaps the naming should be:
>>>>
>>>> 1. masking->maskable  (I think calling this multiclass masking was a
>>>> mistake since it’s also provides the non-masking variant)
>>>> 2. _asm -> _custom
>>>> 3. _asm_only -> _in_asm
>>>>
>>>> So the hierarchy would be like this:
>>>>
>>>>         maskable_custom
>>>>                     /       \
>>>>                    /         \
>>>>           maskable_common   maskable_in_asm
>>>>             /         \
>>>>            /            \
>>>>       maskable        maskable_3src
>>>>
>>>> Let me know if that sounds more reasonable.
>>>>
>>>>
>>> The hierarchy looks like reasonable. I suggest to change masking to
>>> mask/maskz (similar to intel intrinsics name). So, AVX512_mask_custom
>>> includes non-masking and masking versions, and AVX512_maskz_custom is
>>> derived from AVX512_mask_custom and includes zero-masking version. Other
>>> mask/maskz multiclasses are derived from corresponding mask/maskz parent
>>> multiclasses.
>>> AVX512_mask_custom is useful for instructions with only non-masking and
>>> masking versions available (stores, cmp etc.)
>>>
>>>
>>> If I understand you correctly, I rather not create such a bloated
>>> hierarchy without existing users.  We can further split the classes if
>>> necessary.  For example, I think for cmpm (which I think only has a masking
>>> variant — actually implementing zero-masking semantics) we may want to
>>> derive from maskable_custom_mask and for stores which only have
>>> zero-masking variant we want maskable_custom_maskz.  Then maskable_custom
>>> would just derive from both maskable_custom_mask and maskable_custom_maskz.
>>>
>>> Adam
>>>
>>> Adam
>>>>
>>>> Perhaps I should rename these classes to AVX512_maskable or something
>>>>> and call this guy AVX512_maskable_asm_masking?
>>>>>
>>>>> Adam
>>>>>
>>>>> 2014-10-01 17:47 GMT+04:00 Demikhovsky, Elena <
>>>>> elena.demikhovsky at intel.com
>>>>> <javascript:_e(%7B%7D,'cvml','elena.demikhovsky at intel.com');>>:
>>>>>
>>>>>> 0003 : LGTM
>>>>>>
>>>>>> -  Elena
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Adam Nemet [mailto:anemet at apple.com
>>>>>> <javascript:_e(%7B%7D,'cvml','anemet at apple.com');>]
>>>>>> Sent: Wednesday, October 01, 2014 08:57
>>>>>> To: Demikhovsky, Elena; Robert Khasanov
>>>>>> Cc: LLVM Commits
>>>>>> Subject: [AVX512] Add masking variants for vextract*x4
>>>>>>
>>>>>> Hi Elena and Robert,
>>>>>>
>>>>>> First I tried to implement this with the usual intrinsic lowering but
>>>>>> unfortunately because vextract*x4 produces v4* vectors I would have needed
>>>>>> VK4 and v4i1 for masking in AVX512f which only come with AVX512vl.  I don't
>>>>>> think these intrinsics are worth the hassle making that work so instead I
>>>>>> went for asm-only instructions.  Then I have the intrinsics Pat<>s to call
>>>>>> the corresponding instruction.
>>>>>>
>>>>>> This approach gave me a chance to incorporate asm-only masking
>>>>>> support into the AVX512_masking hierarchy.  See patch 1 and 4.
>>>>>>
>>>>>> Also the masking operand was not supported with the MRMDestReg
>>>>>> format, so I've added that too.  See patch 2.
>>>>>>
>>>>>> Please let me know if it looks good to you.
>>>>>>
>>>>>> Thanks,
>>>>>> Adam
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> Intel Israel (74) Limited
>>>>>>
>>>>>> This e-mail and any attachments may contain confidential material for
>>>>>> the sole use of the intended recipient(s). Any review or distribution
>>>>>> by others is strictly prohibited. If you are not the intended
>>>>>> recipient, please contact the sender and delete all copies.
>>>>>>
>>>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141008/e507303a/attachment.html>