[AVX512] Add masking variants for vextract*x4

Wed Oct 8 11:19:19 PDT 2014

2014-10-08 21:45 GMT+04:00 Adam Nemet <anemet at apple.com>:

>
> On Oct 8, 2014, at 10:20 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
>
> You are not right about stores, they have only merge-masking version, not
> zero-masking.
>
>
> Great, there is typo in the SDM:
>
> VMOVDQA32 zmm2/m512 {k1}{z}, zmm1
>   Could you perhaps let the appropriate people know at Intel?
>
>
VMOVDQA32 also have register-register version and it has zero-masking.
Another form VMOVDQA32 zmm1 {k1}{z}, zmm2/m512 also have register-register
version but it has another opcode.
In section 4.4 of SDM: "Instructions which support “zeroing-masking" also
allow merging-masking".
So. there is no typo.

> As I understand you correctly:
> * maskable_custom_mask will have only merge-masking version
> * maskable_custom_maskz will have only zero-masking version
> * maskable_custom derives from maskable_custom_mask, and maskable_custom_maskz
> and include also non-masking version.
>
> In this plan I will need also add class that will derives from maskable_custom_mask
> and also have non-masking version.. I don't think that it's good
> hierarchy..
> Since there is no instructions w/zero-masking, I think deriving maskz from
> mask is more reasonable (as in my version).
>
>
> Yeah given that there does not seem to be an instruction with only
> zero-masking variant, you’re right.
>
> Also I want reduce the namings: not repeat mask in multiclass naming: maskz_custom
> instead of maskable_custom_maskz, mask_custom instead of
> maskable_custom_mask
>
>
> It’s confusing not to use something like maskable since this also defines
> the non-masking version.  That is why I don’t like the current name masking
> or mask.
>
> How about:
>
>                  merge_maskable_custom
>                         |
>                         |
>         maskable_custom
>                     /       \
>                    /         \
>           maskable_common   maskable_in_asm
>             /         \
>            /            \
>       maskable        maskable_3src
>
> where merge_maskable_custom only has non-masking and merge-masking and
> maskable_custom also adds zero-masking?
>
>
That is good hierarchy :-) I will try to use merge_masking_custom for CMP
soon.


> Adam
>
> 2014-10-08 21:05 GMT+04:00 Adam Nemet <anemet at apple.com>:
>
>>
>> On Oct 8, 2014, at 5:51 AM, Robert Khasanov <rob.khasanov at gmail.com>
>> wrote:
>>
>>
>>
>> 2014-10-08 11:12 GMT+04:00 Adam Nemet <anemet at apple.com>:
>>
>>>
>>> On Oct 3, 2014, at 11:34 AM, Robert Khasanov <rob.khasanov at gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> 2014-10-01 21:38 GMT+04:00 Adam Nemet <anemet at apple.com>:
>>>
>>>>
>>>> On Oct 1, 2014, at 9:30 AM, Robert Khasanov <rob.khasanov at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Adam,
>>>>
>>>> About maskings in vextract*x4 instructions. I think we can include VK4
>>>> register class and v4i1 type for AVX512F subset as it is needed by
>>>> instruction semantics. Elena, do you agree? It would be better keep all
>>>> intrinsics lowering in one place.
>>>>
>>>>
>>>> Hi Robert,
>>>>
>>>> I don’t think that a single intrinsic is worth the complication at this
>>>> point.  I agree that this should be the long term direction but for now
>>>> it’s probably easier to keep the current simple model in mind that AVX512f
>>>> only requires v8i1 and above.  So all the strangeness about v4i1 e.g. that
>>>> it cannot be casted directly from a scalar type, etc. are kept to AVX512vl.
>>>>
>>>> Once we figure all that out for AVX512vl we will know if it’s worth
>>>> bringing it into the AVX512f world (for the sake of a single intrinsic or
>>>> perhaps for some other reason I don’t yet know).
>>>>
>>>> So my preference would keep the asm-only masking ops for now and
>>>> experiment with v4i1 in AVX512f context at a later point.   Is that
>>>> acceptable?
>>>>
>>>>
>>>  I think yes. We can do it later.
>>>
>>>> Patches:
>>>>
>>>> 0001: LGTM, anyway. It should be committed even though we decided to
>>>> include VK4 register class for AVX512F subset, because it would be helpful
>>>> for other instructions (e.g. CMP that will override masking patterns)
>>>>
>>>>
>>>> Yeah, I was hoping you’d notice that ;).
>>>>
>>>> 0004: I don't understand why you use "AVX512_masking_asm_only" naming.
>>>> It's not obvious from this name that only non-masking pattern is used.
>>>>
>>>>
>>>> Do you have a better suggestion?  What I was trying to express that for
>>>> the masking variants we only provide asm support and no code-gen.  Is your
>>>> problem that the name is too masking focused, i.e. that for the non-masking
>>>> we do provide code-gen?
>>>>
>>>>
>>> I was confused that AVX512_masking_asm_only was derived from
>>> AVX512_masking_asm multiclass, and it is not obvious what does "only"
>>> difference mean.. In implementation I see that you provided only
>>> non-masking pattern.
>>>
>>>
>>> There should have been a comment as well :(.
>>>
>>> May be it would be better to provide AVX512_masking_manual (or another
>>> name) derived from AVX512_masking_asm that will have default values
>>> (list<dag> Pattern = [], list<dag> MaskingPattern = [], list<dag>
>>> ZeroMaskingPattern= []). What do you think?
>>>
>>>
>>> I don’t think that’s a good idea.  The input and output operands should
>>> be specified along with the patterns.  So for an instruction with a custom
>>> masking and zero-masking pattern (e.g. cmpm), you want to use
>>> AVX512_masking_asm (i.e. the class that lets you pass Ins, Outs, Pattern,
>>> Masking pattern and Zero-masking pattern.
>>>
>>> So perhaps the naming should be:
>>>
>>> 1. masking->maskable  (I think calling this multiclass masking was a
>>> mistake since it’s also provides the non-masking variant)
>>> 2. _asm -> _custom
>>> 3. _asm_only -> _in_asm
>>>
>>> So the hierarchy would be like this:
>>>
>>>         maskable_custom
>>>                     /       \
>>>                    /         \
>>>           maskable_common   maskable_in_asm
>>>             /         \
>>>            /            \
>>>       maskable        maskable_3src
>>>
>>> Let me know if that sounds more reasonable.
>>>
>>>
>> The hierarchy looks like reasonable. I suggest to change masking to
>> mask/maskz (similar to intel intrinsics name). So, AVX512_mask_custom
>> includes non-masking and masking versions, and AVX512_maskz_custom is
>> derived from AVX512_mask_custom and includes zero-masking version. Other
>> mask/maskz multiclasses are derived from corresponding mask/maskz parent
>> multiclasses.
>> AVX512_mask_custom is useful for instructions with only non-masking and
>> masking versions available (stores, cmp etc.)
>>
>>
>> If I understand you correctly, I rather not create such a bloated
>> hierarchy without existing users.  We can further split the classes if
>> necessary.  For example, I think for cmpm (which I think only has a masking
>> variant — actually implementing zero-masking semantics) we may want to
>> derive from maskable_custom_mask and for stores which only have
>> zero-masking variant we want maskable_custom_maskz.  Then maskable_custom
>> would just derive from both maskable_custom_mask and maskable_custom_maskz.
>>
>> Adam
>>
>> Adam
>>>
>>> Perhaps I should rename these classes to AVX512_maskable or something
>>>> and call this guy AVX512_maskable_asm_masking?
>>>>
>>>> Adam
>>>>
>>>> 2014-10-01 17:47 GMT+04:00 Demikhovsky, Elena <
>>>> elena.demikhovsky at intel.com>:
>>>>
>>>>> 0003 : LGTM
>>>>>
>>>>> -  Elena
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Adam Nemet [mailto:anemet at apple.com]
>>>>> Sent: Wednesday, October 01, 2014 08:57
>>>>> To: Demikhovsky, Elena; Robert Khasanov
>>>>> Cc: LLVM Commits
>>>>> Subject: [AVX512] Add masking variants for vextract*x4
>>>>>
>>>>> Hi Elena and Robert,
>>>>>
>>>>> First I tried to implement this with the usual intrinsic lowering but
>>>>> unfortunately because vextract*x4 produces v4* vectors I would have needed
>>>>> VK4 and v4i1 for masking in AVX512f which only come with AVX512vl.  I don't
>>>>> think these intrinsics are worth the hassle making that work so instead I
>>>>> went for asm-only instructions.  Then I have the intrinsics Pat<>s to call
>>>>> the corresponding instruction.
>>>>>
>>>>> This approach gave me a chance to incorporate asm-only masking support
>>>>> into the AVX512_masking hierarchy.  See patch 1 and 4.
>>>>>
>>>>> Also the masking operand was not supported with the MRMDestReg format,
>>>>> so I've added that too.  See patch 2.
>>>>>
>>>>> Please let me know if it looks good to you.
>>>>>
>>>>> Thanks,
>>>>> Adam
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Intel Israel (74) Limited
>>>>>
>>>>> This e-mail and any attachments may contain confidential material for
>>>>> the sole use of the intended recipient(s). Any review or distribution
>>>>> by others is strictly prohibited. If you are not the intended
>>>>> recipient, please contact the sender and delete all copies.
>>>>>
>>>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141008/f497a9ad/attachment.html>