[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Mon Jul 14 05:03:59 PDT 2014

Looks good, thanks!

-----Original Message-----
From: Andrea Di Biagio [mailto:andrea.dibiagio at gmail.com] 
Sent: Monday, July 14, 2014 00:24
To: Kuperstein, Michael M
Cc: Nadav Rotem; Demikhovsky, Elena; llvm-commits at cs.uiuc.edu for LLVM
Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Hi Michael.

Sorry for the problems caused by my commit..
I have just committed revision 212915 that I should fix the problems you found.
Could you please verify that it works for you as well?

As you said, the crash was caused by the lack of a check for legal types.
I wrongly assumed that 'isShuffleMaskLegal' was able to accept any vector value types (even illegal types).. But I was wrong.
To fix the problem I explicitly added a check to verify if the shuffle value type is legal before calling 'isShuffleMaskLegal'

Thanks for the reproducible and for the detailed analysis of the problem found. I have added your reproducible as test case.

-Andrea

On Sun, Jul 13, 2014 at 9:15 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi Michael,
>
> Thanks for reporting it.
> I am looking at it right now.
>
>
> On Sun, Jul 13, 2014 at 1:32 PM, Kuperstein, Michael M 
> <michael.m.kuperstein at intel.com> wrote:
>> Hi Andrea,
>>
>> This patch breaks the X86 codegen in some cases, reproducer is attached.
>>
>> It seems that X86 assumes that isShuffleMaskLegal() is called only after type legalization.
>> Calling it with illegal vector types (e.g. v4i8) causes it to crash and burn, because isSHUFPMask() doesn't check that VT.getSizeInBits() > 128, causing a division by 0.
>> I could just add a check to isSHUFPMask(), but I see this pattern repeated in the other functions called from isShuffleMaskLegal() as well.
>>
>> Nadav, is calling isShuffleMaskLegal() from the DAG Combiner before type legalization expected to work?
>> If it is, should it just return false when it encounters types it doesn't expect? Right now it returns false for VT.getSizeInBits() == 64, I guess we can change that to <= 64 instead.
>>
>> Michael
>>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu 
>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Andrea Di 
>> Biagio
>> Sent: Tuesday, July 08, 2014 18:31
>> To: Nadav Rotem
>> Cc: llvm-commits at cs.uiuc.edu for LLVM
>> Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
>>
>> Thanks for the review Nadav!
>>
>> Committed revision 212539.
>>
>>
>> On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>>
>>> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>>>
>>>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>>>
>>>>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, 
>>>> Undef,
>>>> M2)
>>>>
>>>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>>>
>>>> DAGCombiner already implemented some rules to simplify a sequence 
>>>> of shuffles into the vector operand according to rule (shuffle 
>>>> (shuffle (X, Undef, M0), Undef, M1) -> X
>>>>
>>>> This patch extends that logic to also cover the case where the two 
>>>> shuffles in input don't perform an identity operation. To avoid 
>>>> introducing illegal shuffles that are then expanded during 
>>>> Legalization into a sub-optimal sequence of instructions, the new 
>>>> rule would only trigger if if the mask of the resulting 
>>>> shufflevector is legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>>>
>>>> In my experiments, this patch improves how shuffles are selected on
>>>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>>>> However, in most cases - related to 128-bit vector swizzle 
>>>> oeprations
>>>> - the backend already knows how to efficiently combine PSHUFD dag 
>>>> nodes thanks to the new target combine rules added at revisions
>>>> 211889 and 211890.
>>>>
>>>> In general, this patch has clearly the advantage of being target 
>>>> independent, since it works on target independent shuffle dag nodes.
>>>> That means, potentially all targets (not only x86) can take 
>>>> advantage of this new rule.
>>>> Second, most shuffle pairs can be safely combined before we run the 
>>>> legalizer on vector operations. This allows us to combine/simplify 
>>>> some dag nodes earlier in the process and not only immediately 
>>>> before instruction selection. That said, this patch is _not_ meant 
>>>> to be a replacement for the target specific combine rules already 
>>>> available on x86; the backend might still potentially introduce new 
>>>> shuffles during legalization stage. Another reason why on x86 we 
>>>> still need target specific combine rules is because this new rule 
>>>> is very simple and avoids to aggressively optimize shuffles; on the 
>>>> other hand, rules added at 211890 tends to be more aggressive since 
>>>> they know how to see through bitcast operations when folding shuffles.
>>>>
>>>> I only tested this patch on x86. I had a very quick look at the 
>>>> codegen for test swizzle-2.ll when passing an ARM triple (I tested 
>>>> only armv7-linux).
>>>> I saw that the code generated is overall better with this patch (i.e.
>>>> smaller - with less shuffles).
>>>>
>>>> Please let me know if ok to submit.
>>>>
>>>> Thanks,
>>>> Andrea
>>>> <patch-dagcombiner.diff>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.