[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Mon Jul 14 09:19:37 PDT 2014

Andrea, Michael, 

It may be a good idea to run llvm-stress and check if there are new failures in the X86 backend.  At some point the X86 backend was really good about not crashing with random inputs. 

Thanks,
Nadav

> On Jul 13, 2014, at 2:23 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> Hi Michael.
> 
> Sorry for the problems caused by my commit..
> I have just committed revision 212915 that I should fix the problems you found.
> Could you please verify that it works for you as well?
> 
> As you said, the crash was caused by the lack of a check for legal types.
> I wrongly assumed that 'isShuffleMaskLegal' was able to accept any
> vector value types (even illegal types).. But I was wrong.
> To fix the problem I explicitly added a check to verify if the shuffle
> value type is legal before calling 'isShuffleMaskLegal'
> 
> Thanks for the reproducible and for the detailed analysis of the
> problem found. I have added your reproducible as test case.
> 
> -Andrea
> 
> 
> On Sun, Jul 13, 2014 at 9:15 PM, Andrea Di Biagio
> <andrea.dibiagio at gmail.com> wrote:
>> Hi Michael,
>> 
>> Thanks for reporting it.
>> I am looking at it right now.
>> 
>> 
>> On Sun, Jul 13, 2014 at 1:32 PM, Kuperstein, Michael M
>> <michael.m.kuperstein at intel.com> wrote:
>>> Hi Andrea,
>>> 
>>> This patch breaks the X86 codegen in some cases, reproducer is attached.
>>> 
>>> It seems that X86 assumes that isShuffleMaskLegal() is called only after type legalization.
>>> Calling it with illegal vector types (e.g. v4i8) causes it to crash and burn, because isSHUFPMask() doesn't check that VT.getSizeInBits() > 128, causing a division by 0.
>>> I could just add a check to isSHUFPMask(), but I see this pattern repeated in the other functions called from isShuffleMaskLegal() as well.
>>> 
>>> Nadav, is calling isShuffleMaskLegal() from the DAG Combiner before type legalization expected to work?
>>> If it is, should it just return false when it encounters types it doesn't expect? Right now it returns false for VT.getSizeInBits() == 64, I guess we can change that to <= 64 instead.
>>> 
>>> Michael
>>> 
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Andrea Di Biagio
>>> Sent: Tuesday, July 08, 2014 18:31
>>> To: Nadav Rotem
>>> Cc: llvm-commits at cs.uiuc.edu for LLVM
>>> Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
>>> 
>>> Thanks for the review Nadav!
>>> 
>>> Committed revision 212539.
>>> 
>>> 
>>> On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>>> 
>>>> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>>>> 
>>>>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>>>> 
>>>>> (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef,
>>>>> M2)
>>>>> 
>>>>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>>>> 
>>>>> DAGCombiner already implemented some rules to simplify a sequence of
>>>>> shuffles into the vector operand according to rule (shuffle (shuffle
>>>>> (X, Undef, M0), Undef, M1) -> X
>>>>> 
>>>>> This patch extends that logic to also cover the case where the two
>>>>> shuffles in input don't perform an identity operation. To avoid
>>>>> introducing illegal shuffles that are then expanded during
>>>>> Legalization into a sub-optimal sequence of instructions, the new
>>>>> rule would only trigger if if the mask of the resulting shufflevector
>>>>> is legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>>>> 
>>>>> In my experiments, this patch improves how shuffles are selected on
>>>>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>>>>> However, in most cases - related to 128-bit vector swizzle oeprations
>>>>> - the backend already knows how to efficiently combine PSHUFD dag
>>>>> nodes thanks to the new target combine rules added at revisions
>>>>> 211889 and 211890.
>>>>> 
>>>>> In general, this patch has clearly the advantage of being target
>>>>> independent, since it works on target independent shuffle dag nodes.
>>>>> That means, potentially all targets (not only x86) can take advantage
>>>>> of this new rule.
>>>>> Second, most shuffle pairs can be safely combined before we run the
>>>>> legalizer on vector operations. This allows us to combine/simplify
>>>>> some dag nodes earlier in the process and not only immediately before
>>>>> instruction selection. That said, this patch is _not_ meant to be a
>>>>> replacement for the target specific combine rules already available
>>>>> on x86; the backend might still potentially introduce new shuffles
>>>>> during legalization stage. Another reason why on x86 we still need
>>>>> target specific combine rules is because this new rule is very simple
>>>>> and avoids to aggressively optimize shuffles; on the other hand,
>>>>> rules added at 211890 tends to be more aggressive since they know how
>>>>> to see through bitcast operations when folding shuffles.
>>>>> 
>>>>> I only tested this patch on x86. I had a very quick look at the
>>>>> codegen for test swizzle-2.ll when passing an ARM triple (I tested
>>>>> only armv7-linux).
>>>>> I saw that the code generated is overall better with this patch (i.e.
>>>>> smaller - with less shuffles).
>>>>> 
>>>>> Please let me know if ok to submit.
>>>>> 
>>>>> Thanks,
>>>>> Andrea
>>>>> <patch-dagcombiner.diff>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>> 
>>> This e-mail and any attachments may contain confidential material for
>>> the sole use of the intended recipient(s). Any review or distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.