[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Sun Jul 13 14:23:49 PDT 2014

Hi Michael.

Sorry for the problems caused by my commit..
I have just committed revision 212915 that I should fix the problems you found.
Could you please verify that it works for you as well?

As you said, the crash was caused by the lack of a check for legal types.
I wrongly assumed that 'isShuffleMaskLegal' was able to accept any
vector value types (even illegal types).. But I was wrong.
To fix the problem I explicitly added a check to verify if the shuffle
value type is legal before calling 'isShuffleMaskLegal'

Thanks for the reproducible and for the detailed analysis of the
problem found. I have added your reproducible as test case.

-Andrea

On Sun, Jul 13, 2014 at 9:15 PM, Andrea Di Biagio
<andrea.dibiagio at gmail.com> wrote:
> Hi Michael,
>
> Thanks for reporting it.
> I am looking at it right now.
>
>
> On Sun, Jul 13, 2014 at 1:32 PM, Kuperstein, Michael M
> <michael.m.kuperstein at intel.com> wrote:
>> Hi Andrea,
>>
>> This patch breaks the X86 codegen in some cases, reproducer is attached.
>>
>> It seems that X86 assumes that isShuffleMaskLegal() is called only after type legalization.
>> Calling it with illegal vector types (e.g. v4i8) causes it to crash and burn, because isSHUFPMask() doesn't check that VT.getSizeInBits() > 128, causing a division by 0.
>> I could just add a check to isSHUFPMask(), but I see this pattern repeated in the other functions called from isShuffleMaskLegal() as well.
>>
>> Nadav, is calling isShuffleMaskLegal() from the DAG Combiner before type legalization expected to work?
>> If it is, should it just return false when it encounters types it doesn't expect? Right now it returns false for VT.getSizeInBits() == 64, I guess we can change that to <= 64 instead.
>>
>> Michael
>>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Andrea Di Biagio
>> Sent: Tuesday, July 08, 2014 18:31
>> To: Nadav Rotem
>> Cc: llvm-commits at cs.uiuc.edu for LLVM
>> Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
>>
>> Thanks for the review Nadav!
>>
>> Committed revision 212539.
>>
>>
>> On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>>
>>> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>>>
>>>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>>>
>>>>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef,
>>>> M2)
>>>>
>>>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>>>
>>>> DAGCombiner already implemented some rules to simplify a sequence of
>>>> shuffles into the vector operand according to rule (shuffle (shuffle
>>>> (X, Undef, M0), Undef, M1) -> X
>>>>
>>>> This patch extends that logic to also cover the case where the two
>>>> shuffles in input don't perform an identity operation. To avoid
>>>> introducing illegal shuffles that are then expanded during
>>>> Legalization into a sub-optimal sequence of instructions, the new
>>>> rule would only trigger if if the mask of the resulting shufflevector
>>>> is legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>>>
>>>> In my experiments, this patch improves how shuffles are selected on
>>>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>>>> However, in most cases - related to 128-bit vector swizzle oeprations
>>>> - the backend already knows how to efficiently combine PSHUFD dag
>>>> nodes thanks to the new target combine rules added at revisions
>>>> 211889 and 211890.
>>>>
>>>> In general, this patch has clearly the advantage of being target
>>>> independent, since it works on target independent shuffle dag nodes.
>>>> That means, potentially all targets (not only x86) can take advantage
>>>> of this new rule.
>>>> Second, most shuffle pairs can be safely combined before we run the
>>>> legalizer on vector operations. This allows us to combine/simplify
>>>> some dag nodes earlier in the process and not only immediately before
>>>> instruction selection. That said, this patch is _not_ meant to be a
>>>> replacement for the target specific combine rules already available
>>>> on x86; the backend might still potentially introduce new shuffles
>>>> during legalization stage. Another reason why on x86 we still need
>>>> target specific combine rules is because this new rule is very simple
>>>> and avoids to aggressively optimize shuffles; on the other hand,
>>>> rules added at 211890 tends to be more aggressive since they know how
>>>> to see through bitcast operations when folding shuffles.
>>>>
>>>> I only tested this patch on x86. I had a very quick look at the
>>>> codegen for test swizzle-2.ll when passing an ARM triple (I tested
>>>> only armv7-linux).
>>>> I saw that the code generated is overall better with this patch (i.e.
>>>> smaller - with less shuffles).
>>>>
>>>> Please let me know if ok to submit.
>>>>
>>>> Thanks,
>>>> Andrea
>>>> <patch-dagcombiner.diff>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.