[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Sun Jul 13 13:15:37 PDT 2014

Hi Michael,

Thanks for reporting it.
I am looking at it right now.

On Sun, Jul 13, 2014 at 1:32 PM, Kuperstein, Michael M
<michael.m.kuperstein at intel.com> wrote:
> Hi Andrea,
>
> This patch breaks the X86 codegen in some cases, reproducer is attached.
>
> It seems that X86 assumes that isShuffleMaskLegal() is called only after type legalization.
> Calling it with illegal vector types (e.g. v4i8) causes it to crash and burn, because isSHUFPMask() doesn't check that VT.getSizeInBits() > 128, causing a division by 0.
> I could just add a check to isSHUFPMask(), but I see this pattern repeated in the other functions called from isShuffleMaskLegal() as well.
>
> Nadav, is calling isShuffleMaskLegal() from the DAG Combiner before type legalization expected to work?
> If it is, should it just return false when it encounters types it doesn't expect? Right now it returns false for VT.getSizeInBits() == 64, I guess we can change that to <= 64 instead.
>
> Michael
>
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Andrea Di Biagio
> Sent: Tuesday, July 08, 2014 18:31
> To: Nadav Rotem
> Cc: llvm-commits at cs.uiuc.edu for LLVM
> Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
>
> Thanks for the review Nadav!
>
> Committed revision 212539.
>
>
> On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>
>> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>>
>>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>>
>>>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef,
>>> M2)
>>>
>>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>>
>>> DAGCombiner already implemented some rules to simplify a sequence of
>>> shuffles into the vector operand according to rule (shuffle (shuffle
>>> (X, Undef, M0), Undef, M1) -> X
>>>
>>> This patch extends that logic to also cover the case where the two
>>> shuffles in input don't perform an identity operation. To avoid
>>> introducing illegal shuffles that are then expanded during
>>> Legalization into a sub-optimal sequence of instructions, the new
>>> rule would only trigger if if the mask of the resulting shufflevector
>>> is legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>>
>>> In my experiments, this patch improves how shuffles are selected on
>>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>>> However, in most cases - related to 128-bit vector swizzle oeprations
>>> - the backend already knows how to efficiently combine PSHUFD dag
>>> nodes thanks to the new target combine rules added at revisions
>>> 211889 and 211890.
>>>
>>> In general, this patch has clearly the advantage of being target
>>> independent, since it works on target independent shuffle dag nodes.
>>> That means, potentially all targets (not only x86) can take advantage
>>> of this new rule.
>>> Second, most shuffle pairs can be safely combined before we run the
>>> legalizer on vector operations. This allows us to combine/simplify
>>> some dag nodes earlier in the process and not only immediately before
>>> instruction selection. That said, this patch is _not_ meant to be a
>>> replacement for the target specific combine rules already available
>>> on x86; the backend might still potentially introduce new shuffles
>>> during legalization stage. Another reason why on x86 we still need
>>> target specific combine rules is because this new rule is very simple
>>> and avoids to aggressively optimize shuffles; on the other hand,
>>> rules added at 211890 tends to be more aggressive since they know how
>>> to see through bitcast operations when folding shuffles.
>>>
>>> I only tested this patch on x86. I had a very quick look at the
>>> codegen for test swizzle-2.ll when passing an ARM triple (I tested
>>> only armv7-linux).
>>> I saw that the code generated is overall better with this patch (i.e.
>>> smaller - with less shuffles).
>>>
>>> Please let me know if ok to submit.
>>>
>>> Thanks,
>>> Andrea
>>> <patch-dagcombiner.diff>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.