[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Kuperstein, Michael M michael.m.kuperstein at intel.com
Sun Jul 13 05:32:51 PDT 2014


Hi Andrea,

This patch breaks the X86 codegen in some cases, reproducer is attached.

It seems that X86 assumes that isShuffleMaskLegal() is called only after type legalization. 
Calling it with illegal vector types (e.g. v4i8) causes it to crash and burn, because isSHUFPMask() doesn't check that VT.getSizeInBits() > 128, causing a division by 0.
I could just add a check to isSHUFPMask(), but I see this pattern repeated in the other functions called from isShuffleMaskLegal() as well.

Nadav, is calling isShuffleMaskLegal() from the DAG Combiner before type legalization expected to work? 
If it is, should it just return false when it encounters types it doesn't expect? Right now it returns false for VT.getSizeInBits() == 64, I guess we can change that to <= 64 instead.

Michael

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Andrea Di Biagio
Sent: Tuesday, July 08, 2014 18:31
To: Nadav Rotem
Cc: llvm-commits at cs.uiuc.edu for LLVM
Subject: Re: [PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Thanks for the review Nadav!

Committed revision 212539.


On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>
>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>
>> Hi,
>>
>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>
>>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef, 
>> M2)
>>
>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>
>> DAGCombiner already implemented some rules to simplify a sequence of 
>> shuffles into the vector operand according to rule (shuffle (shuffle 
>> (X, Undef, M0), Undef, M1) -> X
>>
>> This patch extends that logic to also cover the case where the two 
>> shuffles in input don't perform an identity operation. To avoid 
>> introducing illegal shuffles that are then expanded during 
>> Legalization into a sub-optimal sequence of instructions, the new 
>> rule would only trigger if if the mask of the resulting shufflevector 
>> is legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>
>> In my experiments, this patch improves how shuffles are selected on
>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>> However, in most cases - related to 128-bit vector swizzle oeprations
>> - the backend already knows how to efficiently combine PSHUFD dag 
>> nodes thanks to the new target combine rules added at revisions 
>> 211889 and 211890.
>>
>> In general, this patch has clearly the advantage of being target 
>> independent, since it works on target independent shuffle dag nodes.
>> That means, potentially all targets (not only x86) can take advantage 
>> of this new rule.
>> Second, most shuffle pairs can be safely combined before we run the 
>> legalizer on vector operations. This allows us to combine/simplify 
>> some dag nodes earlier in the process and not only immediately before 
>> instruction selection. That said, this patch is _not_ meant to be a 
>> replacement for the target specific combine rules already available 
>> on x86; the backend might still potentially introduce new shuffles 
>> during legalization stage. Another reason why on x86 we still need 
>> target specific combine rules is because this new rule is very simple 
>> and avoids to aggressively optimize shuffles; on the other hand, 
>> rules added at 211890 tends to be more aggressive since they know how 
>> to see through bitcast operations when folding shuffles.
>>
>> I only tested this patch on x86. I had a very quick look at the 
>> codegen for test swizzle-2.ll when passing an ARM triple (I tested 
>> only armv7-linux).
>> I saw that the code generated is overall better with this patch (i.e.
>> smaller - with less shuffles).
>>
>> Please let me know if ok to submit.
>>
>> Thanks,
>> Andrea
>> <patch-dagcombiner.diff>
_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugpoint-reduced-simplified.ll
Type: application/octet-stream
Size: 662 bytes
Desc: bugpoint-reduced-simplified.ll
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140713/a466a703/attachment.obj>


More information about the llvm-commits mailing list