[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Tue Jul 8 08:31:29 PDT 2014

Thanks for the review Nadav!

Committed revision 212539.

On Tue, Jul 8, 2014 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM.
>
>> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>
>> Hi,
>>
>> this patch teaches the DAGCombiner how to fold shuffles according to rule:
>>
>>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef, M2)
>>
>> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
>>
>> DAGCombiner already implemented some rules to simplify a sequence of
>> shuffles into the vector operand according to rule
>> (shuffle (shuffle (X, Undef, M0), Undef, M1) -> X
>>
>> This patch extends that logic to also cover the case where the two
>> shuffles in input don't perform an identity operation. To avoid
>> introducing illegal shuffles that are then expanded during
>> Legalization into a sub-optimal sequence of instructions, the new rule
>> would only trigger if if the mask of the resulting shufflevector is
>> legal (this is done calling method 'TLI.isShuffleMaskLegal').
>>
>> In my experiments, this patch improves how shuffles are selected on
>> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
>> However, in most cases - related to 128-bit vector swizzle oeprations
>> - the backend already knows how to efficiently combine PSHUFD dag
>> nodes thanks to the new target combine rules added at revisions 211889
>> and 211890.
>>
>> In general, this patch has clearly the advantage of being target
>> independent, since it works on target independent shuffle dag nodes.
>> That means, potentially all targets (not only x86) can take advantage
>> of this new rule.
>> Second, most shuffle pairs can be safely combined before we run the
>> legalizer on vector operations. This allows us to combine/simplify
>> some dag nodes earlier in the process and not only immediately before
>> instruction selection. That said, this patch is _not_ meant to be a
>> replacement for the target specific combine rules already available on
>> x86; the backend might still potentially introduce new shuffles during
>> legalization stage. Another reason why on x86 we still need target
>> specific combine rules is because this new rule is very simple and
>> avoids to aggressively optimize shuffles; on the other hand, rules
>> added at 211890 tends to be more aggressive since they know how to see
>> through bitcast operations when folding shuffles.
>>
>> I only tested this patch on x86. I had a very quick look at the
>> codegen for test swizzle-2.ll when passing an ARM triple (I tested
>> only armv7-linux).
>> I saw that the code generated is overall better with this patch (i.e.
>> smaller - with less shuffles).
>>
>> Please let me know if ok to submit.
>>
>> Thanks,
>> Andrea
>> <patch-dagcombiner.diff>