[PATCH][DAGCombiner] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.

Nadav Rotem nrotem at apple.com
Tue Jul 8 08:09:09 PDT 2014


Thanks for working on it Andrea. This change makes sense to me. I am happy to see that it helps so many cases. LGTM. 

> On Jul 8, 2014, at 07:51, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> Hi,
> 
> this patch teaches the DAGCombiner how to fold shuffles according to rule:
> 
>  (shuffle (shuffle (X, Undef, M0)), Undef, M1) -> (shuffle X, Undef, M2)
> 
> We do this only if M2 is legal according to method 'isShuffleMaskLegal'.
> 
> DAGCombiner already implemented some rules to simplify a sequence of
> shuffles into the vector operand according to rule
> (shuffle (shuffle (X, Undef, M0), Undef, M1) -> X
> 
> This patch extends that logic to also cover the case where the two
> shuffles in input don't perform an identity operation. To avoid
> introducing illegal shuffles that are then expanded during
> Legalization into a sub-optimal sequence of instructions, the new rule
> would only trigger if if the mask of the resulting shufflevector is
> legal (this is done calling method 'TLI.isShuffleMaskLegal').
> 
> In my experiments, this patch improves how shuffles are selected on
> x86 (see for example my new tests swizzle-2.ll and swizzle-avx2.ll).
> However, in most cases - related to 128-bit vector swizzle oeprations
> - the backend already knows how to efficiently combine PSHUFD dag
> nodes thanks to the new target combine rules added at revisions 211889
> and 211890.
> 
> In general, this patch has clearly the advantage of being target
> independent, since it works on target independent shuffle dag nodes.
> That means, potentially all targets (not only x86) can take advantage
> of this new rule.
> Second, most shuffle pairs can be safely combined before we run the
> legalizer on vector operations. This allows us to combine/simplify
> some dag nodes earlier in the process and not only immediately before
> instruction selection. That said, this patch is _not_ meant to be a
> replacement for the target specific combine rules already available on
> x86; the backend might still potentially introduce new shuffles during
> legalization stage. Another reason why on x86 we still need target
> specific combine rules is because this new rule is very simple and
> avoids to aggressively optimize shuffles; on the other hand, rules
> added at 211890 tends to be more aggressive since they know how to see
> through bitcast operations when folding shuffles.
> 
> I only tested this patch on x86. I had a very quick look at the
> codegen for test swizzle-2.ll when passing an ARM triple (I tested
> only armv7-linux).
> I saw that the code generated is overall better with this patch (i.e.
> smaller - with less shuffles).
> 
> Please let me know if ok to submit.
> 
> Thanks,
> Andrea
> <patch-dagcombiner.diff>



More information about the llvm-commits mailing list