[PATCH] Teach the DAGCombiner how to fold a OR of two shufflevector into a single shufflevector node

Thu Mar 6 09:50:41 PST 2014

----- Original Message -----
> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
> To: "Tim Northover" <t.p.northover at gmail.com>
> Cc: "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>
> Sent: Thursday, March 6, 2014 8:25:22 AM
> Subject: Re: [PATCH] Teach the DAGCombiner how to fold a OR of two	shufflevector into a single shufflevector node
> 
> Hi Nadav, Hal and Tim,
> 
> I modified my original patch adding checks agains
> 'TLI.isShuffleMaskLegal' as suggested by Hal.
> This new patch now avoids folding an OR of two shuffles if the
> resulting shuffle would be illegal for the target.
> 
> The rules for this new combine are:
>    // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>    A, B, Mask1)
>    // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>    B, A, Mask2)
> 
> Basically:
> we can take advantage of the fact that OR is commutative and compute
> two different shuffle masks: Mask1, Mask2.
> If the dag node in input can be folded, then we check if Mask1 is a
> legal shuffle mask; in case it folds according to the first rule.
> Otherwise we checks if Mask2 is legal to see if we can fold according
> to the second rule.
> If both Masks are illegal then we conservatively don't fold the OR
> into a shuffle.

LGTM. Thanks!

 -Hal

> 
> ///
> @Nadav
> In my experiments on the x86 (Corei7 with and without AVX support) I
> noticed that this new patch correctly folds all the interesting cases
> except this one:
> 
> define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
>   %shuf1 = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4
>   x
> i32><i32 0, i32 4, i32 4, i32 4>
>   %shuf2 = shufflevector <4 x i32> %b, <4 x i32> zeroinitializer, <4
>   x
> i32><i32 4, i32 0, i32 3, i32 2>
>   %or = or <4 x i32> %shuf1, %shuf2
>   ret <4 x i32> %or
> }
> 
> The OR in function @foo is not folded because that would result in an
> illegal shuffle.
> However, that particular case could be still optimized into a
> sequence
> of two legal shuffle operations (getting rid of the OR entirely).
> Ideally we could generate the following assembly:
>     movlhps %xmm1, %xmm0
>     shufps $-71 %xmm1, %xmm0
> 
> The transformation would be valid in general if the target can emit
> (V)MOVLHPS instructions and if each shuffle in input to the OR has
> only one Use.
> If ok for you, once this patch is accepted I will send another patch
> to improve that particular case on the x86 only (adding a target
> specific combine).
> ///
> 
> Please let me know if this new patch is reasonable for you.
> 
> Thanks in advance!
> Andrea
> 
> 
> On Wed, Mar 5, 2014 at 9:37 PM, Tim Northover
> <t.p.northover at gmail.com> wrote:
> >> It is okay to generate new shuffles from a collection of
> >> insert/extracts
> >> because the new shuffle instructions can't be worse then a
> >> collection
> >> of inserts and extracts.
> >
> > Thanks Nadav; it sounds like I should remove the TODO and comment
> > on
> > the scope of that function then as part of whatever I do.
> >
> > Tim.
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory