[PATCH] Teach the DAGCombiner how to fold a OR of two shufflevector into a single shufflevector node

Thu Mar 6 11:25:50 PST 2014

On Mar 6, 2014, at 10:55 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi Hal,
> thanks for your review!
> 
> Nadav, are you happy with this too?

Yes, it looks good to me. Thanks Andrea. 

> 
> On Thu, Mar 6, 2014 at 5:50 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> ----- Original Message -----
>>> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
>>> To: "Tim Northover" <t.p.northover at gmail.com>
>>> Cc: "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>
>>> Sent: Thursday, March 6, 2014 8:25:22 AM
>>> Subject: Re: [PATCH] Teach the DAGCombiner how to fold a OR of two    shufflevector into a single shufflevector node
>>> 
>>> Hi Nadav, Hal and Tim,
>>> 
>>> I modified my original patch adding checks agains
>>> 'TLI.isShuffleMaskLegal' as suggested by Hal.
>>> This new patch now avoids folding an OR of two shuffles if the
>>> resulting shuffle would be illegal for the target.
>>> 
>>> The rules for this new combine are:
>>>   // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>>>   A, B, Mask1)
>>>   // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>>>   B, A, Mask2)
>>> 
>>> Basically:
>>> we can take advantage of the fact that OR is commutative and compute
>>> two different shuffle masks: Mask1, Mask2.
>>> If the dag node in input can be folded, then we check if Mask1 is a
>>> legal shuffle mask; in case it folds according to the first rule.
>>> Otherwise we checks if Mask2 is legal to see if we can fold according
>>> to the second rule.
>>> If both Masks are illegal then we conservatively don't fold the OR
>>> into a shuffle.
>> 
>> LGTM. Thanks!
>> 
>> -Hal
>> 
>>> 
>>> ///
>>> @Nadav
>>> In my experiments on the x86 (Corei7 with and without AVX support) I
>>> noticed that this new patch correctly folds all the interesting cases
>>> except this one:
>>> 
>>> define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
>>>  %shuf1 = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4
>>>  x
>>> i32><i32 0, i32 4, i32 4, i32 4>
>>>  %shuf2 = shufflevector <4 x i32> %b, <4 x i32> zeroinitializer, <4
>>>  x
>>> i32><i32 4, i32 0, i32 3, i32 2>
>>>  %or = or <4 x i32> %shuf1, %shuf2
>>>  ret <4 x i32> %or
>>> }
>>> 
>>> The OR in function @foo is not folded because that would result in an
>>> illegal shuffle.
>>> However, that particular case could be still optimized into a
>>> sequence
>>> of two legal shuffle operations (getting rid of the OR entirely).
>>> Ideally we could generate the following assembly:
>>>    movlhps %xmm1, %xmm0
>>>    shufps $-71 %xmm1, %xmm0
>>> 
>>> The transformation would be valid in general if the target can emit
>>> (V)MOVLHPS instructions and if each shuffle in input to the OR has
>>> only one Use.
>>> If ok for you, once this patch is accepted I will send another patch
>>> to improve that particular case on the x86 only (adding a target
>>> specific combine).
>>> ///
>>> 
>>> Please let me know if this new patch is reasonable for you.
>>> 
>>> Thanks in advance!
>>> Andrea
>>> 
>>> 
>>> On Wed, Mar 5, 2014 at 9:37 PM, Tim Northover
>>> <t.p.northover at gmail.com> wrote:
>>>>> It is okay to generate new shuffles from a collection of
>>>>> insert/extracts
>>>>> because the new shuffle instructions can't be worse then a
>>>>> collection
>>>>> of inserts and extracts.
>>>> 
>>>> Thanks Nadav; it sounds like I should remove the TODO and comment
>>>> on
>>>> the scope of that function then as part of whatever I do.
>>>> 
>>>> Tim.
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits