[PATCH] Teach the DAGCombiner how to fold a OR of two shufflevector into a single shufflevector node

Thu Mar 6 12:26:24 PST 2014

Thanks.
Committed revision 203156.

On Thu, Mar 6, 2014 at 7:25 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On Mar 6, 2014, at 10:55 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
>> Hi Hal,
>> thanks for your review!
>>
>> Nadav, are you happy with this too?
>
> Yes, it looks good to me. Thanks Andrea.
>
>>
>> On Thu, Mar 6, 2014 at 5:50 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>> ----- Original Message -----
>>>> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
>>>> To: "Tim Northover" <t.p.northover at gmail.com>
>>>> Cc: "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>
>>>> Sent: Thursday, March 6, 2014 8:25:22 AM
>>>> Subject: Re: [PATCH] Teach the DAGCombiner how to fold a OR of two    shufflevector into a single shufflevector node
>>>>
>>>> Hi Nadav, Hal and Tim,
>>>>
>>>> I modified my original patch adding checks agains
>>>> 'TLI.isShuffleMaskLegal' as suggested by Hal.
>>>> This new patch now avoids folding an OR of two shuffles if the
>>>> resulting shuffle would be illegal for the target.
>>>>
>>>> The rules for this new combine are:
>>>>   // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>>>>   A, B, Mask1)
>>>>   // fold (or (shuf A, V_0, MaskA), (shuf B, V_0, MaskB)) -> (shuf
>>>>   B, A, Mask2)
>>>>
>>>> Basically:
>>>> we can take advantage of the fact that OR is commutative and compute
>>>> two different shuffle masks: Mask1, Mask2.
>>>> If the dag node in input can be folded, then we check if Mask1 is a
>>>> legal shuffle mask; in case it folds according to the first rule.
>>>> Otherwise we checks if Mask2 is legal to see if we can fold according
>>>> to the second rule.
>>>> If both Masks are illegal then we conservatively don't fold the OR
>>>> into a shuffle.
>>>
>>> LGTM. Thanks!
>>>
>>> -Hal
>>>
>>>>
>>>> ///
>>>> @Nadav
>>>> In my experiments on the x86 (Corei7 with and without AVX support) I
>>>> noticed that this new patch correctly folds all the interesting cases
>>>> except this one:
>>>>
>>>> define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
>>>>  %shuf1 = shufflevector <4 x i32> %a, <4 x i32> zeroinitializer, <4
>>>>  x
>>>> i32><i32 0, i32 4, i32 4, i32 4>
>>>>  %shuf2 = shufflevector <4 x i32> %b, <4 x i32> zeroinitializer, <4
>>>>  x
>>>> i32><i32 4, i32 0, i32 3, i32 2>
>>>>  %or = or <4 x i32> %shuf1, %shuf2
>>>>  ret <4 x i32> %or
>>>> }
>>>>
>>>> The OR in function @foo is not folded because that would result in an
>>>> illegal shuffle.
>>>> However, that particular case could be still optimized into a
>>>> sequence
>>>> of two legal shuffle operations (getting rid of the OR entirely).
>>>> Ideally we could generate the following assembly:
>>>>    movlhps %xmm1, %xmm0
>>>>    shufps $-71 %xmm1, %xmm0
>>>>
>>>> The transformation would be valid in general if the target can emit
>>>> (V)MOVLHPS instructions and if each shuffle in input to the OR has
>>>> only one Use.
>>>> If ok for you, once this patch is accepted I will send another patch
>>>> to improve that particular case on the x86 only (adding a target
>>>> specific combine).
>>>> ///
>>>>
>>>> Please let me know if this new patch is reasonable for you.
>>>>
>>>> Thanks in advance!
>>>> Andrea
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 9:37 PM, Tim Northover
>>>> <t.p.northover at gmail.com> wrote:
>>>>>> It is okay to generate new shuffles from a collection of
>>>>>> insert/extracts
>>>>>> because the new shuffle instructions can't be worse then a
>>>>>> collection
>>>>>> of inserts and extracts.
>>>>>
>>>>> Thanks Nadav; it sounds like I should remove the TODO and comment
>>>>> on
>>>>> the scope of that function then as part of whatever I do.
>>>>>
>>>>> Tim.
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>