[PATCH][InstCombine][X86] Improve the folding of calls to X86 packed shifts intrinsics.

Thu May 8 10:52:44 PDT 2014

Thanks Jim!
Committed revision 208342.

On Thu, May 8, 2014 at 6:34 PM, Jim Grosbach <grosbach at apple.com> wrote:
> LGTM. Thanks!
>
> On May 8, 2014, at 7:28 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
>> Hi Jim,
>>
>> As you suggested, here is a patch that teaches the x86 backend how to
>> combine/fold packed SSE2/AVX2 arithmetic shifts intrinsic dag nodes.
>> The logic is basically the same; we always fold shifts-by-zero into
>> the first operand; we lower arithmetic shift intrinsic dag nodes into
>> a ISD::SRA dag nodes only if the shift count is known to be smaller
>> than the vector element size. The latter is to take advantage of the
>> existing target independent combine rules in DAGCombiner.
>>
>> I also added a if-stmt in function 'getTargetVShiftByConstNode' to
>> check if we have a shift-by-zero and trivially fold it into the first
>> operand.
>> We
>> I added two new tests to verify that DAGCombiner is able to fold
>> sequences of sse2/avx2 packed arithmetic shift calls.
>>
>> Please let me know if ok to submit.
>>
>> Thanks!
>> Andrea
>>
>> On Thu, May 8, 2014 at 10:18 AM, Andrea Di Biagio
>> <andrea.dibiagio at gmail.com> wrote:
>>> Hi Jim,
>>> Thanks for the feedback!
>>>
>>> I'll try to move these changes to the backend.
>>>
>>> Cheers,
>>> Andrea
>>>
>>> On 8 May 2014 03:18, "Jim Grosbach" <grosbach at apple.com> wrote:
>>>>
>>>> Hi Andrea,
>>>>
>>>> I’m really excited to see these patches continuing. Our vector codegen has
>>>> been needing exactly this sort of detail oriented tuning for a long time
>>>> now.
>>>>
>>>> These are both good improvements, but would be better as DAGCombines in
>>>> the X86 backend. The main argument for doing these intrinsic combines at the
>>>> IR level is when the input expression is likely to be split across multiple
>>>> basic blocks by the time the backend sees it and would thus not be
>>>> recognized by a DAG combiner. Both of these transforms should avoid that
>>>> problem, though, and so can be dealt with there.
>>>>
>>>> -Jim
>>>>
>>>> On May 7, 2014, at 8:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> This patch teaches InstCombine how to fold a packed SSE2/AVX2 shift
>>>>> intrinsic into its first operand if the shift count is a zerovector
>>>>> (i.e. a 'ConstantAggregateZero’).
>>>>> Also, this patch teaches InstCombine how to lower a packed arithmetic
>>>>> shift intrinsics into an 'ashr' instruction if the shift count is
>>>>> known to be smaller than the vector element size.
>>>>>
>>>>> Please let me know if ok to submit.
>>>>>
>>>>> Thanks,
>>>>> Andrea Di Biagio
>>>>> SN Systems - Sony Computer Entertainment Group
>>>>> <patch-instcombine-vshifts.diff>
>>>>
>>>
>> <patch-target-combine-shift-intrins.diff>
>