[PATCH][InstCombine][X86] Improve the folding of calls to X86 packed shifts intrinsics.
grosbach at apple.com
Thu May 8 10:34:20 PDT 2014
On May 8, 2014, at 7:28 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi Jim,
> As you suggested, here is a patch that teaches the x86 backend how to
> combine/fold packed SSE2/AVX2 arithmetic shifts intrinsic dag nodes.
> The logic is basically the same; we always fold shifts-by-zero into
> the first operand; we lower arithmetic shift intrinsic dag nodes into
> a ISD::SRA dag nodes only if the shift count is known to be smaller
> than the vector element size. The latter is to take advantage of the
> existing target independent combine rules in DAGCombiner.
> I also added a if-stmt in function 'getTargetVShiftByConstNode' to
> check if we have a shift-by-zero and trivially fold it into the first
> I added two new tests to verify that DAGCombiner is able to fold
> sequences of sse2/avx2 packed arithmetic shift calls.
> Please let me know if ok to submit.
> On Thu, May 8, 2014 at 10:18 AM, Andrea Di Biagio
> <andrea.dibiagio at gmail.com> wrote:
>> Hi Jim,
>> Thanks for the feedback!
>> I'll try to move these changes to the backend.
>> On 8 May 2014 03:18, "Jim Grosbach" <grosbach at apple.com> wrote:
>>> Hi Andrea,
>>> I’m really excited to see these patches continuing. Our vector codegen has
>>> been needing exactly this sort of detail oriented tuning for a long time
>>> These are both good improvements, but would be better as DAGCombines in
>>> the X86 backend. The main argument for doing these intrinsic combines at the
>>> IR level is when the input expression is likely to be split across multiple
>>> basic blocks by the time the backend sees it and would thus not be
>>> recognized by a DAG combiner. Both of these transforms should avoid that
>>> problem, though, and so can be dealt with there.
>>> On May 7, 2014, at 8:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
>>>> This patch teaches InstCombine how to fold a packed SSE2/AVX2 shift
>>>> intrinsic into its first operand if the shift count is a zerovector
>>>> (i.e. a 'ConstantAggregateZero’).
>>>> Also, this patch teaches InstCombine how to lower a packed arithmetic
>>>> shift intrinsics into an 'ashr' instruction if the shift count is
>>>> known to be smaller than the vector element size.
>>>> Please let me know if ok to submit.
>>>> Andrea Di Biagio
>>>> SN Systems - Sony Computer Entertainment Group
More information about the llvm-commits