[PATCH][InstCombine][X86] Improve the folding of calls to X86 packed shifts intrinsics.
Andrea Di Biagio
andrea.dibiagio at gmail.com
Thu May 8 02:18:00 PDT 2014
Thanks for the feedback!
I'll try to move these changes to the backend.
On 8 May 2014 03:18, "Jim Grosbach" <grosbach at apple.com> wrote:
> Hi Andrea,
> I’m really excited to see these patches continuing. Our vector codegen has
> been needing exactly this sort of detail-oriented tuning for a long time.
> These are both good improvements, but would be better as DAGCombines in
> the X86 backend. The main argument for doing these intrinsic combines at
> the IR level is when the input expression is likely to be split across
> multiple basic blocks by the time the backend sees it and would thus not be
> recognized by a DAG combiner. Both of these transforms should avoid that
> problem, though, and so can be dealt with there.
> On May 7, 2014, at 8:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> > Hi,
> > This patch teaches InstCombine how to fold a packed SSE2/AVX2 shift
> > intrinsic into its first operand if the shift count is a zero vector
> > (i.e. a 'ConstantAggregateZero').
> > Also, this patch teaches InstCombine how to lower a packed arithmetic
> > shift intrinsic into an 'ashr' instruction if the shift count is
> > known to be smaller than the vector element size.
> > Please let me know if ok to submit.
> > Thanks,
> > Andrea Di Biagio
> > SN Systems - Sony Computer Entertainment Group
> > <patch-instcombine-vshifts.diff>
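The two folds described in the original message can be illustrated with a small scalar model. This is only a sketch, not code from the patch: the names `V4i32`, `ashr32`, `packedSra` and `checkFolds` are hypothetical, and the model assumes the usual PSRAD-style behaviour in which a shift count larger than 31 fills each 32-bit element with its sign bit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> vector.
using V4i32 = std::array<int32_t, 4>;

// Arithmetic shift right for 0 <= n < 32, written so that it never
// right-shifts a negative value (which was implementation-defined
// before C++20).
inline int32_t ashr32(int32_t x, unsigned n) {
  return x < 0 ? ~(~x >> n) : (x >> n);
}

// Scalar model of a packed arithmetic right shift (assumed PSRAD-style):
// counts above 31 behave like a shift by 31, i.e. a sign fill.
inline V4i32 packedSra(const V4i32 &v, uint64_t count) {
  const unsigned n = count > 31 ? 31u : static_cast<unsigned>(count);
  V4i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = ashr32(v[i], n);
  return r;
}

// Checks the two properties the folds rely on.
inline void checkFolds() {
  const V4i32 v{-8, 7, -1, 0};

  // Fold 1: a zero shift count returns the first operand unchanged.
  assert(packedSra(v, 0) == v);

  // Fold 2: for any count below the element size, the packed shift
  // matches a plain element-wise arithmetic shift ('ashr').
  for (unsigned n = 0; n < 32; ++n) {
    const V4i32 r = packedSra(v, n);
    for (std::size_t i = 0; i < v.size(); ++i)
      assert(r[i] == ashr32(v[i], n));
  }
}
```

The second fold is only valid under the stated precondition: once the count reaches the element size, the packed shift saturates to a sign fill, whereas an IR 'ashr' by that amount has no defined result, which is why the count must be known to be smaller than the element size.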