[PATCH][InstCombine][X86] Improve the folding of calls to X86 packed shifts intrinsics.
grosbach at apple.com
Wed May 7 19:16:59 PDT 2014
I’m really excited to see these patches continuing. Our vector codegen has needed exactly this sort of detail-oriented tuning for a long time now.
These are both good improvements, but would be better as DAGCombines in the X86 backend. The main argument for doing these intrinsic combines at the IR level is when the input expression is likely to be split across multiple basic blocks by the time the backend sees it and would thus not be recognized by a DAG combiner. Both of these transforms should avoid that problem, though, and so can be dealt with there.
On May 7, 2014, at 8:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> This patch teaches InstCombine how to fold a packed SSE2/AVX2 shift
> intrinsic into its first operand if the shift count is a zero vector
> (i.e. a 'ConstantAggregateZero').
> Also, this patch teaches InstCombine how to lower a packed arithmetic
> shift intrinsic into an 'ashr' instruction if the shift count is
> known to be smaller than the vector element size.
> Please let me know if ok to submit.
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group
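For readers following the thread, the two folds described in the quoted patch can be illustrated with a small scalar model. This is only a sketch and not code from the patch: it assumes 8 x i16 lanes and a uniform shift count, and the names V8i16, packedSra and plainAshr are hypothetical. It models the packed arithmetic right shift as saturating the count at 15 (the PSRAW behavior), whereas a plain 'ashr' is only defined for counts below the element width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8 x i16 vector type used only for this illustration.
using V8i16 = std::array<int16_t, 8>;

// Scalar model of a packed arithmetic right shift by a uniform count.
// Counts above 15 behave like 15, so every lane becomes all sign bits.
// Note: '>>' on a negative value is an arithmetic shift on mainstream
// compilers and is guaranteed to be one from C++20 onwards.
V8i16 packedSra(const V8i16 &v, uint64_t count) {
  unsigned c = count > 15 ? 15u : static_cast<unsigned>(count);
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<int16_t>(v[i] >> c);
  return r;
}

// Scalar model of an IR 'ashr': no saturation, and the result is only
// meaningful when the count is smaller than the element width.
V8i16 plainAshr(const V8i16 &v, unsigned count) {
  assert(count < 16 && "ashr is undefined for count >= element width");
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<int16_t>(v[i] >> count);
  return r;
}
```

In this model a zero count returns the input unchanged (the first fold), and for any count below 16 the two functions agree (the second fold). For larger counts only packedSra is defined, which is why the second fold needs the "known to be smaller than the vector element size" condition.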