[llvm] r208342 - [X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics.
Tobias Grosser
tobias at grosser.es
Sat May 10 07:50:21 PDT 2014
On 08/05/2014 19:44, Andrea Di Biagio wrote:
> Author: adibiagio
> Date: Thu May 8 12:44:04 2014
> New Revision: 208342
>
> URL: http://llvm.org/viewvc/llvm-project?rev=208342&view=rev
> Log:
> [X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics.
>
> This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift
> intrinsics.
>
> The rules are:
> - Always fold a packed arithmetic shift by zero to its first operand;
> - Convert a packed arithmetic shift intrinsic DAG node into an ISD::SRA only if
> the shift count is known to be smaller than the vector element size.
>
> This patch also teaches function 'getTargetVShiftByConstNode' how to fold
> target-specific vector shifts by zero.
>
> Added two new tests to verify that the DAGCombiner is able to fold
> sequences of SSE2/AVX2 packed arithmetic shift calls.
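The two rules in the log hinge on one semantic difference, sketched below as a scalar model. This is an illustration, not LLVM code: `psraw_model`, `classifyArithShift` and `ShiftFold` are hypothetical names, and the shift count is modeled as an optional known constant (the real combine reasons about DAG nodes). The model follows x86 PSRAW (8 x i16 lanes), where a count of 16 or more sign-fills every lane. A generic ISD::SRA, like C++ `>>`, is only defined for counts below the element width. So a shift by zero is the identity, a count below the width matches a plain per-lane SRA, and anything larger must stay target specific.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using V8i16 = std::array<int16_t, 8>;

// Hypothetical scalar model of x86 PSRAW (packed arithmetic right shift).
// Counts >= 16 are not undefined: every lane becomes copies of its sign bit,
// i.e. the result equals a shift by 15.
// Note: '>>' on a negative signed value is arithmetic since C++20 and on all
// mainstream compilers before that.
V8i16 psraw_model(const V8i16 &v, uint64_t count) {
  const unsigned amt = count > 15 ? 15u : static_cast<unsigned>(count);
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<int16_t>(v[i] >> amt);
  return r;
}

// What a combine could do with the shift (hypothetical enumeration).
enum class ShiftFold { ToFirstOperand, ToGenericSRA, KeepIntrinsic };

// Hypothetical decision helper mirroring the two rules in the commit log.
ShiftFold classifyArithShift(std::optional<uint64_t> knownCount,
                             unsigned eltBits) {
  if (!knownCount)
    return ShiftFold::KeepIntrinsic;   // count unknown: nothing can be proven
  if (*knownCount == 0)
    return ShiftFold::ToFirstOperand;  // shift by zero is the identity
  if (*knownCount < eltBits)
    return ShiftFold::ToGenericSRA;    // same result as a per-lane SRA
  return ShiftFold::KeepIntrinsic;     // generic SRA would be undefined here
}

// Usage check of the model against both rules.
void selfCheck() {
  const V8i16 v = {-32768, -5, -1, 0, 1, 5, 1000, 32767};
  // Rule 1: a shift by zero returns the first operand unchanged.
  assert(psraw_model(v, 0) == v);
  assert(classifyArithShift(0, 16) == ShiftFold::ToFirstOperand);
  // Rule 2: for counts below 16, PSRAW equals a plain per-lane '>>'.
  for (uint64_t c = 1; c < 16; ++c) {
    V8i16 expect{};
    for (std::size_t i = 0; i < v.size(); ++i)
      expect[i] = static_cast<int16_t>(v[i] >> c);
    assert(psraw_model(v, c) == expect);
  }
  // Counts >= 16 sign-fill, so the intrinsic is kept as-is.
  assert(psraw_model(v, 40)[1] == -1 && psraw_model(v, 40)[5] == 0);
  assert(classifyArithShift(40, 16) == ShiftFold::KeepIntrinsic);
}
```

Keeping the intrinsic for large counts preserves the sign-fill behavior instead of producing an SRA node whose result would be undefined.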
Hi Andrea,
I see an execution-time regression from 3.4 s to 6.9 s on my -O3
buildbot for SingleSource/Benchmarks/Misc-C++/Large/sphereflake
http://llvm.org/perf/db_default/v4/nts/graph?plot.0=34.174.2&highlight_run=25587
between commits 208335 and 208346.
From a quick look through the commits, I believe this one is the most
likely cause of the regression. Any idea whether this change could
cause such a regression on an Intel(R) Xeon(R) CPU E5430 @ 2.66GHz system?
Cheers,
Tobias