[X86][ExeDepsFix] Add broadcast instruction to the table used by ExeDepsFix pass
Quentin Colombet
qcolombet at apple.com
Tue Mar 25 16:34:38 PDT 2014
Hi Nadav,
Here is a patch that adds the various broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can make better decisions when AVX2 broadcasts cross domains (int <-> float).
In particular, prior to this patch we were generating:
vpbroadcastd LCPI1_0(%rip), %ymm2
vpand %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty
Now, we generate the following nice sequence where everything is in the float domain:
vbroadcastss LCPI1_0(%rip), %ymm2
vandps %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0
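
To make the role of the table concrete, here is a minimal sketch of a per-domain lookup. It is an illustrative model, not the code in the attached patch: the mnemonics are plain strings, the two rows are examples picked for this sketch, and the helper names (to_domain, rewrite) are hypothetical. It also does not model how the pass chooses a domain; it only shows the substitution once a domain has been chosen.

```python
# Illustrative model only: mnemonics are strings, and the rows are examples
# chosen for this sketch, not the contents of the attached patch.
PACKED_SINGLE, PACKED_DOUBLE, PACKED_INT = range(3)

# Each row lists equivalent instructions, one per execution domain.
REPLACEABLE = [
    # PackedSingle    PackedDouble    PackedInt
    ("vandps",        "vandpd",       "vpand"),
    # Assumption: the 32-bit broadcast has no separate double-precision
    # spelling, so the single-precision form is reused in that column.
    ("vbroadcastss",  "vbroadcastss", "vpbroadcastd"),
]


def to_domain(mnemonic, domain):
    """Return the equivalent of `mnemonic` in `domain`.

    Instructions without a row in the table are treated as fixed-domain
    and returned unchanged.
    """
    for row in REPLACEABLE:
        if mnemonic in row:
            return row[domain]
    return mnemonic


def rewrite(sequence, domain):
    """Move every replaceable instruction in `sequence` into `domain`."""
    return [to_domain(m, domain) for m in sequence]


before = ["vpbroadcastd", "vpand", "vmaxps"]
after = rewrite(before, PACKED_SINGLE)
print(after)
```

In this model vmaxps has no row, so it stays where it is, while the broadcast and the AND follow it into the float domain once the broadcast has a table entry.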
This patch gives a few speed-ups across the LLVM test-suite and SPEC (-O3 -march=core-avx2).
To avoid noise, I report here only the tests that show a difference in the disassembly and that run for more than 1s.
Benchmark_ID                         Reference      Test   Speedup   Percent
-------------------------------------------------------------------------------
ASCI_Purple/SMG2000/smg2000             1.5325    1.5386         1       +0%
CINT2000/181.mcf/181.mcf                3.7871    3.7725         1       +0%
CINT2006/401.bzip2/401.bzip2            1.5876    1.5811         1       +0%
CINT2006/456.hmmer/456.hmmer            1.9414     1.941         1       +0%
Misc/salsa20                            4.7333    4.7319         1       +0%
PAQ8p/paq8p                             28.187   28.2113         1       +0%
Polybench/linear-algebra/kernels/sy    11.5787   11.5834         1       +0%
Polybench/linear-algebra/kernels/sy     2.9281    2.9271         1       +0%
TSVC/ControlFlow-dbl/ControlFlow-db      2.589    2.5916         1       +0%
TSVC/ControlFlow-flt/ControlFlow-fl     2.2556    2.1948      1.03       +3%
TSVC/ControlLoops-flt/ControlLoops-     1.6671    1.6693         1       +0%
TSVC/Equivalencing-dbl/Equivalencin     1.5151    1.4524      1.04       +4%
TSVC/Expansion-dbl/Expansion-dbl         2.483    2.4818         1       +0%
TSVC/IndirectAddressing-flt/Indirec     2.2959    2.2353      1.03       +3%
TSVC/InductionVariable-dbl/Inductio     2.7742    2.7731         1       +0%
TSVC/LinearDependence-dbl/LinearDep     2.2146    2.2224         1       +0%
TSVC/LinearDependence-flt/LinearDep     1.5389    1.5225      1.01       +1%
TSVC/NodeSplitting-dbl/NodeSplittin     2.4763    2.4753         1       +0%
TSVC/Packing-dbl/Packing-dbl             2.595    2.5093      1.03       +3%
TSVC/Packing-flt/Packing-flt            2.2906    2.2796         1       +0%
TSVC/Searching-dbl/Searching-dbl        2.9092    2.8992         1       +0%
TSVC/Searching-flt/Searching-flt         2.893    2.8579      1.01       +1%
TSVC/StatementReordering-dbl/Statem     2.5175    2.5113         1       +0%
nbench/nbench                           5.2255    5.2213         1       +0%
sqlite3/sqlite3                          2.113    2.1165         1       +0%
-------------------------------------------------------------------------------
Min (25)                                     -         -         1         -
-------------------------------------------------------------------------------
Max (25)                                     -         -      1.04         -
-------------------------------------------------------------------------------
Sum (25)                                    99        98         1       +0%
-------------------------------------------------------------------------------
A.Mean (25)                                  -         -      1.01       +1%
-------------------------------------------------------------------------------
G.Mean 2 (25)                                -         -      1.01       +1%
-------------------------------------------------------------------------------
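
For reading the table: the Speedup and Percent columns appear to be derived as Reference / Test, rounded to two decimals and to a whole percent. That relationship is inferred from the numbers rather than stated by the reporting tool, so treat the sketch below as an assumption. The benchmark names are copied in the truncated form shown in the table.

```python
def speedup(reference, test):
    """Ratio of reference time to test time; above 1 means the test is faster."""
    return reference / test


def percent(reference, test):
    """Speedup expressed as a whole-number percentage gain."""
    return round((speedup(reference, test) - 1) * 100)


# (Reference, Test) pairs copied from three rows of the table.
rows = {
    "TSVC/ControlFlow-flt/ControlFlow-fl": (2.2556, 2.1948),
    "TSVC/Equivalencing-dbl/Equivalencin": (1.5151, 1.4524),
    "TSVC/Packing-dbl/Packing-dbl": (2.595, 2.5093),
}

for name, (ref, test) in rows.items():
    print(f"{name}: {speedup(ref, test):.2f} ({percent(ref, test):+d}%)")
```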
Thanks,
-Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exefix-broadcast.patch
Type: application/octet-stream
Size: 9578 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140325/c70156f3/attachment.obj>