[llvm-bugs] [Bug 26859] [x86, SSE] only use phaddw / phaddd when optimizing for minsize?
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Oct 12 09:56:40 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=26859
Sanjay Patel <spatel+llvm at rotateright.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
--- Comment #14 from Sanjay Patel <spatel+llvm at rotateright.com> ---
(In reply to Simon Pilgrim from comment #13)
> https://reviews.llvm.org/D53095
Committed here:
https://reviews.llvm.org/rL344361
There's a stunning amount of vector duplication + unrolling in these tests
currently:
$ clang -O2 accum.c -S -o - -mavx | grep padd | wc -l
155
...but that's not this bug.
This is the current behavior:
$ clang -Os accum.c -S -o - -mavx | grep phadd | grep mm
vphaddd %xmm0, %xmm0, %xmm0
vphaddw %xmm0, %xmm0, %xmm0
$ clang -O2 accum.c -S -o - -mavx | grep phadd | grep mm
$ clang -O2 accum.c -S -o - -mavx -march=btver2 | grep phadd | grep mm
vphaddd %xmm0, %xmm0, %xmm0
vphaddw %xmm0, %xmm0, %xmm0
I.e., if we are optimizing for size or targeting Jaguar (btver2), we'll use
horizontal ops; otherwise, we use regular ops and shuffles.
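For readers who want the two strategies side by side, here is a rough scalar
model in C of reducing a 4 x i32 vector both ways. It is only a sketch: the
vec4i type and the helper names are made up for illustration, and the shuffle
masks are one plausible sequence, not necessarily what the backend emits. Only
the lane behavior of (v)phaddd is taken from the instruction's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding four i32 lanes. */
typedef struct { int32_t lane[4]; } vec4i;

/* Models (v)phaddd: adjacent pairs of 'a' fill the low two lanes,
   adjacent pairs of 'b' fill the high two lanes. */
static vec4i hadd(vec4i a, vec4i b) {
    vec4i r = {{ a.lane[0] + a.lane[1], a.lane[2] + a.lane[3],
                 b.lane[0] + b.lane[1], b.lane[2] + b.lane[3] }};
    return r;
}

/* Models a lane permute (pshufd-style): result lane i = a.lane[idx[i]]. */
static vec4i shuffle(vec4i a, const int idx[4]) {
    vec4i r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[idx[i]];
    return r;
}

/* Models a regular vertical add (paddd). */
static vec4i add(vec4i a, vec4i b) {
    vec4i r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Horizontal-op strategy: two hadds with both operands the same register,
   as in "vphaddd %xmm0, %xmm0, %xmm0". */
int32_t reduce_hadd(vec4i v) {
    v = hadd(v, v);   /* [v0+v1, v2+v3, v0+v1, v2+v3] */
    v = hadd(v, v);   /* full sum in every lane */
    return v.lane[0];
}

/* Shuffle + add strategy (assumed masks): swap halves, then swap pairs. */
int32_t reduce_shuffle(vec4i v) {
    static const int swap_halves[4] = { 2, 3, 0, 1 };
    static const int swap_pairs[4]  = { 1, 0, 3, 2 };
    v = add(v, shuffle(v, swap_halves));
    v = add(v, shuffle(v, swap_pairs));
    return v.lane[0];
}

/* Both strategies must agree; returns the common sum. */
int32_t check_reductions_agree(vec4i v) {
    int32_t h = reduce_hadd(v);
    int32_t s = reduce_shuffle(v);
    assert(h == s);
    return h;
}
```

Both versions take two steps and produce the same sum. The horizontal form
needs fewer instructions, which is why it is attractive for size; whether it
is also faster depends on the uarch.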
It's possible that our combiner predicate will need adjustments to optimize
that decision depending on code pattern and uarch, but we now have that
ability. Ideally, we can refine the choice using CPU instruction
latency/throughput models rather than with the DAG heuristic in the patch, but
that's also another bug.