[llvm-bugs] [Bug 26515] New: Extra vector move after a poor choice of destination register for a commutative operation
via llvm-bugs
llvm-bugs at lists.llvm.org
Sun Feb 7 05:00:33 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=26515
Bug ID: 26515
Summary: Extra vector move after a poor choice of destination
register for a commutative operation
Product: new-bugs
Version: 3.7
Hardware: PC
OS: Linux
Status: NEW
Keywords: performance
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: peter at cordes.ca
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Possible duplicate of https://llvm.org/bugs/show_bug.cgi?id=15705. It shows
the same thing in integer code with cmov.
#include <immintrin.h>

float add_in_wrong_order(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v);   // both halves = high half of v
    __m128 sums = _mm_add_ps(shuf, v);   // Using _ss doesn't help
    return _mm_cvtss_f32(sums);          // only the low element is used
}
with clang 3.7.1 -xc -O3 -Wall -fverbose-asm -march=haswell -ffast-math
-mno-avx
compiles to (godbolt: http://goo.gl/sV9tMR)
# (see the end of this post for a suggested optimal sequence)
movaps xmm1, xmm0
movhlps xmm1, xmm1 # with add_ss, this is shufpd xmm1, xmm1, 1
addps xmm1, xmm0
movaps xmm0, xmm1 # should have just used addps xmm0, xmm1
I've seen this failure to generate the result in the desired register before in
clang output, but didn't get around to reporting it. I forget if I've seen
this with integer registers, or just with non-AVX vectors.
---
Would it be possible for LLVM to see that _mm_cvtss_f32 only takes the low
element, and propagate that back to the inputs of add_ps? Probably not in
general, since a signalling NaN in elements 1..3 still has to be able to fault
if the invalid-operation exception is unmasked in MXCSR. Still, doesn't
-ffast-math mean you *don't* have to care what happens to NaNs?
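
To make the question concrete, here is a rough source-level sketch of what that
propagation would amount to. The function names are made up for illustration,
and this says nothing about what any particular clang version emits for either
one; it only shows that the returned value is the same when the add is done on
the low element alone, with v as the first operand.

```c
#include <immintrin.h>

/* Packed form, as in the report: all four lanes are added,
   but only lane 0 is ever returned. */
float low_sum_packed(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v);   /* lanes: v2 v3 v2 v3 */
    __m128 sums = _mm_add_ps(shuf, v);   /* lane 0 = v2 + v0 */
    return _mm_cvtss_f32(sums);
}

/* Scalar form (hypothetical rewrite): only lane 0 is added.
   v is the first operand, so lanes 1..3 of the result are copied from v. */
float low_sum_scalar(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v);   /* lanes: v2 v3 v2 v3 */
    __m128 sums = _mm_add_ss(v, shuf);   /* lane 0 = v0 + v2 */
    return _mm_cvtss_f32(sums);
}
```

Both functions return v[0] + v[2]. Since float addition is commutative, the
operand order only affects which register the result naturally lands in, not
the value.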
--------
As long as you don't cause slowdowns from doing FP calcs on garbage data,
-ffast-math lets you do a lot. I think this sequence would be valid for that
source, with -ffast-math:
movhlps xmm1, xmm0 # upper half is garbage
addss xmm0, xmm1 # so don't use it
There's a false dependency on the old contents of xmm1, because movhlps only
writes the low half of its destination and leaves the upper half as it was.
When inlined, you can use any long-dead register, or better: one that had to be
ready at some point before v was ready.
gcc likes to xor-zero registers to avoid false dependencies (e.g. for popcnt if
it doesn't want to overwrite the src).
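
For what it's worth, the xor-zero idea can be written at the source level
roughly like this. This is only a sketch: the function name is made up, and
the compiler is free to pick different instructions than the comments suggest.
Zeroing the temporary first means the shuffle's destination doesn't depend on
whatever was in that register before.

```c
#include <immintrin.h>

/* Hypothetical variant: zero the temporary before the movehl so that
   its previous contents are irrelevant. */
float low_sum_zeroed_tmp(__m128 v) {
    __m128 tmp = _mm_setzero_ps();       /* usually an xor-zeroing idiom */
    tmp = _mm_movehl_ps(tmp, v);         /* lanes: v2 v3 0 0 */
    __m128 sums = _mm_add_ss(v, tmp);    /* lane 0 = v0 + v2 */
    return _mm_cvtss_f32(sums);
}
```

This costs an extra instruction compared to the two-instruction sequence above,
so it only makes sense when no suitable long-dead register is available.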