[llvm-bugs] [Bug 34707] New: unnecessary 8-bit partial-register usage creates false dependencies.
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Sep 22 11:22:20 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=34707
Bug ID: 34707
Summary: unnecessary 8-bit partial-register usage creates false dependencies.
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: peter at cordes.ca
CC: llvm-bugs at lists.llvm.org
unsigned long bzhi_l(unsigned long x, unsigned c) {
    return x & ((1UL << c) - 1);
}
// https://godbolt.org/g/sBEyfd
clang 6.0.0 (trunk 313965) -xc -O3 -march=haswell -m32 or znver1
    movb    8(%esp), %al
    bzhil   %eax, 4(%esp), %eax
    retl
This is technically correct (because BZHI only looks at the low 8 bits of
src2), but horrible. There is *no* advantage to using an 8-bit load here
instead of a 32-bit load: it's the same code size, but it creates a false
dependency on the old value of %eax.
(znver1 definitely doesn't rename partial registers. Intel Haswell/Skylake
don't rename low8 registers separately from the full register, unlike
Sandybridge or Core2/Nehalem.
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to).
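For comparison, this is roughly what we'd want instead (a hand-written sketch
matching the stack layout above, not actual compiler output); a plain 32-bit
load is the same size as the movb and has no dependency on the old %eax:

    movl    8(%esp), %eax          # full-width load of the count: no merge, no false dependency
    bzhil   %eax, 4(%esp), %eax    # BZHI only reads the low 8 bits of the index anyway
    retl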
On Haswell and Skylake, movb 8(%esp), %al runs as a micro-fused ALU+load uop,
and back-to-back movb loads into the same register run at only 1 per cycle
because each merge depends on the previous one. An occasional dep-breaking
xor %eax,%eax lets them bottleneck on 2 loads per clock instead.
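To illustrate that idiom (a hand-written sketch, not code from this report):

    movb    4(%esi), %al            # load+merge: depends on the old value of %eax
    movb    5(%esi), %al            # chained: depends on the merge above
    xorl    %eax, %eax              # recognized zeroing idiom: breaks the dependency chain
    movb    6(%esi), %al            # starts a fresh chain, independent of the merges above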
Clang seems to be very eager to move only 8 bits instead of the full register.
Clang 3.9 fixed this for reg-reg moves (e.g. unsigned shift(unsigned x,
unsigned c) { return x<<c; } without BMI2), but we're still getting 8-bit
loads. On Intel CPUs, MOVZX loads are cheaper than narrow MOV loads because
they avoid the ALU uop that merges into the destination (at the cost of one
extra code byte). AMD CPUs may use an ALU port for MOVZX, but Intel handles it
purely in the load ports.
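For the shift() example above, the MOVZX-load version would look something
like this (a sketch of the desired -O3 -m32 output without BMI2, not actual
compiler output):

    movzbl  8(%esp), %ecx           # zero-extending load: handled entirely in the load port on Intel
    movl    4(%esp), %eax
    shll    %cl, %eax               # variable shift count has to be in %cl
    retl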
But anyway, when loading from a 32-bit memory location, it makes no sense to
load only the low 8 bits, unless we have reason to expect it was written with
separate byte stores and we want to avoid a store-forwarding stall.
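For completeness, a sketch of that exception (hypothetical code, not from this
report):

    movb    %dl, 8(%esp)            # recent byte store into one byte of the dword
    # ... other work ...
    movzbl  8(%esp), %eax           # narrow reload can forward straight from that byte store;
                                    # a 32-bit movl 8(%esp), %eax would need data from both the
                                    # store buffer and older memory, failing store-forwarding (a stall)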