[llvm-bugs] [Bug 38452] New: Potential missed macro-fusion optimization opportunity with cmp/jcc test/jcc
via llvm-bugs
llvm-bugs at lists.llvm.org
Sun Aug 5 20:19:27 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=38452
Bug ID: 38452
Summary: Potential missed macro-fusion optimization opportunity
with cmp/jcc test/jcc
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: codeman.consulting at gmail.com
CC: llvm-bugs at lists.llvm.org
In bug 38450 the following was posted:
Gonzalo BG 2018-08-05 06:55:12 PDT
See https://godbolt.org/g/5W5q2K , the following LLVM-IR:
define void @foo(i32* noalias nocapture dereferenceable(4) %x) {
start:
  %0 = load i32, i32* %x, align 4
  %1 = icmp eq i32 %0, 0
  br i1 %1, label %bb2, label %bb1
bb1:
  store i32 0, i32* %x, align 4
  br label %bb2
bb2:
  ret void
}
does not optimize to anything better with opt -O3. Using llc, it generates the
following x86_64 assembly code (https://godbolt.org/g/RX81Sn):
    cmpl $0, (%rdi)
    je .LBB0_2
    movl $0, (%rdi)
.LBB0_2:
    retq
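For reference, here is a minimal C sketch of what the IR above computes. The name foo comes from the IR; the C rendering itself, including restrict standing in for noalias, is an assumption and not part of the original report.

```c
#include <stdint.h>

/* Hypothetical C rendering of @foo (an assumption, not from the report):
 * store zero only when the pointed-to value is not already zero. */
void foo(int32_t *restrict x) {
    if (*x != 0) {
        *x = 0;   /* corresponds to the store in bb1 */
    }
}
```

The result in memory is zero whether or not the branch is taken. The store stays conditional in both the IR and the llc output.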
Although the original bug was marked invalid because of side effects of the
proposed alternative, this code still contains a potential missed macro-fusion
optimization:
In a more general case, would the following be faster because of the
macro-fusion of the cmp/je pair allowed by cmp mem, reg instead of cmp mem,
imm?
[See Intel Optimization manual 2016 p. 3-13: "CMP and TEST can not be fused when
comparing MEM-IMM (e.g. CMP [EAX],0x80; JZ label)."]
Assembly/Compiler Coding Rule 19 states that additional instructions should
not be added just to avoid a mem, imm comparison / test, but that such
comparisons should be avoided when possible. In this case the zeroed register
can be used in the store as well, and might be reused from an earlier zero
constant. We don't care about the flag changes from the xor here, since the
cmp sets the flags again before the branch.
    xor eax, eax
    cmp [rdi], eax
    je .LBB0_2
    mov [rdi], eax
.LBB0_2:
    ret
Code size is a bit smaller as well.
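To put a rough number on the size difference, the sketch below hand-assembles the llc output and the zeroed-register variant and compares their lengths. The byte encodings are my own hand-assembly and are an assumption (they do not appear in the original report), and the array and function names are made up for illustration.

```c
#include <stddef.h>

/* Hand-assembled encodings; assumed, not taken from the report. */
static const unsigned char llc_output[] = {
    0x83, 0x3F, 0x00,                   /* cmpl $0, (%rdi)  */
    0x74, 0x06,                         /* je   .LBB0_2     */
    0xC7, 0x07, 0x00, 0x00, 0x00, 0x00, /* movl $0, (%rdi)  */
    0xC3                                /* retq             */
};

static const unsigned char zero_reg_variant[] = {
    0x31, 0xC0,                         /* xor eax, eax     */
    0x39, 0x07,                         /* cmp [rdi], eax   */
    0x74, 0x02,                         /* je  .LBB0_2      */
    0x89, 0x07,                         /* mov [rdi], eax   */
    0xC3                                /* ret              */
};

/* Total encoded size of each sequence in bytes. */
size_t llc_output_size(void)       { return sizeof llc_output; }
size_t zero_reg_variant_size(void) { return sizeof zero_reg_variant; }
```

Under these encodings the llc output is 12 bytes and the zeroed-register variant is 9 bytes. Most of the saving comes from the store no longer carrying a 4-byte immediate.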
Alternately:
    mov eax, [rdi]
    test eax, eax
    je .LBB0_2
    mov DWORD PTR [rdi], 0
.LBB0_2:
    ret
This should allow macro-fusion of the test/je pair, but it was unclear to me
from the optimization manual whether the load before the test would introduce
a stall (the manual does recommend this form).
[3.5.1.9]
Assembly/Compiler Coding Rule 40. (ML impact, M generality) Use the TEST
instruction instead of AND when the result of the logical AND is not used. This
saves μops in execution. Use a TEST of a register with itself instead of a CMP
of the register to zero, this saves the need to encode the zero and saves
encoding space. Avoid comparing a constant to a memory operand. It is
preferable to load the memory operand and compare the constant to a register.
Assembly/Compiler Coding Rule 41. (ML impact, M generality) Eliminate
unnecessary compare with zero instructions by using the appropriate conditional
jump instruction when the flags are already set by a preceding arithmetic
instruction. If necessary, use a TEST instruction instead of a compare. Be
certain that any code transformations made do not introduce problems with
overflow.
Rules 40 and 41 seem to apply only to test and to comparisons with zero,
whereas Rule 19 is more general and advises against adding extra instructions
just to avoid a mem, imm comparison. On that basis the second sequence seems
closest to the guidelines, although the first seems useful as a code size
reduction / macro-fusion optimization when a register is already set to zero,
so that no extra code is generated for the sake of this.
Note that these fusions only appear to be valid in 64-bit mode on Nehalem and
later, but in 32-bit mode they also apply to the Core series.
In any case, these pairs should fold into a single dispatch with reduced
latency. In the example code this doesn't matter much, but in a loop it could
improve speed a bit.
Bug 38079 mentions a similar enhancement.