[llvm-bugs] [Bug 38452] New: Potential missed macro-fusion optimization opportunity with cmp/jcc test/jcc

Sun Aug 5 20:19:27 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38452

            Bug ID: 38452
           Summary: Potential missed macro-fusion optimization opportunity
                    with cmp/jcc test/jcc
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: codeman.consulting at gmail.com
                CC: llvm-bugs at lists.llvm.org

In bug 38450 the following was posted:

Gonzalo BG 2018-08-05 06:55:12 PDT

See https://godbolt.org/g/5W5q2K , the following LLVM-IR:

define void @foo(i32* noalias nocapture dereferenceable(4) %x) {
start:
  %0 = load i32, i32* %x, align 4
  %1 = icmp eq i32 %0, 0
  br i1 %1, label %bb2, label %bb1

bb1:
  store i32 0, i32* %x, align 4
  br label %bb2

bb2:
  ret void
}

does not optimize to anything better with opt -O3. Using llc, it generates the
following x86_64 assembly code (https://godbolt.org/g/RX81Sn): 

    cmpl    $0, (%rdi)
    je      .LBB0_2
    movl    $0, (%rdi)
.LBB0_2:  
    retq
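
For readers who prefer source over IR, a C function along these lines would plausibly lower to the IR above. This is only a sketch: the source language of the original report is not shown here, the names `foo` and `x` are simply taken from the IR, `restrict` only approximates `noalias`, and `dereferenceable(4)` has no direct C equivalent.

```c
/* Hypothetical C counterpart of @foo: store zero only when *x is nonzero.
   `restrict` stands in for the IR's `noalias` attribute (an approximation). */
void foo(int *restrict x) {
    if (*x != 0) {
        *x = 0;
    }
}
```

Whether a given front end emits exactly this IR depends on its version and flags, so treat the mapping as illustrative.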

Although the original bug was marked invalid because of side effects of the
proposed alternate, this code contains a potential missed macro-fusion
optimization:

In a more general case, would the following be faster because of the
macro-fusion of the cmp/je pair allowed by cmp mem, reg instead of cmp mem,
imm?  

[See Intel Optimization manual 2016 p. 3-13: "CMP and TEST can not be fused when
comparing MEM-IMM (e.g. CMP [EAX],0x80; JZ label)."]

Assembly/Compiler Coding Rule 19 states that additional instructions should
not be added just to avoid a mem, imm comparison or test, but that the mem, imm
form should be avoided when possible.  In this case the zeroed register can be
used in the store as well, and may be re-used from an earlier zero constant.
We don't care about the flag changes here.

    xor eax, eax
    cmp [rdi], eax
    je .LBB0_2
    mov [rdi], eax
.LBB0_2:
    ret

Code size is a bit smaller as well.  
Alternately: 

    mov eax, [rdi]
    test eax, eax
    je .LBB0_2
    mov DWORD PTR [rdi], 0
.LBB0_2:
    ret

This should allow macro-fusion, but it was unclear to me from the optimization
manual whether the load prior to the test would introduce a stall (the manual
does recommend this form).
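
To put a rough number on the code-size remark, the three sequences in this report can be tallied by hand. The byte counts below are hand-assembled assumptions for 64-bit mode with a short (rel8) jump, not assembler output, and the array and function names are made up for illustration.

```c
#include <stddef.h>

/* Assumed encoded sizes in bytes (64-bit mode, rel8 jump), derived by hand. */
static const int llc_output[] = {
    3, /* cmpl $0, (%rdi)         83 3F 00          */
    2, /* je   .LBB0_2            74 xx             */
    6, /* movl $0, (%rdi)         C7 07 00 00 00 00 */
    1, /* retq                    C3                */
};

static const int xor_variant[] = {
    2, /* xor eax, eax            31 C0 */
    2, /* cmp [rdi], eax          39 07 */
    2, /* je  .LBB0_2             74 xx */
    2, /* mov [rdi], eax          89 07 */
    1, /* ret                     C3    */
};

static const int load_test_variant[] = {
    2, /* mov  eax, [rdi]         8B 07             */
    2, /* test eax, eax           85 C0             */
    2, /* je   .LBB0_2            74 xx             */
    6, /* mov  DWORD PTR [rdi], 0 C7 07 00 00 00 00 */
    1, /* ret                     C3                */
};

/* Sum the per-instruction sizes of one sequence. */
static int total_bytes(const int *sizes, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += sizes[i];
    }
    return total;
}
```

Under these assumptions the totals come to 12, 9 and 13 bytes respectively, so only the xor variant is smaller than the current llc output.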

[3.5.1.9]

Assembly/Compiler Coding Rule 40. (ML impact, M generality) Use the TEST
instruction instead of AND when the result of the logical AND is not used. This
saves μops in execution. Use a TEST of a register with itself instead of a CMP
of the register to zero, this saves the need to encode the zero and saves
encoding space. Avoid comparing a constant to a memory operand. It is
preferable to load the memory operand and compare the constant to a register.  

Assembly/Compiler Coding Rule 41. (ML impact, M generality) Eliminate
unnecessary compare with zero instructions by using the appropriate conditional
jump instruction when the flags are already set by a preceding arithmetic
instruction. If necessary, use a TEST instruction instead of a compare. Be 
certain that any code transformations made do not introduce problems with
overflow.

Rules 40 and 41 seem to apply only to TEST and comparisons with zero, whereas
rule 19 is more general and advises against adding extra instructions just to
enable fusion.  On that basis the second sequence seems closest to the
guidelines, although the first seems useful as a code size reduction /
macro-fusion optimization when a register is already set to zero, so that no
extra code is generated for the sake of fusion.
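
The distinction between the three sequences in this report can be sketched as a toy predicate over operand forms. It encodes only the MEM-IMM restriction quoted from the manual earlier; treating the other two forms as fusible is the report's own expectation, not something verified here, and the enum and function names are hypothetical. Microarchitecture, operating mode and the choice of conditional jump are deliberately ignored.

```c
#include <stdbool.h>

/* Operand forms that appear in this report (hypothetical names). */
enum operand_form {
    FORM_MEM_IMM, /* cmp [mem], imm   e.g. cmpl $0, (%rdi) */
    FORM_MEM_REG, /* cmp [mem], reg   e.g. cmp [rdi], eax  */
    FORM_REG_REG  /* test reg, reg    e.g. test eax, eax   */
};

/* Toy model: a CMP/TEST followed by a conditional jump is assumed fusible
   unless the compare uses the MEM-IMM form, which the manual rules out. */
static bool may_macro_fuse(enum operand_form form) {
    return form != FORM_MEM_IMM;
}
```

In this model the current llc output (FORM_MEM_IMM) is the only one of the three sequences that cannot fuse.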

Note that macro-fusion appears to be available in 64-bit mode only on Nehalem
and later; on the earlier Core microarchitecture it applies only in 32-bit mode.

Anyway, in both sequences the compare and branch should fuse and dispatch as a
single μop with reduced latency.  In the example code this doesn't matter much,
but in a loop it could help speed a bit.

Bug 38079 mentions a similar enhancement.
