[llvm-bugs] [Bug 25899] New: Loads and Stores are not always coalesced

Sun Dec 20 01:21:28 PST 2015

https://llvm.org/bugs/show_bug.cgi?id=25899

            Bug ID: 25899
           Summary: Loads and Stores are not always coalesced
           Product: libraries
           Version: 3.7
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: haneef503 at gmail.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Clang (or LLVM?) sometimes generates inefficient code for loads and stores,
even though in other contexts it recognizes that *the same code* can be
reduced to fewer loads/stores. For example, take this simple code:

```
#include <stdint.h>

int l32 (const uint8_t *b) {
    int r = 0;
    r ^= b[0];
    r ^= b[1] << 8;
    r ^= b[2] << 16;
    r ^= b[3] << 24;

    return r;
}

int f (int a) {
    return l32 ((void *) &a);
}
```

`clang -O2` generates (clang 3.7, Intel syntax, extraneous output removed):

l32:
    movzx   eax, byte ptr [rdi]
    movzx   ecx, byte ptr [rdi + 1]
    shl     ecx, 8
    or      ecx, eax
    movzx   edx, byte ptr [rdi + 2]
    shl     edx, 16
    or      edx, ecx
    movzx   eax, byte ptr [rdi + 3]
    shl     eax, 24
    or      eax, edx
    ret

f:
    mov     eax, edi
    ret

If the compiler was able to optimize f() to a simple register move, it must
have recognized that the four byte loads could be coalesced into a single load
(or that a little-endian load was being compiled for an architecture that
happens to be little-endian). It is therefore quite odd that it did not perform
the same optimization on l32() itself and reduce it to something more like:

l32:
    mov    eax, dword ptr [rdi]
    ret
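
For comparison, here is a minimal sketch of the two forms side by side. It is
not part of the original test case: the names `load_le32_bytes`,
`load_le32_memcpy`, `host_is_little_endian` and `check_loads_agree` are
hypothetical, and the byte-wise version uses unsigned arithmetic (an assumption
made here so that shifting a byte >= 0x80 left by 24 stays well defined). On a
little-endian host both functions should return the same value, which is the
equivalence the single `mov` above relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian load, same shape as l32() in the report,
   but done in uint32_t so no signed shift can overflow. */
static uint32_t load_le32_bytes(const uint8_t *b) {
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* One 32-bit load in host byte order. memcpy avoids alignment and
   aliasing problems; it only equals the byte-wise version on a
   little-endian host. */
static uint32_t load_le32_memcpy(const uint8_t *b) {
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Runtime probe: is the lowest-addressed byte the least significant one? */
static int host_is_little_endian(void) {
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Asserts that both loads agree, but only where they are expected to. */
static void check_loads_agree(const uint8_t *b) {
    if (host_is_little_endian())
        assert(load_le32_bytes(b) == load_le32_memcpy(b));
}
```

Calling `check_loads_agree` on a few 4-byte buffers is a quick sanity check
that coalescing the four byte loads into one is a legal transformation on
x86. The `memcpy` form is typically compiled to a single `mov` at `-O2`, so it
can serve as a workaround when the byte-wise pattern is not recognized.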
