[llvm-bugs] [Bug 50320] New: Clang could merge multiple variable copies into one

via llvm-bugs llvm-bugs at lists.llvm.org
Wed May 12 10:04:40 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=50320

            Bug ID: 50320
           Summary: Clang could merge multiple variable copies into one
           Product: new-bugs
           Version: 12.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: stpasha at gmail.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

In the following example code
```
struct X {
    bool a, b, c, d;
    int e;
    X(const X& x) : a(x.a), b(x.b), c(x.c), d(x.d), e(x.e) {}
};
```
the copy constructor copies several primitive members from one memory location
to another. Instead of issuing a separate load and store for each member, it
would be more efficient to copy all of them at once as a single 64-bit word.
However, this is the assembly produced by Clang (with the `-O3` flag):
```
X::X(X const&) [base object constructor]:
        mov     al, byte ptr [rsi]
        mov     byte ptr [rdi], al
        mov     al, byte ptr [rsi + 1]
        mov     byte ptr [rdi + 1], al
        mov     al, byte ptr [rsi + 2]
        mov     byte ptr [rdi + 2], al
        mov     al, byte ptr [rsi + 3]
        mov     byte ptr [rdi + 3], al
        mov     eax, dword ptr [rsi + 4]
        mov     dword ptr [rdi + 4], eax
        ret
```

For comparison, this is the assembly produced by GCC for the same code:
```
X::X(X const&) [base object constructor]:
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
        ret
```

The code can be viewed here: https://godbolt.org/z/cs15vr3sn
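
To make the requested transformation concrete, here is a minimal sketch of what
the merged copy amounts to at the source level. It is not how either compiler
implements the optimization. `PlainX`, `copy_memberwise` and `copy_as_word` are
hypothetical names that do not appear in the report, and `PlainX` drops the
user-written constructor so that the type is trivially copyable and `memcpy`
on it is well defined. The layout checks assume a typical x86-64 ABI (1-byte
`bool`, 4-byte `int`).
```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for X from the report: same members, no user-written
// copy constructor, so the type is trivially copyable.
struct PlainX {
    bool a, b, c, d;
    int e;
};

// Assumptions about the target ABI; these are not guaranteed by the standard.
static_assert(std::is_trivially_copyable<PlainX>::value, "memcpy must be valid");
static_assert(sizeof(PlainX) == 8, "expected four bools plus one 4-byte int");
static_assert(offsetof(PlainX, e) == 4, "expected no padding before e");

// Member-by-member copy, mirroring the constructor in the report.
void copy_memberwise(PlainX& dst, const PlainX& src) {
    dst.a = src.a;
    dst.b = src.b;
    dst.c = src.c;
    dst.d = src.d;
    dst.e = src.e;
}

// The merged form: move all 8 bytes through one 64-bit temporary.
void copy_as_word(PlainX& dst, const PlainX& src) {
    std::uint64_t word;
    std::memcpy(&word, &src, sizeof word);
    std::memcpy(&dst, &word, sizeof word);
}
```
Under the stated layout assumptions both functions leave `dst` with the same
member values, so comparing their assembly is one way to check whether a given
compiler version performs the merge on its own.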

Interestingly, if the struct is not tightly packed (for example, if one of the
boolean flags is missing), then both Clang and GCC produce sub-optimal code
involving multiple copies, even though the optimizer could recognize that
copying one extra byte of "ghost" (padding) space would improve performance. In
fact, the byte doesn't even have to be "ghost": if the copy constructor skips
copying or otherwise initializing one of the boolean flags, the optimizer could
still recognize that adding the extra copy would be beneficial for both code
size and speed.
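
The not-tightly-packed case can be sketched the same way. `PaddedX` and
`copy_with_padding` are hypothetical names, and the layout checks again assume
a typical x86-64 ABI in which one padding byte sits between `c` and `e`. This
only illustrates that copying the padding byte along with the members is
harmless at the source level; it says nothing about how a compiler would
decide to do so.
```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical variant of the report's struct with one boolean flag removed.
struct PaddedX {
    bool a, b, c;   // occupies bytes 0..2
    int e;          // assumed to start at byte 4, leaving byte 3 as padding
};

// Assumptions about the target ABI; these are not guaranteed by the standard.
static_assert(std::is_trivially_copyable<PaddedX>::value, "memcpy must be valid");
static_assert(sizeof(PaddedX) == 8, "expected 3 bools + 1 padding byte + int");
static_assert(offsetof(PaddedX, e) == 4, "expected one padding byte before e");

// Copy the whole object, padding byte included, in one 8-byte block.
void copy_with_padding(PaddedX& dst, const PaddedX& src) {
    std::memcpy(&dst, &src, sizeof(PaddedX));
}
```
The value of the padding byte is never observable through the members, which
is why including it in the copy does not change the program's behavior.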
