[llvm-bugs] [Bug 51263] New: Possible missed optimization: relaxed stores to narrow atomic types are not coalesced

Wed Jul 28 22:00:00 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=51263

            Bug ID: 51263
           Summary: Possible missed optimization: relaxed stores to narrow
                    atomic types are not coalesced
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: eric.mueller1024 at gmail.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

The following code (https://godbolt.org/z/fcqexoYMe):

#include <cstddef>
#include <cstdint>
#include <atomic>

struct normal_storage
{
    uint8_t a[8];
};

struct atomic_storage
{
    std::atomic<uint8_t> a[8];
};

void f(normal_storage * ns, atomic_storage * as)
{
    for (size_t i = 0; i < 8; ++i) {
        as->a[i].store(ns->a[i], std::memory_order_relaxed);
    }
}

generates 8 single-byte loads and 8 single-byte stores when compiled with clang
trunk for x86_64 at -O3:

f(normal_storage*, atomic_storage*): # @f(normal_storage*, atomic_storage*)
        mov     al, byte ptr [rdi]
        mov     byte ptr [rsi], al
        mov     al, byte ptr [rdi + 1]
        mov     byte ptr [rsi + 1], al
        mov     al, byte ptr [rdi + 2]
        mov     byte ptr [rsi + 2], al
        mov     al, byte ptr [rdi + 3]
        mov     byte ptr [rsi + 3], al
        mov     al, byte ptr [rdi + 4]
        mov     byte ptr [rsi + 4], al
        mov     al, byte ptr [rdi + 5]
        mov     byte ptr [rsi + 5], al
        mov     al, byte ptr [rdi + 6]
        mov     byte ptr [rsi + 6], al
        mov     al, byte ptr [rdi + 7]
        mov     byte ptr [rsi + 7], al
        ret

Given that relaxed atomics impose no ordering between operations on different
objects, it seems to me that it may be legal to coalesce these eight stores
into a single 8-byte store instead.

Is this a missed optimization? Or is there some specific language in the
standard that makes such an optimization illegal?

For what it's worth, as far as I can tell none of GCC, ICC, or MSVC makes this
optimization either.

Clearly the above code could be changed to use a single 8-byte atomic instead,
but that's somewhat beside the point. I can provide more context on why this
is useful if it helps.
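
For reference, the manual rewrite mentioned above might look roughly like the
sketch below. The names packed_atomic_storage, f_packed and load_byte are
hypothetical and are not part of the original test case. Note that this is not
equivalent to the code in the report: the whole 8-byte value becomes one atomic
object, so readers have to load the full word and pick out a byte, and
individual bytes can no longer be stored independently.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct normal_storage
{
    std::uint8_t a[8];
};

// Hypothetical layout: one 8-byte atomic instead of eight 1-byte atomics.
struct packed_atomic_storage
{
    std::atomic<std::uint64_t> a;
};

// Copy all 8 bytes with a single relaxed store.
void f_packed(const normal_storage * ns, packed_atomic_storage * as)
{
    std::uint64_t v;
    std::memcpy(&v, ns->a, sizeof v);   // pack bytes in memory order
    as->a.store(v, std::memory_order_relaxed);
}

// Read byte i back by loading the whole word and unpacking it the same way.
std::uint8_t load_byte(const packed_atomic_storage * as, std::size_t i)
{
    std::uint64_t v = as->a.load(std::memory_order_relaxed);
    std::uint8_t bytes[8];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[i];
}
```

Because both directions go through memcpy, the byte order seen by load_byte
matches the order in normal_storage regardless of endianness. I would expect
f_packed to compile to one 8-byte load and one 8-byte store on x86_64, but I
have not included compiler output for it here.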
