[llvm-bugs] [Bug 38103] New: Bad for loop copy optimization when using -fno-builtin-memcpy -fno-builtin-memmove

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Jul 9 06:44:48 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=38103

            Bug ID: 38103
           Summary: Bad for loop copy optimization when using
                    -fno-builtin-memcpy -fno-builtin-memmove
           Product: clang
           Version: 6.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: gchatelet at google.com
                CC: dgregor at apple.com, llvm-bugs at lists.llvm.org

A straightforward copy using a for loop whose size is known at compile time
leads to very poor assembly.

> #include <cstddef>
> 
> template <size_t kBlockSize>
> void Copy(char* __restrict dst, const char* __restrict src) {
>   for (size_t i = 0; i < kBlockSize; ++i) dst[i] = src[i];
> }
> 
> template void Copy<15>(char* __restrict dst, const char* __restrict src);

https://godbolt.org/g/YFq3o6

This can be mitigated by introducing a temporary buffer, like so:
> template <size_t kBlockSize>
> void Copy(char* __restrict dst, const char* __restrict src) {
>   char tmp[kBlockSize];
>   for (size_t i = 0; i < kBlockSize; ++i) tmp[i] = src[i];
>   for (size_t i = 0; i < kBlockSize; ++i) dst[i] = tmp[i];
> }

https://godbolt.org/g/48Dghk
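
The two snippets above should differ only in the code the compiler generates, not in what they copy. A minimal sketch to check that is shown here; the names `CopyDirect`, `CopyViaTmp` and `VariantsAgree` are hypothetical and chosen only so both versions can live in one file.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical name: the plain loop from the report.
template <std::size_t kBlockSize>
void CopyDirect(char* __restrict dst, const char* __restrict src) {
  for (std::size_t i = 0; i < kBlockSize; ++i) dst[i] = src[i];
}

// Hypothetical name: the temporary-buffer variant from the report.
template <std::size_t kBlockSize>
void CopyViaTmp(char* __restrict dst, const char* __restrict src) {
  char tmp[kBlockSize];
  for (std::size_t i = 0; i < kBlockSize; ++i) tmp[i] = src[i];
  for (std::size_t i = 0; i < kBlockSize; ++i) dst[i] = tmp[i];
}

// Hypothetical helper: returns true when both variants reproduce
// the same kBlockSize source bytes in their destination buffers.
template <std::size_t kBlockSize>
bool VariantsAgree() {
  char src[kBlockSize], a[kBlockSize], b[kBlockSize];
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    src[i] = static_cast<char>(i + 1);  // recognizable non-zero pattern
    a[i] = 0;
    b[i] = 0;
  }
  CopyDirect<kBlockSize>(a, src);
  CopyViaTmp<kBlockSize>(b, src);
  return std::memcmp(a, src, kBlockSize) == 0 &&
         std::memcmp(b, src, kBlockSize) == 0;
}
```

Calling `VariantsAgree<15>()` only confirms that behavior is identical; any difference in the generated assembly then comes from the temporary-buffer workaround.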

This workaround works for sizes up to 25 bytes but produces bad code from 26 bytes onwards.

For instance, check the resulting code for 32 bytes: https://godbolt.org/g/jZwcrv
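
To compare the output on both sides of the 25/26 byte boundary in one compilation, the temporary-buffer version can be explicitly instantiated at several sizes. This is only a sketch: the file name and the `-O2` optimization level in the comment are assumptions, since the report does not state which level was used.

```cpp
#include <cstddef>

// Temporary-buffer variant from the report, unchanged apart from std::size_t.
template <std::size_t kBlockSize>
void Copy(char* __restrict dst, const char* __restrict src) {
  char tmp[kBlockSize];
  for (std::size_t i = 0; i < kBlockSize; ++i) tmp[i] = src[i];
  for (std::size_t i = 0; i < kBlockSize; ++i) dst[i] = tmp[i];
}

// Sizes named in the report: last good size, first bad size, and 32 bytes.
template void Copy<25>(char* __restrict dst, const char* __restrict src);
template void Copy<26>(char* __restrict dst, const char* __restrict src);
template void Copy<32>(char* __restrict dst, const char* __restrict src);

// Possible invocation (file name and -O2 are assumptions):
//   clang++ -O2 -fno-builtin-memcpy -fno-builtin-memmove -S -o - copy.cpp
```

The three instantiations appear as separate symbols in the assembly listing, so the point where the generated code changes can be read off directly.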
