<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Clang/LLVM optimizes division and modulo worse than MSVC, part 2"

   href="https://bugs.llvm.org/show_bug.cgi?id=38217">38217</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Clang/LLVM optimizes division and modulo worse than MSVC, part 2

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>sfinae@hotmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=20570" name="attach_20570" title="Test case">attachment 20570</a> <a href="attachment.cgi?id=20570&action=edit" title="Test case">[details]</a></span>

Test case

This appears to be a different bug than

<a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Clang/LLVM optimizes division and modulo worse than MSVC"

   href="show_bug.cgi?id=37983">https://bugs.llvm.org/show_bug.cgi?id=37983</a> "Clang/LLVM optimizes division and

modulo worse than MSVC" (which is probably a duplicate of

<a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Division followed by modulo generates longer machine code than vice versa"

   href="show_bug.cgi?id=23106">https://bugs.llvm.org/show_bug.cgi?id=23106</a> "Division followed by modulo

generates longer machine code than vice versa") because it involves modulo

followed by division.

This affects the Ryu algorithm for printing floating-point numbers

(<a href="https://github.com/ulfjack/ryu">https://github.com/ulfjack/ryu</a> ) and therefore affects C++17 floating-point

std::to_chars().

I observe that MSVC's codegen is unaffected by WORKAROUND, while Clang/LLVM

generates less assembly code (which is faster when profiled in the real

algorithm) for WORKAROUND.

Here's a Godbolt link demonstrating the codegen difference (this isn't

Windows-specific): <a href="https://godbolt.org/g/uX1AD8">https://godbolt.org/g/uX1AD8</a>

C:\Temp\TESTING_X64>cl

Microsoft (R) C/C++ Optimizing Compiler Version 19.16.26504 for x64

Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Temp\TESTING_X64>clang-cl -m64 -v

clang version 6.0.0 (tags/RELEASE_600/final)

Target: x86_64-pc-windows-msvc

Thread model: posix

InstalledDir: S:\msvc\src\vctools\NonShip\ClangLLVM\bin

C:\Temp\TESTING_X64>type d2s.cpp

#include <stdint.h>

#include <string.h>

static const char DIGIT_TABLE[] =

"0001020304050607080910111213141516171819"

"2021222324252627282930313233343536373839"

"4041424344454647484950515253545556575859"

"6061626364656667686970717273747576777879"

"8081828384858687888990919293949596979899";

void d2s_buffered(uint64_t output, char * result) {

    uint32_t i = 0;

    while (output >= 10000) {

#ifdef WORKAROUND

        const uint32_t c = (uint32_t) (output - 10000 * (output / 10000));

#else

        const uint32_t c = (uint32_t) (output % 10000);

#endif

        output /= 10000;

        const uint32_t c0 = (c % 100) << 1;

        const uint32_t c1 = (c / 100) << 1;

        memcpy(result - i - 1, DIGIT_TABLE + c0, 2);

        memcpy(result - i - 3, DIGIT_TABLE + c1, 2);

        i += 4;

    }

}

C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c d2s.cpp /FAsc

/Famsvc_workaround.cod /DWORKAROUND

d2s.cpp

C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c d2s.cpp /FAsc

/Famsvc_modulo.cod

d2s.cpp

C:\Temp\TESTING_X64>git diff msvc_workaround.cod msvc_modulo.cod

diff --git a/msvc_workaround.cod b/msvc_modulo.cod

index 1be1419..aff234c 100644

--- a/msvc_workaround.cod

+++ b/msvc_modulo.cod

@@ -86,11 +86,11 @@ $LL2@d2s_buffer:

 ; 15   : #ifdef WORKAROUND

 ; 16   :         const uint32_t c = (uint32_t) (output - 10000 * (output /

10000));

+; 17   : #else^M

+; 18   :         const uint32_t c = (uint32_t) (output % 10000);^M

   00030        48 8b c7         mov     rax, rdi

-; 17   : #else

-; 18   :         const uint32_t c = (uint32_t) (output % 10000);

 ; 19   : #endif

 ; 20   :

 ; 21   :         output /= 10000;

C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c d2s.cpp /FA

/Faclang_workaround.asm /DWORKAROUND

C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c d2s.cpp /FA

/Faclang_modulo.asm

C:\Temp\TESTING_X64>git diff clang_workaround.asm clang_modulo.asm

diff --git a/clang_workaround.asm b/clang_modulo.asm

index 2a8cb49..6026638 100644

--- a/clang_workaround.asm

+++ b/clang_modulo.asm

@@ -29,19 +29,22 @@

        movq    %r9, %rax

        mulq    %r10

        shrq    $11, %rdx

-       imulq   $-10000, %rdx, %rax     # imm = 0xD8F0

-       addq    %r9, %rax

-       movl    %eax, %esi

-       imulq   $1374389535, %rsi, %rsi # imm = 0x51EB851F

-       shrq    $37, %rsi

-       imull   $100, %esi, %edi

-       subl    %edi, %eax

+       imulq   $10000, %rdx, %rax      # imm = 0x2710

+       movq    %r9, %rsi

+       subq    %rax, %rsi

+       imulq   $1374389535, %rsi, %rax # imm = 0x51EB851F

+       movq    %rax, %rdi

+       shrq    $37, %rdi

+       imull   $100, %edi, %edi

+       subl    %edi, %esi

+       shrq    $36, %rax

+       andl    $510, %eax              # imm = 0x1FE

        movl    %ecx, %edi

        movq    %r8, %rbx

        subq    %rdi, %rbx

-       movzwl  (%r11,%rax,2), %eax

-       movw    %ax, -1(%rbx)

-       movzwl  (%r11,%rsi,2), %eax

+       movzwl  (%r11,%rsi,2), %esi

+       movw    %si, -1(%rbx)

+       movzwl  (%rax,%r11), %eax

        movw    %ax, -3(%rbx)

        addl    $4, %ecx

        cmpq    $99999999, %r9          # imm = 0x5F5E0FF</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>