[llvm-bugs] [Bug 36723] New: Invalid code generation from SIMD intrinsics when function is inlined

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Mar 14 06:08:50 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=36723

            Bug ID: 36723
           Summary: Invalid code generation from SIMD intrinsics when
                    function is inlined
           Product: clang
           Version: 4.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: sylvain.laperche at scality.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 20061
  --> https://bugs.llvm.org/attachment.cgi?id=20061&action=edit
testcase

Hi,

The attached code is miscompiled by Clang 4.x when the function sub_test is
inlined.
To reproduce the issue, compile the code with:

    clang++ -std=c++11 -O2 -msse4.1 -o testcase testcase.cpp

Then run it with:

    ./testcase 25532 0

I had to pass the values on the command line; otherwise everything is computed
at compile time :D. With no optimisation enabled the bug doesn't occur.
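A minimal sketch of that idea, not the attached testcase: operands read from
argv are unknown to the optimizer, so the SIMD code cannot be constant-folded.
The helper name parse_operand is hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: turn one command-line string into a 32-bit operand.
// The value only exists at run time, so the compiler cannot fold the
// computation that consumes it.
std::uint32_t parse_operand(const char* text) {
    return static_cast<std::uint32_t>(std::strtoul(text, nullptr, 10));
}
```

In a main function this would be called as parse_operand(argv[1]), after
checking argc.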

The code seems free of undefined behavior, but I may have missed something.
Valgrind and `-fsanitize=undefined` don't complain.

The function sub_test is correctly compiled (checked with objdump):

    0000000000400990 <_Z8sub_testooj>:
      400990:       66 41 0f 6e c0          movd   %r8d,%xmm0
      400995:       66 0f 70 c0 00          pshufd $0x0,%xmm0,%xmm0
      40099a:       66 48 0f 6e ce          movq   %rsi,%xmm1
      40099f:       66 48 0f 6e d7          movq   %rdi,%xmm2
      4009a4:       66 0f 6c d1             punpcklqdq %xmm1,%xmm2
      4009a8:       66 48 0f 6e c9          movq   %rcx,%xmm1
      4009ad:       66 48 0f 6e da          movq   %rdx,%xmm3
      4009b2:       66 0f 6c d9             punpcklqdq %xmm1,%xmm3
      4009b6:       66 0f 6f cb             movdqa %xmm3,%xmm1
      4009ba:       66 0f 66 ca             pcmpgtd %xmm2,%xmm1
      4009be:       66 0f db c8             pand   %xmm0,%xmm1
      4009c2:       66 0f fa d3             psubd  %xmm3,%xmm2
      4009c6:       66 0f fe d1             paddd  %xmm1,%xmm2
      4009ca:       66 48 0f 7e d0          movq   %xmm2,%rax
      4009cf:       66 48 0f 3a 16 d2 01    pextrq $0x1,%xmm2,%rdx
      4009d6:       c3                      retq
      4009d7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
      4009de:       00 00

*BUT* when the function is inlined (in main), the code becomes buggy:

    […]
    400af3:       66 48 0f 6e c0          movq   %rax,%xmm0
    400af8:       66 48 0f 6e ca          movq   %rdx,%xmm1
    400afd:       66 0f 6c c8             punpcklqdq %xmm0,%xmm1
    400b01:       66 48 0f 6e c1          movq   %rcx,%xmm0
    400b06:       66 48 0f 6e d6          movq   %rsi,%xmm2
    400b0b:       66 0f 6c d0             punpcklqdq %xmm0,%xmm2
    400b0f:       66 0f 6f c2             movdqa %xmm2,%xmm0
    400b13:       66 0f 66 c1             pcmpgtd %xmm1,%xmm0
    400b17:       66 0f 72 d0 1f          psrld  $0x1f,%xmm0
    400b1c:       66 0f fa ca             psubd  %xmm2,%xmm1
    400b20:       66 0f fe c8             paddd  %xmm0,%xmm1
    400b24:       66 48 0f 7e c8          movq   %xmm1,%rax
    400b29:       66 48 0f 3a 16 c9 01    pextrq $0x1,%xmm1,%rcx
    […]

The `pand` (corresponding to `_mm_and_si128`) is replaced by a `psrld $0x1f`
(just after the `pcmpgtd`).
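To make the difference concrete, here is a portable scalar model of the two
instruction sequences above. It is a sketch derived from the disassembly only,
not from the attached testcase: the names Lanes, expected_sequence and
inlined_sequence are made up, and one XMM register is modelled as four 32-bit
lanes. The `pand` keeps the broadcast third argument in the lanes where the
compare is true, while `psrld $0x1f` always yields 1 in those lanes, so the two
only agree when that argument is 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint32_t, 4>;  // one XMM register, 4 x 32 bits

// pcmpgtd: signed per-lane compare, all-ones where lhs > rhs, zero elsewhere.
Lanes pcmpgtd(const Lanes& lhs, const Lanes& rhs) {
    Lanes out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<std::int32_t>(lhs[i]) >
                 static_cast<std::int32_t>(rhs[i]) ? 0xFFFFFFFFu : 0u;
    return out;
}

// pand: per-lane bitwise AND.
Lanes pand(const Lanes& x, const Lanes& y) {
    Lanes out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = x[i] & y[i];
    return out;
}

// psrld: per-lane logical right shift by an immediate count.
Lanes psrld(const Lanes& x, unsigned count) {
    assert(count < 32);
    Lanes out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = x[i] >> count;
    return out;
}

// psubd / paddd: per-lane wrapping subtraction and addition.
Lanes psubd(const Lanes& x, const Lanes& y) {
    Lanes out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = x[i] - y[i];
    return out;
}

Lanes paddd(const Lanes& x, const Lanes& y) {
    Lanes out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = x[i] + y[i];
    return out;
}

// Sequence seen in the standalone sub_test: (a - b) + ((b > a) & broadcast(c)).
Lanes expected_sequence(const Lanes& a, const Lanes& b, std::uint32_t c) {
    const Lanes broadcast{{c, c, c, c}};  // movd + pshufd $0x0
    return paddd(psubd(a, b), pand(pcmpgtd(b, a), broadcast));
}

// Sequence seen after inlining: (a - b) + ((b > a) >> 31).
Lanes inlined_sequence(const Lanes& a, const Lanes& b) {
    return paddd(psubd(a, b), psrld(pcmpgtd(b, a), 31));
}
```

With the arbitrary inputs a = {3, 10, 0, 7}, b = {5, 4, 0, 9} and a third
argument of 257, the first sequence gives {255, 6, 0, 255} while the second
gives {0xFFFFFFFF, 6, 0, 0xFFFFFFFF}.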

The bug was first encountered on macOS with the following version of Clang:

    % g++ --version
    Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 9.0.0 (clang-900.0.39.2)
    Target: x86_64-apple-darwin16.7.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin

This corresponds to Clang 4.0.3 according to
https://en.wikipedia.org/wiki/Xcode#Latest_versions

I was able to reproduce it on Ubuntu 14.04, using the Clang 4.0 from "deb
http://apt.llvm.org/trusty/ llvm-toolchain-trusty-4.0 main"

    % clang++ --version
    clang version 4.0.1-svn305264-1~exp1 (branches/release_40)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

Note that the generated code is correct with Clang 3.4 (tested on Ubuntu 14.04)

    % clang++ --version
    Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
    Target: x86_64-pc-linux-gnu
    Thread model: posix

No problem with Clang 5.0 either (tested on Arch Linux):

    % clang++ --version
    clang version 5.0.1 (tags/RELEASE_501/final)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

So this seems specific to the 4.x branch.
I don't know the policy (do bugs get fixed in the 4.x branch or is "upgrade
your compiler" the official fix?) so I report it just in case.
