[llvm-bugs] [Bug 27908] New: Incorrect generated code with AVX

Fri May 27 04:46:20 PDT 2016

https://llvm.org/bugs/show_bug.cgi?id=27908

            Bug ID: 27908
           Summary: Incorrect generated code with AVX
           Product: new-bugs
           Version: 3.8
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: gael.guennebaud at gmail.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The following piece of code:

#include <iostream>
// http://bitbucket.org/eigen/eigen/get/default.tar.gz
#include <Eigen/Dense>
using namespace Eigen;
int main()
{
  Projective3d t4;
  Vector3d v3;
  do {
    v3 = Vector3d::Ones();
  } while (v3.cwiseAbs().minCoeff()<1e-16);
  t4.matrix().setIdentity();
  t4.matrix().col(3).head<3>() = v3;
  std::cout << t4.matrix() << "\n\n";
  t4.translate(v3);
//     t4.translationExt() += t4.linearExt() * v3;
}

compiled with clang 3.7, 3.8, or 3.9 using "-mavx -O2", produces the following
output:

           1            0            0            1
           0            1            0 5.29981e-315
           0            0            1            1
           0            0            0            1

where the "5.29981e-315" number should be "1". t4.matrix() is essentially a
4x4 column-major matrix stored as a static array of 16 doubles. The incorrect
asm responsible for filling the last 4 entries is as follows:

    movq    %rax, 120(%rsp)
    movl    $1072693248, %ecx       ## imm = 0x3FF00000
    vmovq    %rcx, %xmm0
    vpslldq    $8, %xmm0, %xmm0        ## xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
    vpshufd    $226, %xmm0, %xmm0      ## xmm0 = xmm0[2,0,2,3]
    vmovdqa    LCPI0_0(%rip), %xmm1    ## xmm1 = [0,1072693248,0,1072693248]
    vpunpcklqdq    %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0],xmm0[0]
    vmovdqa    %xmm0, 96(%rsp)
    movq    %rax, 112(%rsp)

where %rax contains a representation of 1.0, and 96(%rsp) references the first
element of the last column.
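
As a rough cross-check of the listing above, the following sketch replays the
vector instructions on four 32-bit lanes in plain C++. It is only a model
under stated assumptions: lane 0 is the least significant dword
(little-endian x86), the Lanes struct and all helper names are made up for
illustration, and the semantics are taken from the shuffle comments in the
listing rather than from the compiler.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing issues).
inline double bits_to_double(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Hypothetical model of an xmm register: four 32-bit lanes, lane 0 lowest.
struct Lanes { std::uint32_t d[4]; };

// Replays vmovq / vpslldq / vpshufd / vpunpcklqdq from the broken listing.
inline Lanes replay_broken_sequence() {
  const std::uint32_t hi = 0x3FF00000u;              // 1072693248, high half of 1.0
  const Lanes a = {{hi, 0, 0, 0}};                   // vmovq %rcx, %xmm0
  const Lanes b = {{0, 0, a.d[0], a.d[1]}};          // vpslldq $8: shift left by 8 bytes
  const Lanes c = {{b.d[2], b.d[0], b.d[2], b.d[3]}}; // vpshufd $226: xmm0[2,0,2,3]
  const Lanes k = {{0, hi, 0, hi}};                  // LCPI0_0 = [0,1072693248,0,1072693248]
  const Lanes r = {{k.d[0], k.d[1], c.d[0], c.d[1]}}; // vpunpcklqdq: xmm1[0],xmm0[0]
  return r;
}

// Read 64-bit element q (0 or 1) of the register as a double.
inline double lane_pair_to_double(const Lanes& v, int q) {
  const std::uint64_t bits =
      (static_cast<std::uint64_t>(v.d[2 * q + 1]) << 32) | v.d[2 * q];
  return bits_to_double(bits);
}
```

In this model, element 0 of the value stored to 96(%rsp) is 1.0, while
element 1 (at 104(%rsp)) has the bit pattern 0x000000003FF00000: the high half
of 1.0 has landed in the low dword, which reads back as the denormal
5.29981e-315 seen in the output.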

Replacing the last line, that is "t4.translate(v3);" by the body of the
translate method (last commented line), hides the issue, and in this case we
get a much cleaner asm:

    movq    %rax, 120(%rsp)
    movq    %rax, 96(%rsp)
    vmovaps    LCPI0_2(%rip), %xmm0    ## xmm0 = [4607182418800017408,4607182418800017408]
    vmovups    %xmm0, 104(%rsp)
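
For comparison, here is a small sketch of what the clean sequence stores,
assuming as above that %rax holds the bits of 1.0 and that 96(%rsp) is the
first element of the last column. The function names are hypothetical; the
only point is that the constant 4607182418800017408 is the bit pattern
0x3FF0000000000000, that is 1.0.

```cpp
#include <cstdint>
#include <cstring>

// Return the raw 64-bit pattern of a double.
inline std::uint64_t double_to_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Model the four stores of the clean listing into the last column.
// Index i corresponds to address 96 + 8*i relative to %rsp.
inline void fill_last_column_clean(std::uint64_t col[4]) {
  const std::uint64_t rax = double_to_bits(1.0);   // assumed content of %rax
  const std::uint64_t k = 4607182418800017408ULL;  // each element of LCPI0_2
  col[3] = rax;  // movq %rax, 120(%rsp)
  col[0] = rax;  // movq %rax, 96(%rsp)
  col[1] = k;    // vmovups %xmm0, 104(%rsp), low element
  col[2] = k;    // vmovups %xmm0, 104(%rsp), high element
}
```

Under these assumptions all four entries of the last column end up holding
the bits of 1.0, which is the expected result.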

I guess that the weird shifting and shuffling we see in the broken part comes
from the do-while condition, which is only partly optimized away.
