[llvm-bugs] [Bug 25999] New: spec2000/188.ammp, spec2006/433.milc, 444.namd, 447.dealII, 453.povray compilation fails on LTO stage after commit r256394

Sat Jan 2 04:37:40 PST 2016

https://llvm.org/bugs/show_bug.cgi?id=25999

            Bug ID: 25999
           Summary: spec2000/188.ammp, spec2006/433.milc, 444.namd,
                    447.dealII, 453.povray compilation fails on LTO stage
                    after commit r256394
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Keywords: miscompilation
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: sergey.k.okunev at gmail.com
                CC: david.l.kreitzer at intel.com, denis.briltz at intel.com,
                    elena.demikhovsky at intel.com, llvm-bugs at lists.llvm.org,
                    sergos.gnu at gmail.com, spatel+llvm at rotateright.com,
                    zia.ansari at intel.com
    Classification: Unclassified

Bisection shows that LLVM revision r256394 is responsible for the failures. The
commit message follows.

commit 75759ab3e9255fe5f716e4a71ca1ee56901dedf8
Author: Sanjay Patel <spatel at rotateright.com>
Date:   Thu Dec 24 21:17:56 2015 +0000

    [InstCombine] transform more extract/insert pairs into shuffles (PR2109)

    This is an extension of the shuffle combining from r203229:
    http://reviews.llvm.org/rL203229

    The idea is to widen a short input vector with undef elements so the
    existing shuffle transform for extract/insert can kick in.

    The motivation is to finally solve PR2109:
    https://llvm.org/bugs/show_bug.cgi?id=2109

    For that example, the IR becomes:

    %1 = bitcast <2 x i32>* %P to <2 x float>*
    %ld1 = load <2 x float>, <2 x float>* %1, align 8
    %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
    %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    ret <4 x float> %i2

    And x86 SSE output improves from:

    movq        (%rdi), %xmm1           ## xmm1 = mem[0],zero
    movdqa      %xmm1, %xmm2
    shufps      $229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
    shufps      $48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
    shufps      $132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
    shufps      $32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
    shufps      $36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
    retq

    To the almost optimal:

    movhpd      (%rdi), %xmm0

    Note: There's a tension in the existing transform related to generating
    arbitrary shufflevector masks. We avoid that in other places in InstCombine
    because we're scared that codegen can't handle strange masks, but it looks
    like we're ok with producing those here. I purposely chose weird
    insert/extract indexes for the regression tests to see the effect in
    these cases.
    For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal
    or better for these examples.

    Differential Revision: http://reviews.llvm.org/D15096

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256394 91177308-0d34-0410-b5e6-96231b3b80d8
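
The widening step described in the commit message can be illustrated with a
toy model. The Python sketch below is not LLVM code: it models shufflevector
on plain lists, uses None as a stand-in for undef, and assumes (inferred from
the <0, 1, 4, 5> mask) that the original PR2109 code inserted the two loaded
floats into lanes 2 and 3 of %A. All function names are hypothetical.

```python
# Toy model of shufflevector on Python lists; None stands in for undef.
UNDEF = None

def shufflevector(a, b, mask):
    """Pick lanes from the concatenation a + b; an undef index yields undef."""
    assert len(a) == len(b), "both operands must have the same length"
    joined = list(a) + list(b)
    return [UNDEF if i is UNDEF else joined[i] for i in mask]

def insert_extract(vec_a, loaded):
    """Assumed original form: extract each loaded lane, insert into lanes 2 and 3."""
    out = list(vec_a)
    out[2] = loaded[0]
    out[3] = loaded[1]
    return out

def widened_shuffles(vec_a, loaded):
    """Form after the transform: widen the short vector, then one shuffle."""
    widened = shufflevector(loaded, [UNDEF, UNDEF], [0, 1, UNDEF, UNDEF])
    return shufflevector(vec_a, widened, [0, 1, 4, 5])

vec_a = [1.0, 2.0, 3.0, 4.0]
loaded = [10.0, 20.0]
print(insert_extract(vec_a, loaded) == widened_shuffles(vec_a, loaded))
```

Both forms select the same lanes; the undef lanes of the widened vector are
never read by the final mask.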

LLVM-clang options: -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static
-mfpmath=sse -march=core-avx2

During the LTO phase, the SPEC benchmarks fail with the following error message
(shown here for spec2006/444.namd):

runspec --config=lnx-x86_64-clang-default.cfg --rebuild -a build -e ref64 -T base 444
…………………………………………

clang++ -m64  -m64  -fuse-ld=gold  -Ofast -funroll-loops -flto -static 
-mfpmath=sse -march=core-avx2   -DSPEC_CPU_LP64        Compute.o ComputeList.o
ComputeNonbondedUtil.o LJTable.o Molecule.o Patch.o PatchList.o ResultSet.o
SimParameters.o erf.o spec_namd.o                     -o namd
Instruction does not dominate all uses!
  %782 = extractelement <2 x double> %721, i32 1
  %779 = insertelement <4 x double> undef, double %782, i32 0
Instruction does not dominate all uses!
  %1053 = extractelement <2 x double> %974, i32 1
  %1050 = insertelement <4 x double> undef, double %1053, i32 0
Instruction does not dominate all uses!
  %1332 = shufflevector <2 x double> %1263, <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %1330 = shufflevector <4 x double> %1329, <4 x double> %1332, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
LLVM ERROR: Broken function found, compilation aborted!
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
specmake: *** [namd] Error 1
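
For readers unfamiliar with the verifier message: each "Instruction does not
dominate all uses!" entry prints a definition followed by an instruction that
uses it. The Python sketch below is a simplified illustration, not the LLVM
verifier. It only checks ordering within one basic block, whereas real
dominance is defined over the whole control-flow graph. The block layout is
an assumption inferred from the value numbers (%779 is numbered before %782);
the report does not show the actual block.

```python
def find_use_before_def(block):
    """block: list of (result_name, [operand_names]) in program order.

    Returns (definition, user) pairs where an operand is defined later in
    the same block than the instruction that uses it. Operands not defined
    in this block are assumed to come from elsewhere and are ignored.
    """
    position = {name: i for i, (name, _) in enumerate(block)}
    violations = []
    for i, (name, operands) in enumerate(block):
        for op in operands:
            if op in position and position[op] > i:
                violations.append((op, name))
    return violations

# Assumed layout: the insertelement appears before the extractelement it uses.
block = [
    ("%779", ["%782"]),  # insertelement <4 x double> undef, double %782, i32 0
    ("%782", ["%721"]),  # extractelement <2 x double> %721, i32 1
]
print(find_use_before_def(block))
```

Under that assumed layout, the sketch reports %782 as used by %779 before it
is defined, which matches the shape of the first verifier entry.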

Okunev Sergey,
Software Engineer
Intel Compiler Team
