[llvm-bugs] [Bug 28845] New: Incorrect codegen for "store <2 x i48>" triggered by -fslp-vectorize-aggressive

Thu Aug 4 08:03:13 PDT 2016

https://llvm.org/bugs/show_bug.cgi?id=28845

            Bug ID: 28845
           Summary: Incorrect codegen for "store <2 x i48>" triggered by
                    -fslp-vectorize-aggressive
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: babokin at gmail.com
                CC: elena.demikhovsky at intel.com, llvm-bugs at lists.llvm.org,
                    Vsevolod.Livinskij at frtk.ru
    Classification: Unclassified

Created attachment 16882
  --> https://llvm.org/bugs/attachment.cgi?id=16882&action=edit
reproducer

The attached test case has a structure with a number of bit fields (ignore the
weirdness of the C definition of the structure; it's not important, whereas the
LLVM IR definition is).
LLVM IR structure definition:
%struct.struct_1 = type { [6 x i8], [6 x i8], i24 }

Initialization is a read-modify-write of two 48-bit chunks, no magic
here.
define void @_Z4initv() local_unnamed_addr #0 {
entry:
  %bf.load = load i48, i48* bitcast (%struct.struct_1* @s1 to i48*), align 8
  %bf.clear = and i48 %bf.load, -8796091973633
  %bf.set3 = or i48 %bf.clear, 7326889148416
  store i48 %bf.set3, i48* bitcast (%struct.struct_1* @s1 to i48*), align 8
  %bf.load4 = load i48, i48* bitcast ([6 x i8]* getelementptr inbounds (%struct.struct_1, %struct.struct_1* @s1, i64 0, i32 1) to i48*), align 2
  %bf.clear5 = and i48 %bf.load4, -2198956146689
  %bf.set6 = or i48 %bf.clear5, 822419128320
  store i48 %bf.set6, i48* bitcast ([6 x i8]* getelementptr inbounds (%struct.struct_1, %struct.struct_1* @s1, i64 0, i32 1) to i48*), align 2
  ret void
}
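
The and/or constants above can be decoded back into bit-field positions. The
following is only an illustrative sketch, not part of the attached reproducer;
the helper name decode_field is made up for this example, and it assumes each
and/or pair updates a single contiguous bit field inside a 48-bit chunk.

```python
WIDTH = 48  # each chunk is loaded and stored as an i48


def decode_field(clear_mask, set_value, width=WIDTH):
    """Return (low_bit, field_width, field_value) for one and/or constant pair.

    Hypothetical helper: assumes the pair updates one contiguous bit field.
    """
    full = (1 << width) - 1
    cleared = ~clear_mask & full  # bits zeroed by the `and`
    low_bit = (cleared & -cleared).bit_length() - 1
    field_width = (cleared >> low_bit).bit_length()
    # The cleared bits must form one contiguous run.
    assert cleared == ((1 << field_width) - 1) << low_bit
    # The `or` may only set bits inside the cleared field.
    assert set_value & ~cleared == 0
    return low_bit, field_width, set_value >> low_bit


# First chunk (offset 0) and second chunk (offset 6) from the IR above.
print(decode_field(-8796091973633, 7326889148416))  # (20, 23, 6987466)
print(decode_field(-2198956146689, 822419128320))   # (26, 15, 12255)
```

So the first store updates a 23-bit field starting at bit 20 of its chunk, and
the second updates a 15-bit field starting at bit 26.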

But when the test case is compiled with -fslp-vectorize-aggressive, this is
optimized to vector operations:
define void @_Z4initv() local_unnamed_addr #0 {
entry:
  %bf.load = load <2 x i48>, <2 x i48>* bitcast (%struct.struct_1* @s1 to <2 x i48>*), align 8
  %bf.clear = and <2 x i48> %bf.load, <i48 -8796091973633, i48 -2198956146689>
  %bf.set3 = or <2 x i48> %bf.clear, <i48 7326889148416, i48 822419128320>
  store <2 x i48> %bf.set3, <2 x i48>* bitcast (%struct.struct_1* @s1 to <2 x i48>*), align 8
  ret void
}

This seems legal, but it leads to incorrect code generation. More specifically,
instead of two *consecutive* 48-bit stores, the stores are emitted with a 16-bit
gap between them.
Good:
    movl    %ecx, s1(%rip)
    movw    %cx, s1+4(%rip)
    movl    %ecx, s1+6(%rip)
    movw    %cx, s1+10(%rip)
Bad:
    movq    %xmm1, s1(%rip)
    movq    %xmm0, s1+8(%rip)

So the problem is that <2 x i48> means different things at different levels: in
LLVM IR, two consecutive 48-bit locations are assumed, while code generation
assumes 64-bit alignment of each element. My guess is that it's a codegen bug.
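
The mismatch can be seen by listing the byte offsets each instruction sequence
writes. This is a rough model rather than actual compiler output analysis: the
(offset, size) pairs are read off the "Good" and "Bad" assembly above, and the
16-byte struct size is an assumption (two [6 x i8] fields followed by an i24
placed at offset 12 and padded to 4 bytes).

```python
STRUCT_SIZE = 16  # assumption: [6 x i8], [6 x i8], then i24 at offset 12


def written_bytes(stores):
    """Return the set of byte offsets touched by a list of (offset, size) stores."""
    touched = set()
    for offset, size in stores:
        touched.update(range(offset, offset + size))
    return touched


# Scalar code: movl + movw per element, so 6 bytes at offset 0 and 6 at offset 6.
good = written_bytes([(0, 4), (4, 2), (6, 4), (10, 2)])
# Vectorized code: two movq stores, so 8 bytes at offset 0 and 8 at offset 8.
bad = written_bytes([(0, 8), (8, 8)])

# Two tightly packed i48 elements occupy bytes 0..11.
expected = set(range(0, 12))

print(good == expected)        # True
print(sorted(bad - expected))  # [12, 13, 14, 15]
```

Under these assumptions the vectorized sequence places the second element at
offset 8 instead of offset 6, and also writes bytes 12..15, which would hold the
i24 member rather than either of the two 48-bit chunks.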

To reproduce:
>clang++ func.cpp test.cpp -o out -fslp-vectorize-aggressive -O2
