[LLVMbugs] [Bug 14657] New: Poor AVX code generation on 8xi1 masks

Wed Dec 19 14:15:19 PST 2012

http://llvm.org/bugs/show_bug.cgi?id=14657

             Bug #: 14657
           Summary: Poor AVX code generation on 8xi1 masks
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: nrotem at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

The loop in the IR below comes from the 'gcc-loops' benchmark and is vectorized
by the Loop Vectorizer. Because of the problem described here, we are 50%
slower than GCC on this loop.

When compiled with llc, the instruction "%14 = and <8 x i1> ..." becomes an AND
on XMM registers. This is due to the way we type-legalize vectors. I think that
the best way to solve this problem is to implement an x86-specific DAG-combine
pattern to handle these cases.
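
To make the width mismatch concrete, here is a small lane-wise model in C++.
It is only a sketch under an assumption not stated above: that <8 x i1> is
legalized by promoting each element to 16 bits, so the mask fits one XMM
register, while each AVX compare result is eight 32-bit lanes. The names
cmp_lt, and_narrow and and_wide are hypothetical and are not LLVM functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Floats = std::array<float, 8>;
using Wide   = std::array<std::uint32_t, 8>;  // models 8 x 32-bit lanes (YMM)
using Narrow = std::array<std::uint16_t, 8>;  // models 8 x 16-bit lanes (XMM)

// Lane-wise ordered less-than: all-ones where a < b, zero elsewhere.
Wide cmp_lt(const Floats& a, const Floats& b) {
  Wide m{};
  for (std::size_t i = 0; i < 8; ++i) m[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
  return m;
}

// Assumed current behaviour: truncate both masks to 16-bit lanes, AND them
// there, then widen the result back to 32-bit lanes for the zext.
Wide and_narrow(const Wide& x, const Wide& y) {
  Narrow nx{}, ny{};
  for (std::size_t i = 0; i < 8; ++i) {
    nx[i] = static_cast<std::uint16_t>(x[i]);
    ny[i] = static_cast<std::uint16_t>(y[i]);
  }
  Wide out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = (nx[i] & ny[i]) & 1u;
  return out;
}

// What a combine could aim for: AND the masks at their native 32-bit lane
// width, then keep bit 0 of each lane.
Wide and_wide(const Wide& x, const Wide& y) {
  Wide out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = (x[i] & y[i]) & 1u;
  return out;
}
```

Both paths compute the same values; the model only shows that the narrowing
and re-widening steps add no information and could be skipped.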

define void @_Z9example25v() nounwind uwtable noinline ssp {
vector.ph:
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds [1024 x float]* @da, i64 0, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %2 = load <8 x float>* %1, align 16
  %3 = getelementptr inbounds [1024 x float]* @db, i64 0, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %5 = load <8 x float>* %4, align 16
  %6 = fcmp olt <8 x float> %2, %5
  %7 = getelementptr inbounds [1024 x float]* @dc, i64 0, i64 %index
  %8 = bitcast float* %7 to <8 x float>*
  %9 = load <8 x float>* %8, align 16
  %10 = getelementptr inbounds [1024 x float]* @dd, i64 0, i64 %index
  %11 = bitcast float* %10 to <8 x float>*
  %12 = load <8 x float>* %11, align 16
  %13 = fcmp olt <8 x float> %9, %12
  %14 = and <8 x i1> %6, %13
  %15 = zext <8 x i1> %14 to <8 x i32>
  %16 = getelementptr inbounds [1024 x i32]* @dj, i64 0, i64 %index
  %17 = bitcast i32* %16 to <8 x i32>*
  store <8 x i32> %15, <8 x i32>* %17, align 16
  %index.next = add i64 %index, 8
  %18 = icmp eq i64 %index.next, 1024
  br i1 %18, label %for.end, label %vector.body

for.end:                                          ; preds = %vector.body
  ret void
}
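
For readers who prefer source over IR, the vector body above corresponds
roughly to the scalar loop below. This is a sketch reconstructed from the IR
(two fcmp olt, an and, a zext and a store), not a copy of the benchmark source.
The array declarations are assumptions that stand in for @da, @db, @dc, @dd
and @dj, whose definitions are not part of this report.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 1024;

// Assumed declarations for the globals referenced by the IR.
float da[N], db[N], dc[N], dd[N];
std::int32_t dj[N];

// Scalar form of the loop: two ordered less-than compares, an AND of the
// two boolean results, and a store of the result widened to 32 bits.
void example25() {
  for (std::size_t i = 0; i < N; ++i) {
    bool x = da[i] < db[i];  // fcmp olt
    bool y = dc[i] < dd[i];  // fcmp olt
    dj[i] = x & y;           // and, then zext to i32 (0 or 1)
  }
}
```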
