[LLVMbugs] [Bug 11266] New: Inefficient x86 vector code generation for add v16i8;

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Sun Oct 30 00:08:04 PDT 2011


show_bug.cgi?id=11266

             Bug #: 11266
           Summary: Inefficient x86 vector code generation for add v16i8;
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: nadav.rotem at intel.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified


From the email exchange between Andrew and Chris:

Consider the following function which doubles a <16 x i8> vector:
> 
> define <16 x i8> @test(<16 x i8> %a) {
>       %b = add <16 x i8> %a, %a
>       ret <16 x i8> %b
> }
> 
> If I compile it for x86 with llc like so:
> 
> llc paddb.ll -filetype=asm -o=/dev/stdout
> 
> I get a two-instruction function that just does paddb %xmm0, %xmm0 and
> then returns. llc does this regardless of the optimization level. Great!
> 
> If I let the instcombine pass touch it like so:
> 
> opt -instcombine paddb.ll |  llc -filetype=asm -o=/dev/stdout
> 
> or like so:
> 
> opt -O3 paddb.ll |  llc -filetype=asm -o=/dev/stdout
> 
> then the add gets converted to a vector left shift by 1, which then 
> lowers to a much slower function with about a hundred ops. No amount 
> of optimization after the fact will simplify it back to paddb.
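
For reference, the IR after instcombine looks roughly like the following
(a sketch based on the description above; the exact form of the splat
constant is an assumption, not copied from the original report):

define <16 x i8> @test(<16 x i8> %a) {
  %b = shl <16 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
                          i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  ret <16 x i8> %b
}

Because x86 has no vector shift for i8 elements, this shift is what gets
scalarized into the long instruction sequence Andrew observed.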

This sounds like a really serious X86 backend performance bug.  Canonicalizing
"x+x" to a shift is the "right thing to do"; the backend should match it.

-- 
Configure bugmail: userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
