[LLVMbugs] [Bug 11266] New: Inefficient x86 vector code generation for add v16i8;
bugzilla-daemon at llvm.org
Sun Oct 30 00:08:04 PDT 2011
http://llvm.org/bugs/show_bug.cgi?id=11266
Bug #: 11266
Summary: Inefficient x86 vector code generation for add v16i8;
Product: libraries
Version: trunk
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: nadav.rotem at intel.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
From the email exchange between Andrew and Chris:
> Consider the following function which doubles a <16 x i8> vector:
>
> define <16 x i8> @test(<16 x i8> %a) {
> %b = add <16 x i8> %a, %a
> ret <16 x i8> %b
> }
>
> If I compile it for x86 with llc like so:
>
> llc paddb.ll -filetype=asm -o=/dev/stdout
>
> I get a two-op function that just does paddb %xmm0, %xmm0 and then
> returns. llc does this regardless of the optimization level. Great!
>
> If I let the instcombine pass touch it like so:
>
> opt -instcombine paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> or like so:
>
> opt -O3 paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> then the add gets converted to a vector left shift by 1, which then
> lowers to a much slower function with about a hundred ops. No amount
> of optimization after the fact will simplify it back to paddb.
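For reference, the shifted form that instcombine produces should look roughly like the following (a sketch reconstructed from the description above, not verbatim opt output). The slow lowering is presumably because SSE has no packed byte-shift instruction, so the backend expands the <16 x i8> shift into a long sequence instead of recognizing it as a paddb:

  define <16 x i8> @test(<16 x i8> %a) {
    ; add %a, %a canonicalized by instcombine to a shift-left by a splat of 1
    %b = shl <16 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
    ret <16 x i8> %b
  }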
This sounds like a really serious X86 backend performance bug. Canonicalizing
"x+x" to a shift is the "right thing to do"; the backend should match it.