[LLVMbugs] [Bug 12175] New: Optimizing add <16 x i8> %x, %x to shl is suboptimal with SSE

Sat Mar 3 14:07:39 PST 2012

http://llvm.org/bugs/show_bug.cgi?id=12175

             Bug #: 12175
           Summary: Optimizing add <16 x i8> %x,%x to shl is suboptimal
                    with SSE
           Product: new-bugs
           Version: 3.0
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: arcata at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

`opt -O2 -S` transforms this IR:

--
define <16 x i8> @foo(<16 x i8> %x) nounwind {
    %y = add <16 x i8> %x, %x
    ret <16 x i8> %y
}
--

into this IR:

--
define <16 x i8> @foo(<16 x i8> %x) nounwind readnone {
  %y = shl <16 x i8> %x, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
                          i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  ret <16 x i8> %y
}
--

This is a bad idea with SSE, which has no byte-granularity shift instruction,
so the lowering for `shl <16 x i8>` is rather involved:

--
_foo:                                   ## @foo
## BB#0:
    movdqa    %xmm0, %xmm1
    movdqa    LCPI0_0(%rip), %xmm0
    psllw    $5, %xmm0
    movdqa    LCPI0_1(%rip), %xmm2
    pand    %xmm1, %xmm2
    psllw    $4, %xmm2
    pblendvb    %xmm0, %xmm2, %xmm1
    movdqa    LCPI0_2(%rip), %xmm2
    pand    %xmm1, %xmm2
    psllw    $2, %xmm2
    paddb    %xmm0, %xmm0
    pblendvb    %xmm0, %xmm2, %xmm1
    movdqa    %xmm1, %xmm2
    paddb    %xmm2, %xmm2
    paddb    %xmm0, %xmm0
    pblendvb    %xmm0, %xmm2, %xmm1
    movdqa    %xmm1, %xmm0
    ret
--

The unoptimized IR, by comparison, can be lowered to a single instruction:

--
_foo:                                   ## @foo
## BB#0:
    paddb    %xmm0, %xmm0
    ret
--
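
As a sanity check on the equivalence the two IR forms rely on, here is a
minimal scalar sketch in C. It models each of the 16 i8 lanes as a `uint8_t`
and compares `x + x` against `x << 1` with wraparound for every possible byte
value. The function names (`add_self`, `shl_one`, `forms_agree`) are
hypothetical and exist only for this illustration; this is a model of the lane
arithmetic, not of the code generator.

```c
#include <stdint.h>

#define LANES 16

/* Lane-wise model of `add <16 x i8> %x, %x` (the operation paddb performs). */
static void add_self(const uint8_t x[LANES], uint8_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint8_t)(x[i] + x[i]);   /* wraps modulo 256 */
}

/* Lane-wise model of `shl <16 x i8> %x, <i8 1, ..., i8 1>`. */
static void shl_one(const uint8_t x[LANES], uint8_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint8_t)(x[i] << 1);     /* top bit is discarded */
}

/* Returns 1 if both forms agree on every possible byte value, else 0. */
static int forms_agree(void) {
    uint8_t x[LANES], a[LANES], s[LANES];
    for (int base = 0; base < 256; base += LANES) {
        for (int i = 0; i < LANES; ++i)
            x[i] = (uint8_t)(base + i);    /* 16 consecutive byte values */
        add_self(x, a);
        shl_one(x, s);
        for (int i = 0; i < LANES; ++i)
            if (a[i] != s[i])
                return 0;
    }
    return 1;
}
```

Because the two forms agree lane for lane, a shift left by a constant splat of
1 could in principle be emitted as the same single `paddb` shown above.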
