[LLVMdev] instcombine does silly things with vector x+x

Rotem, Nadav nadav.rotem at intel.com
Sun Oct 30 00:12:47 PDT 2011


Opened PR11266. I will try to make time to work on it.


-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Lattner
Sent: Saturday, October 29, 2011 01:04
To: andrew adams
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] instcombine does silly things with vector x+x


On Oct 28, 2011, at 2:13 PM, andrew adams wrote:

> Consider the following function which doubles a <16 x i8> vector:
> 
> define <16 x i8> @test(<16 x i8> %a) {
>       %b = add <16 x i8> %a, %a
>       ret <16 x i8> %b
> }
> 
> If I compile it for x86 with llc like so:
> 
> llc paddb.ll -filetype=asm -o=/dev/stdout
> 
> I get a two-op function that just does paddb %xmm0, %xmm0 and then
> returns. llc does this regardless of the optimization level. Great!
> 
> If I let the instcombine pass touch it like so:
> 
> opt -instcombine paddb.ll |  llc -filetype=asm -o=/dev/stdout
> 
> or like so:
> 
> opt -O3 paddb.ll |  llc -filetype=asm -o=/dev/stdout
> 
> then the add gets converted to a vector left shift by 1, which then
> lowers to a much slower function with about a hundred ops. No amount
> of optimization after the fact will simplify it back to paddb.
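> 
> For reference, the IR after instcombine looks roughly like this (a sketch
> from memory; the exact spelling of the splat constant may differ):
> 
> define <16 x i8> @test(<16 x i8> %a) {
>       ; x+x canonicalized to a left shift by 1
>       %b = shl <16 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
>                               i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
>       ret <16 x i8> %b
> }
> 
> It is this shl form that the X86 backend then lowers to the long
> instruction sequence.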

This sounds like a really serious X86 backend performance bug. Canonicalizing "x+x" to a shift is the "right thing to do"; the backend should match it.

-Chris

> 
> I'm actually generating these ops in a JIT context, and I want to use
> instcombine, as it seems like a useful pass. Any idea how I can
> reliably generate the 128-bit SSE version of paddb? I thought I might
> be able to force the issue with an intrinsic, but the only intrinsics seem
> to be the 64-bit MMX version (llvm.x86.mmx.padd.b) and the saturating
> 128-bit version (llvm.x86.sse2.padds.b). I would just give
> up and use inline assembly, but it seems I can't JIT that.
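> 
> For reference, calling that saturating intrinsic from IR would look roughly
> like this (a sketch; I have not double-checked the exact declaration, and
> it is the saturating add, not the plain paddb I actually want):
> 
> declare <16 x i8> @llvm.x86.sse2.padds.b(<16 x i8>, <16 x i8>)
> 
> define <16 x i8> @test_sat(<16 x i8> %a) {
>       ; saturating byte add (padds.b), not the plain padd.b
>       %b = call <16 x i8> @llvm.x86.sse2.padds.b(<16 x i8> %a, <16 x i8> %a)
>       ret <16 x i8> %b
> }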
> 
> I'm using the latest llvm 3.1 from svn. I get similar behavior at
> llvm.org/demo using the following equivalent C code:
> 
> #include <emmintrin.h>
> __m128i f(__m128i a) {
>  return _mm_add_epi8(a, a);
> }
> 
> The no-optimization compilation of this is better than the optimized version.
> 
> Any ideas? Should I just not use this pass?
> 
> - Andrew
