[LLVMbugs] [Bug 8429] New: vectorized udiv/urem with constant pot-divisor are scalarized
bugzilla-daemon at llvm.org
Thu Oct 21 08:34:42 PDT 2010
http://llvm.org/bugs/show_bug.cgi?id=8429
Summary: vectorized udiv/urem with constant pot-divisor are scalarized
Product: libraries
Version: 2.8
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sroland at vmware.com
CC: llvmbugs at cs.uiuc.edu
Consider this function:
define <4 x i32> @udiv_vec(<4 x i32> %var) {
entry:
%0 = udiv <4 x i32> %var, <i32 16, i32 16, i32 16, i32 16>
ret <4 x i32> %0
}
LLVM 2.8 produces this on x86_64 with SSE4.1 (with only SSE2 it gets worse, due
to the lack of pextrd):
pextrd $1, %xmm0, %eax
shrl $4, %eax
movd %xmm0, %ecx
shrl $4, %ecx
movd %ecx, %xmm1
pinsrd $1, %eax, %xmm1
pextrd $2, %xmm0, %eax
shrl $4, %eax
pinsrd $2, %eax, %xmm1
pextrd $3, %xmm0, %eax
shrl $4, %eax
movdqa %xmm1, %xmm0
pinsrd $3, %eax, %xmm0
ret
But if the divisor is not only a power of two but also the same for all 4
elements, as is the case here, a single vector shift would obviously be preferred:
psrld $4, %xmm0
ret
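To make the claimed equivalence concrete, here is a small C sketch using SSE2 intrinsics. It is only an illustration, not what the backend does internally: the function names udiv16_epu32 and check_udiv16 are made up for this example, and it assumes an x86 target with SSE2. _mm_srli_epi32 is the intrinsic that normally maps to psrld.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: divide four unsigned 32-bit lanes by 16
   with one logical right shift per vector (psrld $4). */
__m128i udiv16_epu32(__m128i v) {
    return _mm_srli_epi32(v, 4);
}

/* Hypothetical helper: assert that the vector shift matches
   scalar unsigned division by 16 in every lane. */
void check_udiv16(const uint32_t in[4]) {
    uint32_t out[4];
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, udiv16_epu32(v));
    for (int i = 0; i < 4; i++)
        assert(out[i] == in[i] / 16u);
}
```

Calling check_udiv16 on any four-element array should pass, since an unsigned divide by 2^k is the same as a logical right shift by k.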
The same applies to urem (though that also requires loading the mask
constant into an xmm register). I would expect the same to apply to <8 x i16>
values (though I did not test that) and to <16 x i8>. Since SSE has no byte
shifts, the <16 x i8> case would require some more work, but in any case I think
it would be much cheaper than doing extract/shift/insert for each of the 16
elements individually.
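As a rough sketch of what the urem and <16 x i8> cases could look like, here is some C with SSE2 intrinsics. The function names are invented for this example and an x86 target with SSE2 is assumed. The byte case is one possible emulation (a 16-bit shift followed by a mask), not necessarily the sequence a backend would pick.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: urem by 16 on four 32-bit lanes is an AND
   with the constant 15 in every lane (pand with a splatted mask). */
__m128i urem16_epu32(__m128i v) {
    return _mm_and_si128(v, _mm_set1_epi32(15));
}

/* Hypothetical helper: udiv by 16 on sixteen 8-bit lanes. SSE2 has
   no per-byte shift, so shift the 16-bit lanes (psrlw $4) and then
   clear the four bits that leaked in from the neighbouring byte. */
__m128i udiv16_epu8(__m128i v) {
    __m128i shifted = _mm_srli_epi16(v, 4);
    return _mm_and_si128(shifted, _mm_set1_epi8(0x0F));
}

/* Assert that the masked AND matches scalar unsigned remainder. */
void check_urem16_epu32(const uint32_t in[4]) {
    uint32_t out[4];
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, urem16_epu32(v));
    for (int i = 0; i < 4; i++)
        assert(out[i] == in[i] % 16u);
}

/* Assert that shift-plus-mask matches scalar division per byte. */
void check_udiv16_epu8(const uint8_t in[16]) {
    uint8_t out[16];
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, udiv16_epu8(v));
    for (int i = 0; i < 16; i++)
        assert(out[i] == in[i] / 16u);
}
```

That is two instructions plus a constant load for the byte case, compared with sixteen extract/shift/insert triples.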