[LLVMbugs] [Bug 2585] New: Unoptimal vector 'trunc' emulation (and crash)
bugzilla-daemon at cs.uiuc.edu
Tue Jul 22 15:54:35 PDT 2008
http://llvm.org/bugs/show_bug.cgi?id=2585
Summary: Unoptimal vector 'trunc' emulation (and crash)
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: nicolas at capens.net
CC: llvmbugs at cs.uiuc.edu
In an attempt to do an element-wise trunc of <4 x i32> to <4 x i16> I tried
using the following LLVM IR:
external constant <4 x i32> ; <<4 x i32>*>:0 [#uses=1]
external constant <4 x i16> ; <<4 x i16>*>:1 [#uses=1]
define internal void @""() {
load <4 x i32>* @0, align 16 ; <<4 x i32>>:1 [#uses=1]
bitcast <4 x i32> %1 to <8 x i16> ; <<8 x i16>>:2 [#uses=1]
shufflevector <8 x i16> %2, <8 x i16> undef, <8 x i32> < i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef > ; <<8 x i16>>:3 [#uses=1]
bitcast <8 x i16> %3 to <2 x i64> ; <<2 x i64>>:4 [#uses=1]
extractelement <2 x i64> %4, i32 0 ; <i64>:5 [#uses=1]
bitcast i64 %5 to <4 x i16> ; <<4 x i16>>:6 [#uses=1]
store <4 x i16> %6, <4 x i16>* @1, align 8
ret void
}
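As a sanity check on the idea behind this IR, here is a minimal Python sketch (not LLVM code; the function names and the sample values are made up for illustration). It assumes little-endian lane order, as on x86, and models the bitcast plus the 0, 2, 4, 6 shuffle mask to show that the result matches an element-wise trunc:

```python
import struct

def trunc_via_even_lanes(v4i32):
    """Model the IR above: bitcast <4 x i32> to <8 x i16>, keep lanes 0, 2, 4, 6."""
    # Pack as little-endian 32-bit values (assumption: x86 byte order).
    raw = struct.pack("<4I", *(x & 0xFFFFFFFF for x in v4i32))
    v8i16 = struct.unpack("<8H", raw)           # the bitcast
    return [v8i16[i] for i in (0, 2, 4, 6)]     # the defined part of the shuffle mask

def trunc_reference(v4i32):
    """Element-wise trunc from i32 to i16: keep the low 16 bits of each lane."""
    return [x & 0xFFFF for x in v4i32]

sample = [0x11112222, 0x33334444, 0xDEADBEEF, 0x0000FFFF]  # hypothetical input
print([hex(x) for x in trunc_via_even_lanes(sample)])
# → ['0x2222', '0x4444', '0xbeef', '0xffff']
```

On a big-endian target the low halves would sit in the odd lanes instead, so the mask would have to change.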
Unfortunately there's an access violation originating in
LowerVECTOR_SHUFFLEv8i16, due to the undefs in the shuffle mask. Using the
mask <0, 2, 4, 6, 0, 2, 4, 6> instead gives me the following x86 code:
push ebp
mov ebp,esp
and esp,0FFFFFFF0h
sub esp,40h
movaps xmm0,xmmword ptr ds:[48C7700h]
pextrw eax,xmm0,4
movaps xmm1,xmm0
punpcklqdq xmm1,xmm1
pshuflw xmm1,xmm1,88h
pshufhw xmm1,xmm1,88h
pinsrw xmm1,eax,2
pextrw ecx,xmm0,6
pinsrw xmm1,ecx,3
pinsrw xmm1,eax,6
pinsrw xmm1,ecx,7
movaps xmmword ptr [esp],xmm1
mov eax,dword ptr [esp+4]
mov dword ptr [esp+1Ch],eax
mov eax,dword ptr [esp]
mov dword ptr [esp+18h],eax
movq mm0,mmword ptr [esp+18h]
movq mmword ptr ds:[48C76F8h],mm0
mov esp,ebp
pop ebp
ret
This is far from optimal. Instead I expected something like:
push ebp
mov ebp,esp
and esp,0FFFFFFF0h
sub esp,40h
movaps xmm0,xmmword ptr ds:[48C7700h]
pshuflw xmm0,xmm0,88h
pshufhw xmm0,xmm0,88h
pshufd xmm0,xmm0,88h
movdq2q mm0,xmm0
movq mmword ptr ds:[48C76F8h],mm0
mov esp,ebp
pop ebp
ret
That's essentially 4 instructions instead of 16!
Since a 'trunc' of <4 x i32> to <4 x i16> can be quite useful, I think it might
be worth having specialized codegen for it. Interestingly, the
pshuflw+pshufhw pair already gets generated, but codegen then fails to follow
it with pshufd+movdq2q.
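To double-check that the expected sequence really selects words 0, 2, 4 and 6, here is a small Python emulation of the three shuffles. It is a sketch of the documented immediate encoding (two selector bits per destination element), not of LLVM's lowering; the helper names simply mirror the instruction mnemonics and `expected_sequence` is a made-up wrapper:

```python
def pshuflw(words, imm):
    """Shuffle the low four 16-bit words by imm; the high four pass through."""
    low = [words[(imm >> (2 * i)) & 3] for i in range(4)]
    return low + list(words[4:])

def pshufhw(words, imm):
    """Shuffle the high four 16-bit words by imm; the low four pass through."""
    high = [words[4 + ((imm >> (2 * i)) & 3)] for i in range(4)]
    return list(words[:4]) + high

def pshufd(words, imm):
    """Shuffle the four 32-bit dwords (modelled as pairs of words) by imm."""
    dwords = [list(words[2 * i:2 * i + 2]) for i in range(4)]
    out = []
    for i in range(4):
        out.extend(dwords[(imm >> (2 * i)) & 3])
    return out

def expected_sequence(words):
    """pshuflw, pshufhw, pshufd with 88h, then keep the low quadword."""
    w = pshuflw(words, 0x88)
    w = pshufhw(w, 0x88)
    w = pshufd(w, 0x88)
    return w[:4]  # movdq2q copies only the low 64 bits

print(expected_sequence(["w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7"]))
# → ['w0', 'w2', 'w4', 'w6']
```

After the two word shuffles, dwords 0 and 2 hold (w0, w2) and (w4, w6), which is why a single pshufd with the same 88h immediate is enough to pack them into the low quadword.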