[LLVMdev] Lowering to MMX
Nicolas Capens
nicolas.capens at gmail.com
Thu Oct 27 12:16:26 PDT 2011
On 26/10/2011 6:35 PM, Bill Wendling wrote:
> On Oct 26, 2011, at 1:18 PM, Nicolas Capens wrote:
>
>> I'm having one remaining issue though; I can't seem to generate the movd instruction(s) (moving 32 bits of data in and out of the lower half of an MMX register). Take for example the following LLVM IR:
>>
>> define internal void @unpack(i8*, i8*) {
>> %3 = bitcast i8* %1 to i32*
>> %4 = load i32* %3, align 1
>> %5 = insertelement <2 x i32> undef, i32 %4, i32 0
>> %6 = bitcast <2 x i32> %5 to x86_mmx
>> %7 = call x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx %6, x86_mmx %6)
>> %8 = bitcast i8* %0 to x86_mmx*
>> store x86_mmx %7, x86_mmx* %8, align 1
>> ret void
>> }
>> declare x86_mmx @llvm.x86.mmx.punpcklbw(x86_mmx, x86_mmx) nounwind readnone
>>
>> Which gives me the following assembly code:
>>
>> push ebp
>> mov ebp,esp
>> and esp,0FFFFFFF0h
>> sub esp,20h
>> mov eax,dword ptr [ebp+0Ch]
>> movd xmm0,dword ptr [eax]
>> movapd xmmword ptr [esp],xmm0
>> movq mm0,mmword ptr [esp]
>> punpcklbw mm0,mm0
>> mov eax,dword ptr [ebp+8]
>> movq mmword ptr [eax],mm0
>> emms
>> mov esp,ebp
>> pop ebp
>> ret
>>
>> The inner portion could look like this instead:
>>
>> movd mm0,dword ptr [eax]
>> punpcklbw mm0,mm0
>>
>> Should I be using other IR operations to get this result, or are the matching patterns missing? Or would it perhaps be best to make movd available as an intrinsic as well (note that it has four varieties for MMX)?
> I don't think it's a missing pattern. I think it's the backend trying to use the best instructions available. I get this if I turn off SSE:
>
> [Irk:llvm] llc -o - t.ll -mattr=-sse,+mmx -O3 -x86-asm-syntax=intel
> .section __TEXT,__text,regular,pure_instructions
> .align 4, 0x90
> _unpack: ## @unpack
> Ltmp0:
> .cfi_startproc
> ## BB#0:
> mov ECX, DWORD PTR [RSI]
> shl RAX, 32
> or RAX, RCX
> movd MM0, RAX
> punpcklbw MM0, MM0
> movq QWORD PTR [RDI], MM0
> ret
> Ltmp1:
> .cfi_endproc
> Leh_func_end0:
>
That's interesting. It means that somewhere along the way the v2i32
insert gets promoted to a v4i32 insert because it's assumed to be
better. Perhaps it can be detected that the value gets consumed by an MMX
intrinsic (after the bitcast), so that this promotion isn't performed. Do
you happen to know which parts of the code I could look through to change
this behavior?
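
To make the intended result concrete, here is a small portable C model of
what the short movd + punpcklbw sequence computes. It is only a sketch: the
function name unpack_lo_bytes is made up for illustration and is not part of
LLVM or any intrinsics header. Since punpcklbw only reads the low 32 bits of
each operand, the undef upper element of the v2i32 never reaches the result,
which is why a plain 32-bit movd into an MMX register would be enough.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): models
 *     movd      mm0, dword ptr [eax]
 *     punpcklbw mm0, mm0
 * on a 32-bit input, returning the 64-bit value that ends up in mm0. */
uint64_t unpack_lo_bytes(uint32_t x)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t byte = (x >> (8 * i)) & 0xFF;
        result |= byte << (16 * i);      /* even byte lane: first operand  */
        result |= byte << (16 * i + 8);  /* odd byte lane: second operand  */
    }
    return result;
}
```

Each input byte simply appears twice in a row in the output, so the upper
32 bits of the source register never matter.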
Thanks,
Nicolas