[LLVMdev] Lowering to MMX

Tue Oct 25 09:24:06 PDT 2011

Thanks Bruno. I started replacing 64-bit vector operations with explicit 
MMX intrinsics, and the results look fairly promising so far.
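
For illustration only, here is a minimal C-level sketch of what routing a
v4i16 addition through an explicit MMX intrinsic can look like. It assumes an
x86 compiler that ships `<mmintrin.h>`. The helper name `add_4x16_mmx` is
hypothetical and not taken from any real code base. The same idea applies at
the LLVM IR level, where the corresponding MMX intrinsic takes the place of a
plain vector `add`.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): adds four 16-bit lanes with
 * wraparound, using the MMX intrinsic rather than a generic vector add. */
void add_4x16_mmx(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, vsum;

    /* Load the 64-bit operands without relying on pointer casts. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    vsum = _mm_add_pi16(va, vb);  /* corresponds to PADDW on MMX registers */

    memcpy(out, &vsum, sizeof vsum);

    /* The caller of MMX intrinsics is responsible for EMMS before any
     * subsequent x87 floating-point code runs. */
    _mm_empty();
}
```

In real code the EMMS would normally be issued once after a whole batch of
MMX work rather than after every operation, since it is relatively expensive.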

On 24/10/2011 9:30 PM, Bruno Cardoso Lopes wrote:
> Hi Nicolas,
>
>> I found out that the performance regression is due to removing support
>> for lowering 64-bit vector operations to MMX, and using SSE2 instead. My
>> code uses a mix of MMX intrinsics and v4i16 operations, so it ping-pongs
>> back and forth between MMX and SSE2 instructions in the generated code.
>>
>> To get better code, I see three options, and I was wondering if
>> someone could advise on which approach is likely to work best:
>> 1) I could use v8i16 or v4i32 instead of v4i16, but then the SSE
>> register pressure would be significantly increased. I already use v4f32
>> operations intensively so having the MMX registers available for 64-bit
>> integer vector operations helps performance considerably on the
>> register-deprived x86 architecture. There's little to no opportunity for
>> using v8i16 to perform two v4i16 operations simultaneously so that won't
>> make up for the added register pressure. So I'm not keen to implement
>> this option, unless anyone sees some advantages that I missed?
>> 2) Since I use MMX intrinsics, I take care of inserting the appropriate
>> EMMS instructions myself as well. So it's absolutely fine to have LLVM
>> lower 64-bit operations into MMX instructions (the way it used to be in
>> LLVM 2.8). Would it be straightforward to re-enable this? I noticed that
>> revision 115243 removes the MMX lowering rules, but I don't know if the
>> rest of LLVM 3.0 would still support them if I simply reverted them.
>> Please note that I'm not an LLVM expert and I'd prefer not having to
>> maintain local changes. Would there be any objection to having an
>> 'EnableMMX' flag (false by default)?
>> 3) I believe all MMX instructions are available as intrinsics now? That
>> would allow me to replace all straight LLVM operations with intrinsics.
>> I'm just wondering what the downsides of that would be? I assume I won't
>> get any benefits from instruction combining, but things like dead code
>> elimination still work?
> AFAIK, the only way to get MMX instructions now is to use the MMX
> intrinsics with the newly defined MMX-specific vector type. So, if
> you're really getting register pressure with 1), I would go for 3).