[LLVMdev] Lowering to MMX
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Mon Oct 24 18:30:43 PDT 2011
> I found out that the performance regression is due to removing support
> for lowering 64-bit vector operations to MMX, and using SSE2 instead. My
> code uses a mix of MMX intrinsics and v4i16 operations, so it ping-pongs
> back and forth between MMX and SSE2 instructions in the generated code.
> To get better code, I see three options, and I was wondering if
> someone could share some advice on which approach is likely to work best:
> 1) I could use v8i16 or v4i32 instead of v4i16, but then the SSE
> register pressure would be significantly increased. I already use v4f32
> operations intensively so having the MMX registers available for 64-bit
> integer vector operations helps performance quite considerably on the
> register-deprived x86 architecture. There's little to no opportunity for
> using v8i16 to perform two v4i16 operations simultaneously so that won't
> make up for the added register pressure. So I'm not keen to implement
> this option, unless anyone sees some advantages that I missed?
> 2) Since I use MMX intrinsics, I take care of inserting the appropriate
> EMMS instructions myself as well. So it's absolutely fine to have LLVM
> lower 64-bit operations into MMX instructions (the way it used to be in
> LLVM 2.8). Would it be straightforward to re-enable this? I noticed that
> revision 115243 removes the MMX lowering rules, but I don't know if the
> rest of LLVM 3.0 would still support them if I simply reverted them.
> Please note that I'm not an LLVM expert and I'd prefer not having to
> maintain local changes. Would there be any objection to having an
> 'EnableMMX' flag (false by default)?
> 3) I believe all MMX instructions are available as intrinsics now? That
> would allow me to replace all straight LLVM operations with intrinsics.
> I'm just wondering what the downsides of that would be? I assume I won't
> get any benefits from instruction combining, but things like dead code
> elimination still work?
AFAIK, the only way to get MMX instructions now is to use the MMX
intrinsics with the newly defined MMX-specific vector type (x86_mmx).
So, if register pressure is really a problem with 1), I would go for 3).
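
To illustrate what option 3) looks like at the C level, here is a minimal
sketch rather than a definitive implementation. It assumes a compiler that
provides <mmintrin.h> and defines __MMX__ on x86 targets (GCC and Clang do);
the helper name add_4x16 is hypothetical, and the scalar branch is only there
so the sketch still builds on targets without MMX. The v4i16 add is written
as an explicit intrinsic, and the caller-managed EMMS mentioned in 2) becomes
an explicit _mm_empty() call.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: lane-wise wrapping add of four 16-bit integers. */
static void add_4x16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);      /* load 64 bits into an MMX value */
    memcpy(&vb, b, sizeof vb);
    vr = _mm_add_pi16(va, vb);      /* corresponds to PADDW */
    memcpy(out, &vr, sizeof vr);    /* store the four result lanes */
    _mm_empty();                    /* EMMS: leave the x87 state usable */
#else
    /* Assumed fallback for targets without MMX: same wrapping semantics. */
    for (int i = 0; i < 4; ++i)
        out[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}
```

Because every operation is an explicit intrinsic call, the choice between MMX
and SSE2 is made in the source rather than by the lowering rules, at the cost
of the generic vector optimizations mentioned in 3).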
Bruno Cardoso Lopes