[LLVMdev] LLVM 2.8 and MMX

Tue Sep 21 10:23:58 PDT 2010

Hi all,

Sorry for the late reply. I got sidetracked by other fun projects. ;-)

I found that the performance regression is caused by revisions 112804,
112805 and 112806. Those changes were made 2 days prior to the 2.8
branching, so it may not have been the intention to include them there?
Either way, they make my vector-intensive code twice as slow, so I would
much appreciate it if these changes were reverted for the 2.8 release.

Thanks,

Nicolas

-----Original Message-----
From: Chris Lattner [mailto:clattner at apple.com] 
Sent: Wednesday, September 08, 2010 18:59
To: Eli Friedman
Cc: Nicolas Capens; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] LLVM 2.8 and MMX

On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote:

> On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens
> <nicolas.capens at gmail.com> wrote:
>> Hi Chris,
>> 
>> It's not broken, but the performance is crippled.
>> 
>> I noticed that the code still contains some MMX instructions, but several
>> operations get expanded (apparently swizzling and such get expanded to a
>> large number of byte moves).
> 
> I think some changes related to MMX landed before 2.8 branched which
> shouldn't have... please file a bug.

Right.  There should be no major change before 2.8, so if something bad
happened, it needs to be fixed on the branch.

>> I could use intrinsics, but they wouldn't be optimized like other vector
>> operations. I could use SSE operations, but they would increase SSE
>> register pressure while MMX registers are left unused.
>> 
>> So ideally I would like to inform LLVM that selecting MMX instructions is
>> fine. I'm inserting emms instructions in the right spots myself.
> 
> I think the direction we're going to prefer going forward is that
> 64-bit vectors get widened to 128-bit vectors, which might not be
> quite ideal in some situations, but will avoid situations where MMX
> instructions are incorrectly generated.  That said, the work isn't
> finished, so it shouldn't be in 2.8.

In 2.9, the only way to get MMX will be to use MMX intrinsics; generic
vectors will not map onto MMX. Sorry, Nicolas. One major problem is that the
optimizer introduces generic vectors (e.g. see r112696 in the SRoA pass)
which use mmx where it was not previously used.  This means that your
frontend introducing emms is not enough.

-Chris
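
For readers following the thread, here is a minimal sketch of what "use MMX
intrinsics" can look like from C, as opposed to relying on generic 64-bit
vectors being selected to MMX. It assumes an x86 target and a compiler that
ships <mmintrin.h>; the helper name add4_i16_mmx is hypothetical and does
not come from the thread. The call to _mm_empty() emits emms, which is the
step Nicolas describes inserting himself.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): adds four 16-bit lanes with
 * wraparound using the MMX paddw operation via _mm_add_pi16. */
static void add4_i16_mmx(const int16_t a[4], const int16_t b[4],
                         int16_t out[4])
{
    __m64 va, vb, vr;

    /* Copy the 8 bytes of each input into an MMX-typed value. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* Explicit MMX operation: lane-wise 16-bit add, wrapping on overflow. */
    vr = _mm_add_pi16(va, vb);

    memcpy(out, &vr, sizeof vr);

    /* emms: clear the MMX state so later x87 floating-point code is safe. */
    _mm_empty();
}
```

Because the MMX use is explicit here, the code that enters MMX state is also
the code that leaves it, which is the property Chris says is lost when the
optimizer introduces generic vectors on its own.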