[llvm-commits] [llvm] r153848 - in /llvm/trunk: lib/CodeGen/SelectionDAG/DAGCombiner.cpp lib/Target/X86/X86ISelLowering.cpp test/CodeGen/ARM/reg_sequence.ll test/CodeGen/CellSPU/rotate_ops.ll test/CodeGen/X86/2011-10-27-tstore.ll test/CodeGen/X86

Sat Apr 7 14:43:00 PDT 2012

Hi Eli and Chris, 

I removed the part of 153848 that generated new shuffles and added a more restrictive optimization that only removes shuffles.

Thanks,
Nadav

-----Original Message-----
From: Chris Lattner [mailto:clattner at apple.com] 
Sent: Friday, April 06, 2012 23:12
To: Eli Friedman
Cc: Rotem, Nadav; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] [llvm] r153848 - in /llvm/trunk: lib/CodeGen/SelectionDAG/DAGCombiner.cpp lib/Target/X86/X86ISelLowering.cpp test/CodeGen/ARM/reg_sequence.ll test/CodeGen/CellSPU/rotate_ops.ll test/CodeGen/X86/2011-10-27-tstore.ll test/CodeGen/X86

On Apr 5, 2012, at 3:21 PM, Eli Friedman wrote:

> 2012/4/2 Rotem, Nadav <nadav.rotem at intel.com>:
>> Hi Eli,
>> 
>> Thanks for reviewing the patch.  I understand your comment, and I agree that in some cases the user may generate "good" shuffles that this optimization may turn into complex shuffles, for which we generate poor code.  After reading your comment I tried to mitigate this problem by checking that the original shuffle node has a single user.  I think that users who want to generate exact shuffle instructions should use intrinsics.  Do you think that we should not optimize shuffles in the DAGCombiner and in InstCombine, or maybe we should only optimize for known patterns?
> 
> (Sorry about the delay; I meant to send this sooner.)
> 
> Traditionally, we've assumed that a shufflevector written in IR is 
> likely to be a legal shuffle, and tried to avoid breaking them because 
> our handling of illegal shuffles is less than ideal.  So instcombine, 
> for example, is quite conservative in its shuffle handling.  Not sure 
> how much that applies to the transformation you're adding here.

Just to reemphasize, this is a real issue.  PowerPC AltiVec is a great exemplar of why this matters: it has a bunch of special-case single-cycle shuffle instructions that cover a variety of important shuffle patterns.  If you merge two shuffles together, the backend will often fail to recover the two single-cycle shuffles that the merged shuffle came from, and will then generate slower code.  I'm sure X86 has similar issues.

-Chris
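
The distinction in this thread between merging two shuffles and only removing them can be sketched as follows. This is a minimal illustration, not the code from r153848: the helper names composeMasks and isIdentityMask are hypothetical and are not LLVM API. It assumes both shuffles read a single input vector of the same width, and that -1 marks an undefined lane. Treating undefined lanes as compatible with an identity mask is also an assumption made here for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM API): compose two single-input shuffle masks.
// Lane i of the outer shuffle reads lane Outer[i] of the inner result, which
// in turn reads lane Inner[Outer[i]] of the original vector.
std::vector<int> composeMasks(const std::vector<int> &Inner,
                              const std::vector<int> &Outer) {
  std::vector<int> Result;
  Result.reserve(Outer.size());
  for (int Idx : Outer) {
    // An undefined (-1) or out-of-range outer lane stays undefined.
    if (Idx < 0 || static_cast<std::size_t>(Idx) >= Inner.size())
      Result.push_back(-1);
    else
      Result.push_back(Inner[Idx]);
  }
  return Result;
}

// Hypothetical helper: true when every defined lane maps to itself, so the
// pair of shuffles is a no-op and both can be dropped.
bool isIdentityMask(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] >= 0 && static_cast<std::size_t>(Mask[I]) != I)
      return false;
  return true;
}
```

Under these assumptions, a conservative combine would act only when isIdentityMask(composeMasks(Inner, Outer)) holds and would replace the shuffle pair with the original vector. When the composed mask is not an identity, the two shuffles are left untouched, so no new and possibly more expensive shuffle is created.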