[llvm-commits] [llvm] r153848 - in /llvm/trunk: lib/CodeGen/SelectionDAG/DAGCombiner.cpp lib/Target/X86/X86ISelLowering.cpp test/CodeGen/ARM/reg_sequence.ll test/CodeGen/CellSPU/rotate_ops.ll test/CodeGen/X86/2011-10-27-tstore.ll test/CodeGen/X86

Fri Apr 6 13:12:05 PDT 2012

On Apr 5, 2012, at 3:21 PM, Eli Friedman wrote:

> 2012/4/2 Rotem, Nadav <nadav.rotem at intel.com>:
>> Hi Eli,
>> 
>> Thanks for reviewing the patch.  I understand your comment, and I agree that in some cases the user may generate "good" shuffles that this optimization may turn into complex shuffles, for which we generate poor code.  After reading your comment I tried to mitigate this problem by checking that the original shuffle node has a single user.  I think that users who want to generate exact shuffle instructions should use intrinsics.  Do you think that we should not optimize shuffles in the DAGCombiner and in InstCombine, or maybe we should only optimize known patterns?
> 
> (Sorry about the delay; I meant to send this sooner.)
> 
> Traditionally, we've assumed that a shufflevector written in IR is
> likely to be a legal shuffle, and tried to avoid breaking them because
> our handling of illegal shuffles is less than ideal.  So instcombine,
> for example, is quite conservative in its shuffle handling.  Not sure
> how much that applies to the transformation you're adding here.

Just to reemphasize, this is a real issue.  PowerPC Altivec is a great exemplar of why this matters: it has a number of single-cycle instructions for a variety of important special-case shuffles.  If you merge two shuffles together, the backend will often fail to recover the two single-cycle shuffles they came from, and it generates slower code instead.  I'm sure X86 has similar issues.

-Chris