[llvm-commits] [llvm] r58964 - in /llvm/trunk: docs/LangRef.html lib/Bitcode/Reader/BitcodeReader.cpp lib/CodeGen/SelectionDAG/DAGCombiner.cpp lib/CodeGen/SelectionDAG/LegalizeDAG.cpp lib/CodeGen/SelectionDAG/LegalizeTypes.h lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp lib/CodeGen/SelectionDAG/SelectionDAG.cpp lib/CodeGen/SelectionDAG/SelectionDAGBuild.cpp lib/Transforms/Scalar/InstructionCombining.cpp lib/VMCore/ConstantFold.cpp lib/VMCore/Instructions.cpp lib/VMCore/Verifier.cpp

Fri Nov 14 00:04:12 PST 2008

Hi Duncan,

The code is definitely nicer if we can avoid duplicating the logic for
the BuildOps, and if the mask indices were uniformly random, I would
agree that this is the best way to go.  However, my
intuition is that the mask indices will not be random.  One of the
main motivations for supporting the generalized vector shuffle is to
maintain the structure of the vector program coming in.  The most
common case that I have seen for the general vector shuffle is to rip
apart a larger vector into legal smaller vectors, manipulate them, and
recombine them.  For example, if one is doing a 16 x float
transpose on x86, one could rip it apart into four 4 x floats, do about 6
unpcklps and 2 unpckhps, and recombine them.

As an experiment, I ran the compiler on a ~32,000 line vector program,
generating code for X86 SSE4, and dumped how often we split and how
often we use split vectors instead of the BuildOps.

vec_length:16 total_splits:16200 use_split: 16200
vec_length:  8 total_splits:22687 use_split: 22687

We don't see any splits of vectors shorter than 4 because 4 is a legal
vector length and any vector shorter than 4 is widened to 4. In this
limited experiment, we never used the BuildOps. I don't claim that this
is a typical vector program, as it was written by an excellent vector
programmer, but I don't think it is an unusual case. Good vector
programmers avoid weird shuffles that don't exist on the
architecture they are targeting, so we don't often split in a way
that needs the BuildOps.  Some food for thought.

Cheers,
-- Mon Ping

On Nov 13, 2008, at 1:16 AM, Duncan Sands wrote:

> Hi Mon Ping,
>
>> I like this implementation in general and you capture a case that I
>> missed (in avoiding build vector) :-> .   In my thinking (which is
>> why I went more for a prescan methodology), the typical case is when
>> we can use split vectors for the new shuffle.  So pre-splitting the
>> vectors seems fine as we will be using some of the result of the
>> presplit in the vector shuffle.  I don't particularly like prebuilding
>> the BuildOps though because we build a set of extract element nodes
>> that we will typically throw away. It seems a little cleaner to me to
>> iterate through the mask again and build these nodes when we need
>> them.  What do you think?
>
> I don't much like this either, yet it seems a pity to duplicate the
> logic, most of which is the same.  I decided to do it this way after
> a small analysis of the probability of being able to make a vector
> shuffle assuming that the original mask indices are uniformly randomly
> distributed.  If the vector being split has length 2 or 4 then a
> vector shuffle is always used.  However when splitting a vector of
> length 8, I calculate that there is only about one chance in three
> that you can construct a vector shuffle.  For length 16 this drops to
> one chance in 50.  Thus making build vector elements is only likely to
> be a waste of time for vectors of length 2 or 4.  In these cases you
> construct either 2 or 4 pointless nodes.  This is not a big cost.
>
> Ciao,
>
> Duncan.
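
For reference, the uniform-random estimate quoted above can be roughly
reproduced with a small counting sketch.  The model here is an
assumption, not something stated in the thread: the two input vectors
are each split in half, giving four pieces; each mask index of one
result half lands in any piece with equal probability; and that half can
be a vector shuffle only if it reads from at most two pieces.  Undef
mask indices are ignored, and shuffle_probability is a hypothetical
helper name.

```python
from fractions import Fraction
from math import comb


def shuffle_probability(vec_length, pieces=4):
    """Chance that one result half reads from at most two input pieces.

    Assumed model: each of the vec_length // 2 mask indices in the half
    falls into one of `pieces` input pieces uniformly at random.
    """
    draws = vec_length // 2
    # Assignments that touch exactly one piece.
    one_piece = pieces
    # Assignments that touch exactly two pieces: pick the pair, then
    # exclude the two assignments that only ever use one of them.
    two_pieces = comb(pieces, 2) * (2 ** draws - 2)
    return Fraction(one_piece + two_pieces, pieces ** draws)


for n in (2, 4, 8, 16):
    p = shuffle_probability(n)
    print(f"vec_length {n:2d}: {float(p):.4f}")
```

Under these assumptions lengths 2 and 4 always allow a shuffle, length 8
gives 11/32 (about one in three), and length 16 gives 191/8192 (about
one in 43), which is in the same range as the quoted one in 50.  The
measured counts earlier in this message suggest real masks are far from
this uniform model.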