[llvm-commits] [llvm] r58964 - in /llvm/trunk: docs/LangRef.html lib/Bitcode/Reader/BitcodeReader.cpp lib/CodeGen/SelectionDAG/DAGCombiner.cpp lib/CodeGen/SelectionDAG/LegalizeDAG.cpp lib/CodeGen/SelectionDAG/LegalizeTypes.h lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp lib/CodeGen/SelectionDAG/SelectionDAG.cpp lib/CodeGen/SelectionDAG/SelectionDAGBuild.cpp lib/Transforms/Scalar/InstructionCombining.cpp lib/VMCore/ConstantFold.cpp lib/VMCore/Instructions.cpp lib/VMCore/Verifier.cpp
Mon Ping Wang
wangmp at apple.com
Fri Nov 14 00:04:12 PST 2008
Hi Duncan,
The code is definitely nicer if we can avoid duplicating the logic for
the BuildOps, and if the mask indices were uniformly random, I would
agree that this would be the best way to go. However, my intuition
is that the mask indices will not be random. One of the main
motivations for supporting the generalized vector shuffle is to
maintain the structure of the incoming vector program. The most
common case that I saw for the general vector shuffle is to rip apart
a larger vector into legal smaller vectors, manipulate them, and
recombine them again. For example, if one is doing a 16 x float
transpose on x86, one could rip apart the four 4 x floats, do about 6
unpcklps and 2 unpckhps, and recombine them.
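To make the transpose example concrete, here is a minimal sketch that models the two SSE unpack instructions as plain Python list operations. The helper names (split16, transpose4x4, transpose16) are hypothetical, and this particular sequence uses 4 unpcklps and 4 unpckhps, which is a different mix from the rough count given above; it is one possible schedule, not necessarily the one the original program used.

```python
def unpcklps(a, b):
    """Model of SSE unpcklps: interleave the low two lanes of a and b."""
    return [a[0], b[0], a[1], b[1]]


def unpckhps(a, b):
    """Model of SSE unpckhps: interleave the high two lanes of a and b."""
    return [a[2], b[2], a[3], b[3]]


def split16(v):
    """Rip a 16-element vector apart into four legal 4-element vectors."""
    return [v[i:i + 4] for i in range(0, 16, 4)]


def transpose4x4(rows):
    """Transpose four 4-lane rows using only unpack operations."""
    r0, r1, r2, r3 = rows
    # First stage: pair row 0 with row 2, and row 1 with row 3.
    t0 = unpcklps(r0, r2)
    t1 = unpckhps(r0, r2)
    t2 = unpcklps(r1, r3)
    t3 = unpckhps(r1, r3)
    # Second stage: interleave the partial results to form the columns.
    return [unpcklps(t0, t2), unpckhps(t0, t2),
            unpcklps(t1, t3), unpckhps(t1, t3)]


def transpose16(v):
    """Split, transpose the pieces, and recombine into one 16-element vector."""
    out = transpose4x4(split16(v))
    return [x for row in out for x in row]
```

Every step here operates on at most two legal 4-element vectors, which is the kind of structured mask where splitting yields a vector shuffle rather than a build vector.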
As an experiment, I ran the compiler on a ~32,000 line vector program,
generating code for X86 SSE4, and dumped how often we split and how
often we then use split vectors instead of the BuildOps.
vec_length: 16  total_splits: 16200  use_split: 16200
vec_length:  8  total_splits: 22687  use_split: 22687
We don't see any splits of vectors shorter than 4 because 4 is a legal
vector length and any vector length less than 4 is widened to 4. In
this limited experiment, we never use the BuildOps. I don't claim that
this is a typical vector program, as it was written by an excellent
vector programmer, but I don't think it is an unusual case. Good vector
programmers avoid weird shuffles that don't exist on the architecture
they are targeting, so we don't often split in a way where we use the
BuildOps. Some food for thought.
Cheers,
-- Mon Ping
On Nov 13, 2008, at 1:16 AM, Duncan Sands wrote:
> Hi Mon Ping,
>
>> I like this implementation in general and you capture a case that I
>> missed (in avoiding the use of build vector) :-> . In my thinking
>> (which is why I went more for a prescan methodology), the typical
>> case is when we can use split vectors for the new shuffle. So
>> presplitting the vectors seems fine as we will be using some of the
>> results of the presplit in the vector shuffle. I don't particularly
>> like prebuilding the BuildOps though because we build a set of
>> extract element nodes that we will typically throw away. It seems a
>> little cleaner to me to iterate through the mask again and build
>> these nodes when we need them. What do you think?
>
> I don't much like this either, yet it seems a pity to duplicate the
> logic, most of which is the same. I decided to do it this way after
> a small analysis of the probability of being able to make a vector
> shuffle assuming that the original mask indices are uniformly randomly
> distributed. If the vector being split has length 2 or 4 then a
> vector shuffle is always used. However when splitting a vector of
> length 8, I calculate that there is only about one chance in three
> that you can construct a vector shuffle. For length 16 this drops to
> one chance in 50. Thus making build vector elements is only likely
> to be a waste of time for vectors of length 2 or 4. In these cases
> you construct either 2 or 4 pointless nodes. This is not a big cost.
>
> Ciao,
>
> Duncan.
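The probabilities quoted above can be approximately reproduced with a short calculation. The sketch below rests on an assumed model that is not spelled out in the thread: the two shuffle inputs are each split in half, giving four candidate source halves; each mask index of a result half picks one of those four uniformly at random; undefined indices are ignored; and a result half can be a vector shuffle only if its indices touch at most two source halves. The function name shuffle_probability is hypothetical.

```python
from fractions import Fraction
from math import comb


def shuffle_probability(vec_length, num_sources=4, max_sources=2):
    """Chance that one result half of a split shuffle needs at most
    max_sources of the num_sources input halves, under the assumed
    uniform-random model described in the lead-in."""
    k = vec_length // 2  # mask indices in one result half
    favourable = 0
    for j in range(1, max_sources + 1):
        # Number of ways k indices can hit exactly j given sources
        # (surjections), counted by inclusion-exclusion.
        onto = sum((-1) ** i * comb(j, i) * (j - i) ** k
                   for i in range(j + 1))
        favourable += comb(num_sources, j) * onto
    return Fraction(favourable, num_sources ** k)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        p = shuffle_probability(n)
        print(f"vec_length {n:2d}: {p} ~ {float(p):.4f}")
```

Under these assumptions the result is exactly 1 for lengths 2 and 4, 11/32 (about one in three) for length 8, and 191/8192 (roughly one in 43) for length 16, which is the same order of magnitude as the one-in-50 figure quoted above.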