[llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp LegalizeDAG.cpp SelectionDAG.cpp SelectionDAGISel.cpp TargetLowering.cpp

Tue Jul 3 15:07:26 PDT 2007

On Jul 3, 2007, at 2:13 PM, Dan Gohman wrote:

>>> We overload ISD::FADD and quite a lot of others. Why not
>>> ISD::ConstantFP too?
>>
>> Fair enough, after pondering on it, I agree with you.  The proposed
>> semantics are that a ConstantFP (and also a normal Constant?) produce
>> the splatted immediate value?
>
> Constant sounds good too. And UNDEF, for that matter. And yes,  
> that's the
> semantics I mean.

Ok, makes sense.  I think we already use UNDEF for vectors.

>> Please add a dag combine xform from build_vector [c,c,c,c] ->
>> constantfp and friends.
>
> I sketched out some of the code for this. One question that's come  
> up so far is
> whether if the vector has some undef elements but all the non-undef  
> elements
> are equal it should still be folded. My initial preference is to  
> still fold it,
> since that lets things like isBuildVectorAllZeros become trivial to  
> unnecessary,
> but it is a pessimization in some obscure cases.

I'm not sure about it.  One specific issue is with shuffle masks,  
which we want to retain the undef element values for.  I don't think  
there is a good way to retain shuffle masks but not other build  
vectors, so we probably need to keep the individual undef elements in  
it.

isBuildVectorAllZeros and friends are another issue.  To me there are  
actually two issues that should be resolved at some point:

1. vector constant and shuffle mask matching code is crazily complex,  
particularly in the x86 backend.  For vector constants, this is only  
slightly annoying.  For vector shuffle masks, the selected shuffles  
are currently whatever is best for yonah, and it's not really  
possible to prefer different shuffles on different subtargets.  We  
really want to add a layer of abstraction in the shuffle/constant  
matching code, which would make the undef handling stuff happen  
implicitly.  Making a more declarative description of the various  
masks would make it much easier to maintain, understand, and debug.

2. the x86 backend specifically has a problem with the way it selects  
vector constants (I think this is in the readme).  In particular, if  
you have a 4 x f32 and a 4 x i32 zero vector, you'll get two  
different pxor instructions, because they are of different type.   
There are two different ways to solve this problem:

The easy answer is to do what the ppc backend does.  It always  
selects zero (and -1) vectors to 4 x i32 IIRC, and then does a  
bitcast to the desired type if needed.  This ensures that the  
constant vectors always get CSEd.  The tricky part of this is to  
ensure that the 0/-1 vectors still get folded if you have operations  
(like ~) that require one of these as an operand.  This ugliness is  
why we have "vnot" and "vnot_conv" and have to duplicate patterns.

The better fix is to change the way the select phase produces code.   
In particular, the reason these two zero vectors don't get CSE'd  
after selection is because they have two different value types, and  
the autocse stuff doesn't "know" that the two VTs end up in the same  
register class.

To solve this, it seems like we can add a new MVT type, where a  
certain range of MVTs (128-255?) correspond to register class ID's.   
At selection time, instead of giving the new nodes their old MVT's,  
they would get new MVT's that correspond to the regclass of the  
result (ok, we'd keep MVT::Other, MVT::Flag and maybe some others).   
This makes the scheduler slightly simpler (because it doesn't need to  
map MVT -> regclass) anymore, and opens up future possibilities.

In particular, it lets us fix a long-standing class of issues where  
we can't have fp stack and SSE registers around at the same time,  
both with MVT::f32 or f64 type.  The current scheduler can only map  
f32 to one register class (thus, it can't keep the distinction) but  
with this change the select pass can pick any regclass it wants.

Anyway, this is a bit of a crazy tangent, but I think undef's in  
buildvector should probably stay :)

-Chris