<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Duncan,<div><br><div><div>On Nov 20, 2008, at 1:06 AM, Duncan Sands wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Hi Mon Ping,<br><br><blockquote type="cite">@@ -233,6 +233,11 @@<br></blockquote><blockquote type="cite">                        JoinIntegers(Lo, Hi));<br></blockquote><blockquote type="cite">     return DAG.getNode(ISD::BIT_CONVERT, OutVT, InOp);<br></blockquote><blockquote type="cite">   }<br></blockquote><blockquote type="cite">+  case WidenVector:<br></blockquote><blockquote type="cite">+    if (OutVT.bitsEq(TLI.getWidenVectorType(InVT)))<br></blockquote><br>You should be able to use NInVT rather than TLI.getWidenVectorType.<br>That is because TLI.getTypeToTransformTo should return the same thing<br>as TLI.getWidenVectorType (I think TLI.getWidenVectorType should be<br>eliminated; I intend to say more about this in a later email).</div></blockquote><div><br></div><div>Given that for any given type, it should be clear if that type is legal or should be promoted, expanded, or widened, it make sense to have only one function for this.  The only difference in getWidenVectorType is that the function is virtual to allow a client to decide that for certain vectors, though a legal type exist, that it should not widen to that type because it might be cheaper to scalarize it.  There is no reason that getTypeToTransform can't do the same thing if certain target wants to override the default.  Today, without any firm data, the difference between X86 and the default is so slight, I'm inclined to have one function and add the virtual one when we see a need.</div><br><blockquote type="cite"><div><br><br><blockquote type="cite">+    case ISD::SCALAR_TO_VECTOR: Res = ExpandIntOp_SCALAR_TO_VECTOR(N); break;<br></blockquote>...<br><blockquote type="cite">+SDValue DAGTypeLegalizer::ExpandIntOp_SCALAR_TO_VECTOR(SDNode *N) {<br></blockquote><blockquote type="cite">+  // Create a vector sized/aligned stack slot, store the value to element #0,<br></blockquote><blockquote type="cite">+  // then load the whole vector back out.<br></blockquote><blockquote type="cite">+  return CreateStackStoreLoad(N->getOperand(0), N->getValueType(0));<br></blockquote><blockquote type="cite">+}<br></blockquote><br>It seems a pity to just give up and use a stack store load.  Maybe there<br>are some tricks...  How about using BUILD_VECTOR on the expanded pieces?<br><br></div></blockquote><div><br></div>You are right.  We should use BUILD_VECTOR like we do in widening.<br><br><blockquote type="cite"><div><blockquote type="cite">+  case ISD::AND:<br></blockquote><blockquote type="cite">+  case ISD::OR:<br></blockquote><blockquote type="cite">+  case ISD::XOR:<br></blockquote><blockquote type="cite">+  case ISD::ADD:<br></blockquote><blockquote type="cite">+  case ISD::SUB:<br></blockquote><blockquote type="cite">+  case ISD::FPOW:<br></blockquote><blockquote type="cite">+  case ISD::FPOWI: <br></blockquote><blockquote type="cite">+  case ISD::MUL:<br></blockquote><blockquote type="cite">+  case ISD::MULHS:<br></blockquote><blockquote type="cite">+  case ISD::MULHU:<br></blockquote><blockquote type="cite">+  case ISD::FADD:<br></blockquote><blockquote type="cite">+  case ISD::FSUB:<br></blockquote><blockquote type="cite">+  case ISD::FMUL:<br></blockquote><blockquote type="cite">+  case ISD::SDIV:<br></blockquote><blockquote type="cite">+  case ISD::SREM:<br></blockquote><blockquote type="cite">+  case ISD::FDIV:<br></blockquote><blockquote type="cite">+  case ISD::FREM:<br></blockquote><blockquote type="cite">+  case ISD::FCOPYSIGN:<br></blockquote><blockquote type="cite">+  case ISD::UDIV:<br></blockquote><blockquote type="cite">+  case ISD::UREM:<br></blockquote><blockquote type="cite">+  case ISD::BSWAP:             Res = WidenVecRes_Binary(N); break;<br></blockquote><br>Can you please put these in alphabetical order.<br><br></div></blockquote><div><br></div>Sure.</div><div><br><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_Binary(SDNode *N) {<br></blockquote><blockquote type="cite">+  // Binary op widening<br></blockquote><br>Missing full stop at end of comment.<br><br><blockquote type="cite">+  MVT WidenVT = TLI.getWidenVectorType(N->getValueType(0));<br></blockquote><blockquote type="cite">+  SDValue InOp1 = GetWidenedVector(N->getOperand(0));<br></blockquote><blockquote type="cite">+  SDValue InOp2 = GetWidenedVector(N->getOperand(1));<br></blockquote><blockquote type="cite">+  assert(InOp1.getValueType() == WidenVT && InOp2.getValueType() == WidenVT);<br></blockquote><blockquote type="cite">+  return DAG.getNode(N->getOpcode(), WidenVT, InOp1, InOp2);<br></blockquote><br>Please eliminate this assert.  If you want to do this kind of<br>checking, best to do it systematically in SetWidenedVector.<br>WidenVT can be obtained as the type of InOp1.<br><br></div></blockquote><div><br></div><div>I'll remove them.  </div><br><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {<br></blockquote><br>This routine is identical to WidenVecRes_Unary (except for<br>the asserts).  These conversions are just unary operations,<br>so please use WidenVecRes_Unary.  See below for some more<br>comments.<br><br><blockquote type="cite"><br></blockquote></div></blockquote><div><br></div><div>It was originally similar to bit convert where it checks if the incoming vectors will be widened to the same number of as the resulting widen vector type and if so, use the widen results; otherwise do the convert and then widen.  I simplified it to see when it will fails but it hasn't yet but I'm sure its because I haven't written some vector converts between types of different sizes which needs to part of the basic tests for the checkin.  I'm planning to change how it operates to handle the more complicated case.</div><div><br></div><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_Shift(SDNode *N) {<br></blockquote><blockquote type="cite">+  // Unary op widening<br></blockquote><blockquote type="cite">+  MVT WidenVT = TLI.getWidenVectorType(N->getValueType(0));<br></blockquote><blockquote type="cite">+  SDValue InOp = GetWidenedVector(N->getOperand(0));<br></blockquote><blockquote type="cite">+  assert(InOp.getValueType() == WidenVT);<br></blockquote><br>I think you should drop this assertion.  It can only fail if<br>the original node was invalid (SelectionDAG.cpp should check<br>that the two types are the same - if it doesn't then that<br>should be fixed).  That also means that the creation of the<br>new node will assert if the types are different.<br>You can get WidenVT as the type of InOp (more efficient).<br><br></div></blockquote><div><br></div><div>I agree.</div><br><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_Unary(SDNode *N) {<br></blockquote><blockquote type="cite">+  // Unary op widening<br></blockquote><blockquote type="cite">+  MVT WidenVT = TLI.getWidenVectorType(N->getValueType(0));<br></blockquote><blockquote type="cite">+  SDValue InOp = GetWidenedVector(N->getOperand(0));<br></blockquote><blockquote type="cite">+  assert(InOp.getValueType() == WidenVT);<br></blockquote><br>I think you should allow for the case when the operand and<br>result have different vector types.  Like FP_TO_UINT.  So<br>you should have a more general check here. i.e. I think you<br>should fold WidenVecRes_Convert into this routine.<br><br>By the way, notice a logical consequence: for this routine<br>to always work, you more or less have to require **that whether<br>a vector is widened or not, and to what width it is widened<br>depends only on the number of elements and not the element<br>type**.  So I think it is important to decide whether you<br>want to require this property or not.  If not, then the logic<br>for this routine needs to be made more complicated.  If you<br>do want to require this property (I think this would be wise)<br>then probably it should be built in fundamentally, in places<br>like TargetLowering.  I mean that the underlying routines<br>should only be passed the number of vector elements, and not<br>the element type - that way the above property is necessarily<br>satisfied.<br><br></div></blockquote><div><br></div><div>When there is a legal type to widen to, we should always widen to that type so we can do the associated operation on the wider legal type.  Given that different types might widen different  (e.g., v4i32 and v16i8 are legal types on X86), we must be able to handle the case when the two different types might widen to different lengths.</div><div><br></div><div>I think that I want to keep two separate classes of routines.  The function whose operands and result type must match can have the simple logic like in unary and binary.  For routines like conversions, we will have more complicated logic where we check if the input and output are going to have the same number of elements and if not, we have to do the right thing.</div><div><br></div><br><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_BIT_CONVERT(SDNode *N) {<br></blockquote><blockquote type="cite">+  SDValue InOp = N->getOperand(0);<br></blockquote><blockquote type="cite">+  MVT InVT = InOp.getValueType();<br></blockquote><blockquote type="cite">+  MVT VT = N->getValueType(0);<br></blockquote><blockquote type="cite">+  MVT WidenVT = TLI.getWidenVectorType(VT);<br></blockquote><br>Please use TLI.getTypeToTransformTo.  This keeps everything<br>uniform with other legalization methods.<br><br></div></blockquote><div><br></div><div>I'll will do so and also change LegalizeDAG to use this function as well so I can get rid of it and we can start thinking of removing widening from LegalizeDAG.</div><br><blockquote type="cite"><div><blockquote type="cite">+  case WidenVector:<br></blockquote><blockquote type="cite">+    if (WidenVT.bitsEq(TLI.getWidenVectorType(InVT)))<br></blockquote><br>Likewise (I won't mention this again).<br><br><blockquote type="cite">+  // Otherwise, do the convert and then widen.<br></blockquote><blockquote type="cite">+  SmallVector<SDValue, 16> Ops;<br></blockquote><blockquote type="cite">+  if (WidenVT.getSizeInBits() % VT.getSizeInBits()) {<br></blockquote><br>Isn't this test inverted?</div></blockquote><div><br></div><div> I looked in my build that I was testing and there is a "== 0"</div></div><div><br></div><div><blockquote type="cite"><div><br><br><blockquote type="cite">+    unsigned NumConcat = WidenVT.getSizeInBits() / VT.getSizeInBits();<br></blockquote><blockquote type="cite">+    SDValue UndefVal = DAG.getNode(ISD::UNDEF, VT);<br></blockquote><blockquote type="cite">+    Ops.push_back(SDValue(N,0));<br></blockquote><br>^ Isn't this going to lead to an infinite type legalization loop?<br>You can just reuse the node in a different context, you have to<br>change it, you have to make progress towards legalization of N.<br><br></div></blockquote><div><br></div><div>I looked into some test cases where I thought I would hit that case for BIT_CONVERT but I don't see it on X86, which make sense after I thought about it. When we widen to a legal type, what happens is that both sides are widen to the same vector register so the end result is that both sides have the same size.  In cases were we extend a vector to a power of 2, both vectors are extended to the next power of 2 and then also end up to be the same size so we never hit this case.</div><div><br></div><blockquote type="cite"><div><blockquote type="cite">+    SDValue CvtOp = SDValue(N,0);<br></blockquote><br>Likewise.  I guess I must be confused since you surely tested this,<br>but I don't see how this can possibly work.  Anyway, if I'm not<br>confused then I think you should do the following:<br><br>case (a).  The input is a scalar (i.e. not a vector).  Try to use<br>scalar_to_vector to build a vector with the same bitwidth as WidenVT<br>out of InOp.  Return a bitconvert of this vector to the WidenVT.<br>This may not always be possible.  If not, you may have to go to the<br>stack, see case (b) below.<br><br>case (b).  The input is a vector.  Here is where an "extend<br>by undef" operation for vectors would be extremely useful<br>(analogue of scalar_to_vector) :)  Lacking that, I suggest<br>you create a stack temporary of the size of WidenVT, store<br>InOp to it, and load it out as WidenVT.  That said, there<br>are probably a few cases you can handle more optimally.<br></div></blockquote><div><br></div><div>Today, we extend by undef through either concatenate undefs to the input or do extract and build vector.  I would do that way for now to avoid doing the stack  temporary.  My concern with the stack temporary is that we would generate the store and load, eventually split the vector down to scalar operations, and never clean up the store and load (unless some phase after legalization will do store load elimination).</div><br><blockquote type="cite"><div><blockquote type="cite">+  SmallVector<SDValue, 16> NewOps(N->op_begin(), N->op_end());<br></blockquote><br>Since you know the final size, you can reserve space in the<br>vector, to avoid reallocations.</div></blockquote></div><div><blockquote type="cite"><div><font class="Apple-style-span" color="#000000"><br></font><br><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {<br></blockquote><blockquote type="cite">+  // We currently support concatenating a multiple of the incoming vector. We<br></blockquote><blockquote type="cite">+  // could easily support widening an arbitrary amount using extract/build but<br></blockquote><blockquote type="cite">+  // I want to see a use case before we do so.<br></blockquote><br>I'm not sure I understand this comment.  I guess what you are trying<br>to say is that v6i32 = CONCAT(v3i32, v3i32), widens to v8i32, is not<br>supported yet?</div></blockquote><div><br></div><div>Yes, I'm stating that we don't support those cases where the result is not a multiple of the input.</div><div><br></div><blockquote type="cite"><div><br><br><blockquote type="cite">+  SmallVector<SDValue, 16> Ops;<br></blockquote><br>You can reserve space.<br><br><blockquote type="cite">+  Ops.push_back(SDValue(N,0));<br></blockquote><br>Shouldn't you be pushing back the operands of N?!<br></div></blockquote><div><br></div><blockquote type="cite"><div><blockquote type="cite">+  for (unsigned i = 1; i != NumConcat; ++i) {<br></blockquote><blockquote type="cite">+    Ops.push_back(UndefVal);<br></blockquote><blockquote type="cite">+  }<br></blockquote><br>Shouldn't you be pushing back NumConcat - N->getNumOperands() - 1<br>copies of undef?  Also, the braces {} are redundant.<br><br></div></blockquote><div><br></div>I'll fix the above problems.  I'm in the process of writing a more white box tests to capture some of the corner cases that I  don't see when running the test.<div><br></div><br><blockquote type="cite"><div><blockquote type="cite">+SDValue DAGTypeLegalizer::WidenVecRes_CONVERT_RNDSAT(SDNode *N) {<br></blockquote></div></blockquote><div><br></div><blockquote type="cite"><div>I didn't review this function.<br><br>At this point I got tired and went to bed!  I will take a look at<br>the rest some other time.<br><br></div></blockquote><div><br></div><div>The patch is very long and I appreciate the time and effort for the review.</div><div><br></div><div>Thanks,</div><div>  -- Mon Ping</div><div><br></div></div><br></div></body></html>