<div dir="ltr">Hi Wei,<div><br></div><div>I haven't worked on LLVM codegen for several years now, so you'll probably want to talk to someone who's been involved with it more recently. Sending to llvm-dev@ was the right call.</div><div>+llvm-dev@ back in.</div><div><br></div><div>Having said that:</div><div>Those are the semantics of the SHUFFLE_VECTOR node, so the code is doing the right thing. See ISDOpcodes.h:</div><div><br></div><div>    /// VECTOR_SHUFFLE(VEC1, VEC2) - Returns a vector, of the same type as<br>    /// VEC1/VEC2.  A VECTOR_SHUFFLE node also contains an array of constant int<br>    /// values that indicate which value (or undef) each result element will<br>    /// get.  These constant ints are accessible through the<br>    /// ShuffleVectorSDNode class.  This is quite similar to the Altivec<br>    /// 'vperm' instruction, except that the indices must be constants and are<br>    /// in terms of the element size of VEC1/VEC2, not in terms of bytes.</div><div><br></div><div>If you're asking why SHUFFLE_VECTOR was defined this way - I don't know. This goes back to the very early days of LLVM and SelectionDAG (see <a href="https://reviews.llvm.org/rL26879" target="_blank">https://reviews.llvm.org/rL26879</a>). The reference to Altivec in the above comment is a good hint of how it ended up like this.</div><div>In any case, this does not necessarily make the generated code less efficient - this is just an intermediate representation. Every target supports its own (possibly very restricted, or very wide) set of vector operations, and targets try to pattern-match complex DAG patterns to their instructions. So you can get a target shuffle-like instruction even if the target-independent DAG does not have a SHUFFLE_VECTOR.</div><div><br></div><div>Michael</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 31, 2019 at 2:14 PM Wei Zhao <<a href="mailto:wxz@marvell.com" target="_blank">wxz@marvell.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div lang="EN-US">

<div>

<p class="MsoNormal">Hi Michael,<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Found your email from LLVM git log.  Here is the message I posted to llvm-dev. Thought this is a quicker way to get an answer.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Thanks,<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Wei<u></u><u></u></p>

<p class="MsoNormal">==============================<u></u><u></u></p>

<p class="MsoNormal">In LLVM DAG Combiner, DAGCombiner::createBuildVecShuffle() is type based.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">DAGCombiner.cpp,  <u></u><u></u></p>

<p class="MsoNormal">   │17184     // We can't generate a shuffle node with mismatched input and output types. 

<u></u><u></u></p>

<p class="MsoNormal">   │17185     // Try to make the types match the type of the output.  

<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<ol style="margin-top:0in" start="1" type="1">

<li class="MsoNormal">The codes following the above comment are trying to do a matching job between the input vectors and the output vector. Why the code is based on the assumption that only matched type can be allowed to do

 a vector shuffle?<u></u><u></u></li></ol>

<p class="MsoNormal" style="margin-left:0.25in">       A shuffle takes some fields of data from the input vector and reassembles them in the output vector. It is purely a data movement operation. The input vector is the container for the source data, and the

 output vector is the container for the resulting data. Why these two containers have to have the same vector type?<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">For example, <u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">VT’s type: v2i16 <u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">VecIn1 and VecIn2’s type: v4i16<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">We take two i16 elements,  each from VecIn1 and VecIn2 separately. With the current code, because of their type difference, there will be no shuffle generated

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in"><u></u> <u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">The requirement to create a shuffle operation should be: the capacity (SizeInBits) of the output vector can hold all the extracted data from the input vector container

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.25in">So as long as the total SizeInBits of the input data extracted from the input vectors does not exceed the total SizeInBits of the out vector, the shuffle should be allowed to create. Sure there are some other checks

 needed like indexes cannot be the same to avoid two data being placed in the same position.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<ol style="margin-top:0in" start="2" type="1">

<li class="MsoNormal">Another inconsistence is that the split of the vector right before the createBuildVecShuffle()<u></u><u></u></li></ol>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">DAGCombiner.cpp,  <u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in">   │17436     // If all the Operands of BUILD_VECTOR extract from same                                  

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in">   │17437     // vector, then split the vector efficiently based on the maximum                         

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in">   │17438     // vector access index and adjust the VectorMask and                                       <u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in">   │17439     // VecIn accordingly.           

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

<p class="MsoNormal" style="margin-left:0.5in">This split will create a new vector type which most likely will not be the same as the output vector type. For example, if the previous vector input container and output container both have a type v8i16, after splitting,

 the input vector will have type v4i16, again this will cause no shuffle being created later by the type based createBuildVecShuffle(), missing some shuffle operations. This type based shuffle node creation makes many optimization error-prone.

<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

<p class="MsoNormal">     Looks like the input/output container type based approach to create a shuffle node will miss some shuffle operations which makes the generated code less efficient.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">     Any comment why it was first designed like this? <u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

</div>

</blockquote></div>