<div dir="ltr">On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet <span dir="ltr"><<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div class="im">On Jul 1, 2013, at 11:52 AM, Eli Friedman <<a href="mailto:eli.friedman@gmail.com" target="_blank">eli.friedman@gmail.com</a>> wrote:<br>

</div><div><div><div class="h5"><br><blockquote type="cite"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">

<div dir="ltr">On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet<span> </span><span dir="ltr"><<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>></span><span> </span>wrote:<br><div class="gmail_extra">

<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word">

Hi,<div><br></div><div>** Problem **</div><div>I am looking for advice on how to share some logic between DAG combine and target lowering.</div><div><br></div><div>Basically, I need to know whether a bitcast that is about to be inserted during target-specific isel lowering will be eliminated during DAG combine.</div>

<div><br></div><div>Let me know if there is another, better-supported approach for this kind of problem.</div><div><br></div><div>** Motivating Example **</div><div>The motivating example comes from the lowering of vector code on armv7.</div>

<div>More specifically, the build_vector node is lowered to a target-specific ARMISD::build_vector where all the parameters are bitcast to floating-point types.</div><div><br></div><div>This works well unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between the integer unit and the floating-point unit that may result in inefficient code.</div>

<div><br></div><div>Attached motivating_example.ll shows such a case:</div><div>llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -</div><div><div style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>ldr<span style="white-space:pre-wrap">     </span>r0, [r1]</div>

<div style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>ldr<span style="white-space:pre-wrap">     </span>r1, [r2]</div><div style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>vmov<span style="white-space:pre-wrap">    </span>s1, r1</div>

<div style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>vmov<span style="white-space:pre-wrap">    </span>s0, r0</div></div><div style="margin:0px">Here, each ldr/vmov pair could have been replaced by a single vld1.32.</div>

<div><br></div><div>** Proposed Solution **</div><div>Lower to more vector-friendly code (a sequence of insert_vector_elt nodes) when the bitcasts will not be free.</div><div>The attached patch demonstrates this, but it is missing the proper check to know what DAG combine will do (see TODO).</div>

<div></div></div></blockquote></div></div><div class="gmail_extra"><br></div><div class="gmail_extra">I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate.</div>

</div></div></blockquote></div></div><div dir="auto">Hi Eli,</div><div dir="auto"><br></div><div dir="auto">I have started to look in the direction you gave me.</div><div dir="auto"><br></div><div dir="auto">I may have missed something, but I do not see how the proposed direction solves the issue. To DAGCombine an insert_vector_elt sequence into an ARMISD::build_vector, I still need to know whether it would be profitable, i.e., whether DAGCombine will remove the bitcasts that the combine/lowering is about to insert.</div>

<div dir="auto"><br></div><div dir="auto">Since target-specific DAGCombines are also done in TargetLowering, I do not have access to more DAGCombine logic (at least DAGCombineInfo does not provide the required information).</div>

<div dir="auto"><br></div><div dir="auto">What did I miss?</div></div><div dir="auto"><br></div></div></blockquote><div><br></div><div>Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch.</div>

<div><br></div><div>You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine.  (At that point, you know whether DAGCombine will remove the bitcasts: if it could, it would already have done so.)</div>

<div><br></div><div>-Eli</div></div></div></div>
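<div>The ordering argument in the reply above can be pictured with a toy model. This is only a sketch in Python, not LLVM code: the Node class, the fp_load opcode, the single_use flag, and the rule that a bitcast folds only into a single-use load are all assumptions made for illustration, and element types are not modeled.</div>

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a SelectionDAG node (hypothetical, not an LLVM class)."""
    op: str
    operands: Tuple["Node", ...] = ()
    tag: str = ""            # free-form label, e.g. a load address or lane index
    single_use: bool = True  # toy flag: the value has no other user


def lower_build_vector(elts):
    """Lowering step: always emit the target build_vector over bitcast operands."""
    return Node("ARMISD::BUILD_VECTOR", tuple(Node("bitcast", (e,)) for e in elts))


def fold_bitcasts(vec):
    """Stand-in for the generic combine: bitcast(single-use load) becomes an FP load."""
    def fold(n):
        src = n.operands[0] if n.op == "bitcast" else None
        if src is not None and src.op == "load" and src.single_use:
            return Node("fp_load", tag=src.tag)
        return n
    return Node(vec.op, tuple(fold(o) for o in vec.operands))


def split_if_bitcasts_survive(vec):
    """Target combine: a bitcast still present here was not removable,
    so rebuild the vector as a chain of insert_vector_elt nodes instead."""
    if not any(o.op == "bitcast" for o in vec.operands):
        return vec
    result = Node("undef")
    for lane, o in enumerate(vec.operands):
        src = o.operands[0] if o.op == "bitcast" else o  # element types are not modeled
        result = Node("insert_vector_elt", (result, src, Node("const", tag=str(lane))))
    return result


def select(elts):
    """Run the three steps in order: lower, generic combine, target combine."""
    return split_if_bitcasts_survive(fold_bitcasts(lower_build_vector(elts)))
```

<div>In this model the target combine never has to predict what the generic combine will do: it only inspects what is left, so a build_vector whose bitcasts all folded is kept, and one with a surviving bitcast is split into inserts.</div>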