Hi Victor,<br><br><div class="gmail_quote">On Mon, Dec 5, 2011 at 10:26 AM, Umansky, Victor <span dir="ltr"><<a href="mailto:victor.umansky@intel.com">victor.umansky@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div>

<font face="Calibri, sans-serif" size="2">

<div>Hi,</div>

<div> </div>

<div>My name is Victor Umansky; I’m an engineer on the Intel OpenCL team.</div>

<div> </div>

<div>The attached patch contains an optimization for ptest-conditioned branches. </div>

<div> </div>

<div>Specifically, the following LLVM IR code</div>

<div> </div>

<table border="1" width="738" style="border:1 solid;border-collapse:collapse">

<colgroup><col width="738">

</colgroup><tbody><tr>

<td><font face="Verdana, sans-serif" size="2">  %res = call i32 @llvm.x86.sse41.ptestz(<4 x float> %a, <4 x float> %a) nounwind

<br>


  %tmp = and i32 %res, 1 <br>


  %one = icmp eq i32 %tmp, 0 <br>


  br i1 %one, label %label1, label %label2</font></td>

</tr>

</tbody></table>

<div> </div>

<div>currently compiles to the following x86 machine code sequence:</div>

<div> </div>

<table border="1" width="738" style="border:1 solid;border-collapse:collapse">

<colgroup><col width="738">

</colgroup><tbody><tr>

<td><font face="Verdana, sans-serif" size="2">    ptest     XMM3, XMM3 <br>


    sete    AL <br>


    movzx    EAX, AL <br>


    test    EAX, EAX <br>


    jne    LBB18_26</font></td>

</tr>

</tbody></table>

<div> </div>

<div>which can be optimized to: </div>

<div> </div>

<table border="1" width="738" style="border:1 solid;border-collapse:collapse">

<colgroup><col width="738">

</colgroup><tbody><tr>

<td><font face="Verdana, sans-serif" size="2">             ptest     XMM3, XMM3 <br>


             je    LBB18_26</font></td>

</tr>

</tbody></table>

<div> </div>

<div> </div>

<div>The current machine code sequence stems from the need to reconcile the <b>i32 return type</b> of the ptestz intrinsic with the <b>i1 condition type</b> of the IR branch instruction. </div>

<div>We can therefore optimize it in the x86 codegen backend, where both the condition producer (ptest) and the consumer (jcc) use the <b>same x86 EFLAGS register</b>, so the intermediate conversions of the condition can be safely dropped.</div>


<div> </div>

<div>The optimization is implemented as an x86 DAG combine (at the post-legalization stage) that recognizes this sequence and replaces it with the shorter one.</div>

<div> </div>

<div>The attached patch file includes both the x86 backend instruction combining modification and a LIT regression test for it.</div>

<div> </div>

<div> </div>

<div> </div>

<div>I’d like to commit the patch to LLVM trunk, and your feedback would be much appreciated.</div>

<div><br></div></font></div></blockquote></div><div><br></div><div><br></div><div>+; RUN: llc %s -march=x86-64 -mcpu=corei7 -o %t.asm</div><div>+; RUN: FileCheck %s --input-file=%t.asm</div><div><br></div><div>Please follow the convention of the other tests and read the input with "< %s". Also, please add the test to test/CodeGen/X86/brcond.ll.</div>
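<div><br></div><div>As a rough sketch of that convention, assuming the llc flags from the patch are kept unchanged, the two RUN lines would typically collapse into a single line that pipes llc's output straight into FileCheck, with no temporary file:</div><div><br></div><div>; RUN: llc < %s -march=x86-64 -mcpu=corei7 | FileCheck %s</div>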

<div><br></div>-- <br>Bruno Cardoso Lopes <br><a href="http://www.brunocardoso.cc">http://www.brunocardoso.cc</a><br>