<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">


<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<font face="Calibri, sans-serif" size="2">

<div>Hi,</div>

<div> </div>

<div>My name is Victor Umansky; I'm an engineer on the Intel OpenCL team.</div>

<div> </div>

<div>The attached patch contains an optimization of a ptest-conditioned branch.</div>

<div> </div>

<div>For example, the following LLVM IR code</div>

<div> </div>

<table border="1" width="738" style="border:1 solid; border-collapse:collapse; margin-left: -5pt; ">

<col width="738">

<tr>

<td><font face="Verdana, sans-serif" size="2">  %res = call i32 @llvm.x86.sse41.ptestz(&lt;4 x float&gt; %a, &lt;4 x float&gt; %a) nounwind

<br>


  %tmp = and i32 %res, 1 <br>


  %one = icmp eq i32 %tmp, 0 <br>


  br i1 %one, label %label1, label %label2</font></td>

</tr>

</table>

<div> </div>

<div>is currently lowered to the following x86 machine code sequence:</div>

<div> </div>

<table border="1" width="738" style="border:1 solid; border-collapse:collapse; margin-left: -5pt; ">

<col width="738">

<tr>

<td><font face="Verdana, sans-serif" size="2">    ptest     XMM3, XMM3 <br>


    sete    AL <br>


    movzx    EAX, AL <br>


    test    EAX, EAX <br>


    jne    LBB18_26</font></td>

</tr>

</table>

<div> </div>

<div>which can be optimized to: </div>

<div> </div>

<table border="1" width="738" style="border:1 solid; border-collapse:collapse; margin-left: -5pt; ">

<col width="738">

<tr>

<td><font face="Verdana, sans-serif" size="2">             ptest     XMM3, XMM3 <br>


             je    LBB18_26</font></td>

</tr>

</table>

<div> </div>

<div> </div>

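<div>The current machine code sequence stems from the need to reconcile the <b>i32 return type</b> of the ptestz intrinsic with the <b>i1 condition type</b> of the IR branch instruction: ptest sets ZF, sete materializes ZF as a byte, movzx widens it to i32, and test/jne turns it back into a flag for the branch.</div>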

<div>This can be optimized in the x86 codegen backend, since both the condition producer (ptest) and the consumer (jcc) use the <b>same x86 EFLAGS register</b>, so the intermediate conversions of the condition can be safely dropped.</div>
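<div>To check that dropping the conversions is safe, the IR semantics and both machine code sequences can be modeled in plain C++ and compared for every ZF outcome. This is a minimal sketch, not backend code: the <code>Xmm</code> struct and all function names are hypothetical, PTEST is modeled only for its ZF result (ZF is set when the AND of its operands is all zeros), and it assumes LBB18_26 is the block for <code>%label2</code>, with <code>%label1</code> as the fall-through.</div>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit XMM register as two 64-bit halves.
struct Xmm {
  std::uint64_t lo, hi;
};

// PTEST a, b sets ZF when (a AND b) is all zeros; CF is not modeled here.
bool ptestZF(Xmm a, Xmm b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// IR semantics: %res = ptestz(a, a); %tmp = and %res, 1; %one = icmp eq %tmp, 0;
// br %one, %label1, %label2. Returns true when control reaches %label2.
bool irTakesLabel2(Xmm a) {
  std::uint32_t res = ptestZF(a, a) ? 1u : 0u;  // ptestz returns ZF as an i32
  std::uint32_t tmp = res & 1u;
  bool one = (tmp == 0u);
  return !one;
}

// Original lowering: ptest; sete AL; movzx EAX, AL; test EAX, EAX; jne target.
bool originalJumps(Xmm a) {
  bool zf = ptestZF(a, a);
  std::uint8_t al = zf ? 1 : 0;          // sete AL
  std::uint32_t eax = al;                // movzx EAX, AL
  bool zfAfterTest = (eax & eax) == 0u;  // test EAX, EAX recomputes ZF
  return !zfAfterTest;                   // jne: jump when ZF is clear
}

// Optimized lowering: ptest; je target.
bool optimizedJumps(Xmm a) {
  return ptestZF(a, a);                  // je: jump when ZF is set
}
```

<div>For any input, <code>irTakesLabel2</code>, <code>originalJumps</code> and <code>optimizedJumps</code> agree, which is the property the optimization relies on: sete/movzx/test only round-trip ZF through a register.</div>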

<div> </div>

<div>The optimization is implemented as an x86 DAG combine (at the post-legalization stage) that recognizes this sequence and replaces it with the minimal one.</div>
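<div>The matching step can be pictured with a toy expression tree. This is a hypothetical sketch, not LLVM's SelectionDAG API: <code>Node</code>, <code>Op</code>, <code>combineBranchCond</code> and the other names are invented for illustration, and only the exact pattern from the IR example above (seteq of <code>and(ptestz(a, b), 1)</code> against 0) is recognized.</div>

```cpp
#include <cassert>
#include <memory>

// Toy node kinds: a hypothetical simplification, not LLVM's ISD/X86ISD opcodes.
enum class Op { Value, Const, PtestZ, And, SetEq };

struct Node;
using NodeP = std::shared_ptr<Node>;

struct Node {
  Op op;
  long imm;        // meaningful only for Op::Const
  NodeP lhs, rhs;  // operands (null when unused)
};

NodeP leaf() { return std::make_shared<Node>(Node{Op::Value, 0, nullptr, nullptr}); }
NodeP cnst(long v) { return std::make_shared<Node>(Node{Op::Const, v, nullptr, nullptr}); }
NodeP node(Op op, NodeP a, NodeP b) { return std::make_shared<Node>(Node{op, 0, a, b}); }

bool isConst(const NodeP& n, long v) {
  return n && n->op == Op::Const && n->imm == v;
}

// Flag condition the branch tests after the PTEST is emitted.
enum class FlagCond { None, ZFClear };

struct CombineResult {
  FlagCond cond = FlagCond::None;  // None: pattern not recognized
  NodeP ptestA, ptestB;            // operands of the PTEST feeding the branch
};

// Recognizes seteq(and(ptestz(a, b), 1), 0). ptestz yields 1 exactly when ZF
// is set, so the condition is true exactly when ZF is clear; the branch can
// test EFLAGS directly instead of materializing and re-testing an i32.
CombineResult combineBranchCond(const NodeP& cond) {
  CombineResult r;
  if (!cond || cond->op != Op::SetEq || !isConst(cond->rhs, 0)) return r;
  const NodeP& masked = cond->lhs;
  if (!masked || masked->op != Op::And || !isConst(masked->rhs, 1)) return r;
  const NodeP& pt = masked->lhs;
  if (!pt || pt->op != Op::PtestZ) return r;
  r.cond = FlagCond::ZFClear;
  r.ptestA = pt->lhs;
  r.ptestB = pt->rhs;
  return r;
}
```

<div>On a match, the branch would be emitted as <code>ptest a, b</code> followed by a jump to <code>%label1</code> when ZF is clear (equivalently, <code>je</code> to <code>%label2</code>); any other shape is left to the default lowering.</div>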

<div> </div>

<div>The attached patch includes both the x86 backend DAG combine change and a LIT regression test for it.</div>

<div> </div>

<div> </div>

<div> </div>

<div>I'd like to commit the fix to LLVM trunk; your feedback would be much appreciated.</div>

<div> </div>

<div>Best Regards,</div>

<div>    Victor</div>

<div> </div>

<div> </div>

</font>

<font face="monospace">---------------------------------------------------------------------<br>

Intel Israel (74) Limited<br>

</font></body>

</html>