[llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

Tue Dec 6 00:52:21 PST 2011

Hi Bruno,

Thank you for the response.
I've changed the LIT test to match the common style (attached).

Unfortunately, I cannot put it inside brcond.ll because the "ptest" instruction was introduced only with SSE4.1 (i.e. it requires "-mcpu=penryn"), while the current version of brcond.ll is run with "-mcpu=core2".
Would replacing -mcpu in brcond.ll with "penryn" be backward compatible with regard to the existing LIT results?

Best Regards,
    Victor

From: Bruno Cardoso Lopes [mailto:bruno.cardoso at gmail.com]
Sent: Monday, December 05, 2011 19:13
To: Umansky, Victor
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

Hi Victor,
On Mon, Dec 5, 2011 at 10:26 AM, Umansky, Victor <victor.umansky at intel.com> wrote:
Hi,

My name is Victor Umansky; I'm an engineer on the Intel OpenCL team.

The attached patch contains an optimization for ptest-conditioned branches.

That is, the following LLVM IR code

  %res = call i32 @llvm.x86.sse41.ptestz(<4 x float> %a, <4 x float> %a) nounwind
  %tmp = and i32 %res, 1
  %one = icmp eq i32 %tmp, 0
  br i1 %one, label %label1, label %label2

is currently lowered to the following x86 machine code sequence:

    ptest   XMM3, XMM3
    sete    AL
    movzx   EAX, AL
    test    EAX, EAX
    jne     LBB18_26

which can be optimized to:

    ptest   XMM3, XMM3
    je      LBB18_26

The current machine code sequence stems from the need to reconcile the i32 return type of the ptestz intrinsic with the i1 condition type of the IR branch instruction.
Consequently, we can optimize it in the x86 codegen backend, where both the condition producer (ptest) and the consumer (jcc) use the same x86 EFLAGS register, so the intermediate conversions of the condition can be safely dropped.
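
A step-by-step trace of the flags shows why the shorter sequence is equivalent. This is only a sketch, assuming the standard semantics of these instructions; the comments are annotations and are not part of the patch:

    ptest   XMM3, XMM3   ; ZF = 1 iff (XMM3 AND XMM3) is all zeros
    sete    AL           ; AL = ZF
    movzx   EAX, AL      ; EAX = 0 or 1
    test    EAX, EAX     ; new ZF = 1 iff EAX == 0, i.e. the inverse of the ptest ZF
    jne     LBB18_26     ; taken iff new ZF == 0, i.e. iff the ptest ZF was 1

So the jne on the recomputed flags is taken under exactly the condition that a je placed directly after the ptest would check.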

The optimization is implemented in x86 DAG combining (post-legalization stage), which recognizes the sequence and converts it to the shorter one.

The attached patch file includes both the x86 backend instruction combining modification and a LIT regression test for it.

I'd like to commit the fix to LLVM trunk, and your feedback would be much appreciated.

+; RUN: llc %s -march=x86-64 -mcpu=corei7 -o %t.asm
+; RUN: FileCheck %s --input-file=%t.asm

Please follow the other tests and read the file with "< %s". Also, place it in test/CodeGen/X86/brcond.ll
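
For illustration, a test in that style might look roughly like the sketch below. It reuses the IR and the -march/-mcpu values quoted in this thread; the function name, block bodies, and CHECK lines are assumptions made for the example and are not the contents of the attached ptest_sequence.ll:

    ; RUN: llc < %s -march=x86-64 -mcpu=corei7 | FileCheck %s

    ; Intrinsic signature as used in the IR quoted above.
    declare i32 @llvm.x86.sse41.ptestz(<4 x float>, <4 x float>) nounwind readnone

    ; Hypothetical test function: branch on the ptestz result.
    define i32 @test_ptest_branch(<4 x float> %a, i32 %x, i32 %y) nounwind {
    entry:
    ; CHECK: ptest
    ; CHECK-NOT: sete
    ; CHECK-NOT: movz
      %res = call i32 @llvm.x86.sse41.ptestz(<4 x float> %a, <4 x float> %a) nounwind
      %tmp = and i32 %res, 1
      %one = icmp eq i32 %tmp, 0
      br i1 %one, label %label1, label %label2

    label1:
      ret i32 %x

    label2:
      ret i32 %y
    }

The CHECK-NOT lines only assert that the intermediate sete/movzx conversion is gone; the exact jcc is left unchecked here because it may depend on block layout.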

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ptest_sequence.ll
Type: application/octet-stream
Size: 837 bytes
Desc: ptest_sequence.ll
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111206/4dcda820/attachment.obj>