[llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

Thu Jan 5 01:53:03 PST 2012

Well, if it were possible to change the LLVM intrinsic prototype, then you're right, no instruction combining would be required.
However, until this is done in the LLVM spec, the optimization is in place.

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Thursday, January 05, 2012 11:45
To: Umansky, Victor
Cc: Rotem, Nadav; Evan Cheng; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

On Thu, Jan 5, 2012 at 1:24 AM, Umansky, Victor <victor.umansky at intel.com> wrote:
Answering Chandler’s questions:

1. An x86 intrinsic prototype is defined by Intel together with the corresponding instruction, and is published in the IA32 arch spec and in the *intrin.h files. Consequently, this prototype is accepted by all compiler providers for compatibility reasons. The ptest* intrinsics return the i32 type.
I don't think we're talking about the same thing. There are two things called intrinsics here, and they're getting improperly conflated.

1) Intel (and other vendors) provide C intrinsics for accessing particular functionality in *intrin.h header files. These include _mm_testz_si128 and all kinds of others. They are *C* functions, though, provided and implemented as part of the *C* compiler. That's important.

2) LLVM defines LLVM intrinsic functions as part of the LLVM IR: http://llvm.org/docs/LangRef.html#intrinsics and include/llvm/IntrinsicsX86.td are relevant for these. These are *not* C functions, and they cannot be called from C code directly. They are not a publicly visible interface.

Frontends such as Clang implement #1 by (roughly) emitting code which uses the intrinsics defined in #2.

Currently LLVM (#2) has intrinsics for PTEST which return an 'i32' even though the semantics of that instruction are best described by returning an 'i1' which can then directly be used in a 'br' instruction or a 'select' instruction. I'm suggesting changing *LLVM*'s intrinsic (#2) to return an 'i1'.

The frontend can then emit any adaptive logic necessary when lowering the interface of #1 (which remains the same, returning 'int') to code using LLVM's newly adapted intrinsics. Even if this requires "extra" or "redundant" IR, this should be optimized away *at the IR* level to preserve the generality of those optimizations, and because frankly optimizations on the IR are much easier to implement and maintain.
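
As a rough illustration of the split described above (not code from the patch under review), the sketch below models it in plain C. Everything in it is hypothetical: `vec128` stands in for a 128-bit vector, `ptestz_i1` stands in for an LLVM-level PTEST intrinsic (#2) that returns a 1-bit flag, and `testz_int` stands in for the C-level interface (#1) that keeps returning 'int'. The only hardware fact it relies on is that PTEST sets ZF when the bitwise AND of its operands is all zeros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value, modeled as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vec128;

/* Stand-in for the proposed LLVM-level intrinsic (#2): yields a 1-bit
 * result. Models PTEST's ZF, which is set when (a AND b) is all zeros. */
static bool ptestz_i1(vec128 a, vec128 b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Stand-in for the C-level interface (#1): the prototype is unchanged and
 * still returns int. The cast plays the role of the zero-extension a
 * frontend would emit when adapting the 1-bit result. */
static int testz_int(vec128 a, vec128 b) {
    return (int)ptestz_i1(a, b);
}

/* Typical use: the int result is immediately compared against zero and
 * branched on, so the widening and the compare are redundant and are the
 * kind of thing an IR-level optimization could fold away. */
static int pick(vec128 a, vec128 b, int if_zero, int otherwise) {
    if (testz_int(a, b) != 0) {
        return if_zero;
    }
    return otherwise;
}
```

In this model, the widening to 'int' followed by the comparison in `pick` cancels out, leaving a direct branch on the 1-bit value.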