[llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

Umansky, Victor victor.umansky at intel.com
Thu Jan 5 01:24:56 PST 2012


Answering Chandler’s questions:

1.      An x86 intrinsic prototype is defined by Intel together with the corresponding instruction, and is published in the IA32 arch spec and in the *intrin.h files. Consequently, all compiler providers accept this prototype for compatibility reasons. The ptest* intrinsics return the i32 type.

2.      Of course, ptest-branch combining won’t catch the ptest-select or ptest-cmove cases. However, adding these cases would require very few changes in the code.

Victor

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Thursday, January 05, 2012 11:16
To: Umansky, Victor
Cc: Rotem, Nadav; Evan Cheng; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

On Thu, Jan 5, 2012 at 12:18 AM, Umansky, Victor <victor.umansky at intel.com> wrote:
Nadav,

The redundant sequence (the one optimized by this patch) is generated not by the PTEST instruction itself, but by the ptest* intrinsic functions, whose API is defined by Intel as returning int32.

I see that you've already committed this, but it still isn't clear to me why something closer to Nadav's suggestion won't work.

Specifically, why do the LLVM intrinsics have to return an i32? No one is ever able to directly call LLVM intrinsics. We can give them any API that is useful. If this instruction really sets a flag rather than returning a value, returning an 'i1' type in the LLVM IR would be much more accurate.

The frontend can then manage the lowering from whatever C API is used in the library to the LLVM IR, and the IR optimizers can clean up any redundancies that are there.

The reason this is important is that there are many many things which can prevent the pattern you've created a peephole optimization for from actually occurring. If we change something in the rest of the LLVM stack that slightly alters the pattern coming out of the middle-end optimizers, this optimization vanishes. By making the semantics of the instruction more closely modeled by IR (and the resulting DAG), the optimization pipeline will be much cleaner.

One example that jumps to mind: what if the icmp in your example ends up feeding into a select rather than a branch? Does your peephole still fire? Do we end up with ptest + cmov?
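
For readers following the thread, the two source-level shapes under discussion can be sketched in portable C. This is only an illustrative model, not the patch or the real intrinsic: the actual C intrinsic is _mm_testz_si128 from <smmintrin.h>, which returns int, while vec128, model_testz, branch_on_testz and select_on_testz are hypothetical names invented here. The model returns an int that is 1 when (a AND b) is all zeros, which is the condition PTEST reports through ZF.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } vec128;

/* Assumed model of the testz flavour of ptest: yields 1 when (a & b) is
 * all zeros, else 0. It returns int, mirroring the int-returning C API. */
static int model_testz(vec128 a, vec128 b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Branch shape: the int result is compared against zero and then branched
 * on. This is the pattern the ptest-branch combining is aimed at. */
int branch_on_testz(vec128 a, vec128 mask) {
    if (model_testz(a, mask) != 0)
        return 1;
    return 0;
}

/* Select shape: the same comparison feeds a conditional value instead of
 * a branch, which is the case raised in the question above. */
int select_on_testz(vec128 a, vec128 mask, int x, int y) {
    return model_testz(a, mask) != 0 ? x : y;
}
```

With an intrinsic that returned a flag-like i1 at the IR level, both shapes would consume the flag directly rather than materializing an integer and comparing it again.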