[llvm-commits] x86 branch sequence optimization in LLVM code gen: please review

Chandler Carruth chandlerc at google.com
Thu Jan 5 01:15:35 PST 2012


On Thu, Jan 5, 2012 at 12:18 AM, Umansky, Victor
<victor.umansky at intel.com>wrote:

> Nadav,
>
> The redundant sequence (that which is optimized by this patch) is
> generated not due to PTEST instruction, but due to ptest* intrinsic
> functions - whose API is defined as int32 by Intel.


I see that you've already committed this, but it still isn't clear to me
why something closer to Nadav's suggestion won't work.

Specifically, why do the LLVM intrinsics have to return an i32? No one is
ever able to directly call LLVM intrinsics. We can give them any API that
is useful. If this instruction really sets a flag rather than returning a
value, returning an 'i1' type in the LLVM IR would be much more accurate.

The frontend can then manage the lowering from whatever C API is used in
the library to the LLVM IR, and the IR optimizers can clean up any
redundancies that are there.
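
To make this concrete (the i1-returning intrinsic below is hypothetical,
shown only as a sketch; @llvm.x86.sse41.ptestz is the existing
i32-returning intrinsic), the two shapes of IR would look roughly like:

    ; current form: the intrinsic returns i32, so the frontend has to
    ; materialize an integer result and compare it back against zero
    %res = call i32 @llvm.x86.sse41.ptestz(<2 x i64> %a, <2 x i64> %b)
    %cmp = icmp ne i32 %res, 0
    br i1 %cmp, label %then, label %else

    ; hypothetical i1 form: the flag result feeds the branch directly,
    ; with no icmp for a peephole to have to pattern-match away
    %flag = call i1 @llvm.x86.sse41.ptestz.i1(<2 x i64> %a, <2 x i64> %b)
    br i1 %flag, label %then, label %else

In the second form there is simply no redundant sequence to clean up.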

The reason this is important is that there are many, many things which can
prevent the pattern you've created a peephole optimization for from
actually occurring. If we change something in the rest of the LLVM stack
that slightly alters the pattern coming out of the middle-end optimizers,
this optimization vanishes. By making the semantics of the instruction more
closely modeled by IR (and the resulting DAG), the optimization pipeline
will be much cleaner.

One example that jumps to mind is what if the icmp in your example ends up
feeding into a select rather than a branch. Does your peephole still fire?
Do we end up with ptest + cmov?
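
For reference, with the current i32-returning intrinsic that case would
look something like the following (sketch only; %x and %y are just
illustrative operands):

    ; the i32 result now feeds a select rather than a branch, so a
    ; peephole written against the br pattern may not match at all
    %res = call i32 @llvm.x86.sse41.ptestz(<2 x i64> %a, <2 x i64> %b)
    %cmp = icmp ne i32 %res, 0
    %val = select i1 %cmp, i32 %x, i32 %y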
