[llvm-commits] [llvm] r146357 - in /llvm/trunk: include/llvm/Intrinsics.td lib/Analysis/ConstantFolding.cpp lib/Transforms/Scalar/SimplifyLibCalls.cpp lib/VMCore/AutoUpgrade.cpp

Mon Dec 12 16:10:53 PST 2011

On Dec 12, 2011, at 4:03 PM, Chandler Carruth wrote:

> An attempt at clarification (I still am not going to pick a side here, I'm just interested in the results):
> 
> On Mon, Dec 12, 2011 at 3:47 PM, Chris Lattner <clattner at apple.com> wrote:
> The feeling was that it regresses on useful functionality: a frontend may want "defined at zero semantics" and codegen should be able to legalize in a select if needed.
> 
> I think Duncan agreed with this, but felt that codegen could match that pattern of code into the instruction where necessary. This seems plausible to me as it should merely be a comparison, the intrinsic, and a select. No CFG involved.

I don't see how that's better.  I'd much rather have the mid-level optimizer pattern match things like "x == 0 ? 32 : ctlz(x)" to a "defined at zero" ctlz variant (thereby getting a faster X86 instruction) than having codegen do it.
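
To make the two semantics concrete, here is a rough C++ sketch. It is not taken from the patch and the function names are made up for illustration. `__builtin_clz` is the GCC/Clang builtin, whose result is undefined for a zero argument, so it stands in for the "undef at zero" form.

```cpp
#include <cstdint>

// "Undefined at zero" variant: the caller promises x != 0.
// __builtin_clz (GCC/Clang) has an undefined result when x == 0.
inline unsigned ctlz_zero_undef(std::uint32_t x) {
  return static_cast<unsigned>(__builtin_clz(x));
}

// "Defined at zero" variant, spelled as the compare + select pattern
// from the discussion: x == 0 ? 32 : ctlz(x).
inline unsigned ctlz_defined(std::uint32_t x) {
  return x == 0 ? 32u : ctlz_zero_undef(x);
}
```

The second function is the shape the mid-level optimizer would have to recognize in order to fold the compare and select back into a single "defined at zero" operation.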

> Having the undef bit also allows the optimizer to infer that the input can't be zero, allowing potentially cheaper instruction sequences to be synthesized by codegen etc.
> 
> Also, I think Duncan is saying make the actual intrinsic spec that its result is undef for a zero input. Code which needs a defined result must use a comparison and a select to ignore the result of the intrinsic.

I don't see how this is better.  The form you just implemented lets us capture the information directly in the intrinsic and do a variety of other optimizations more clearly, IMO.

> With A we have a simpler spec for the IR and for the ISD nodes in the codegen DAG. However, if a frontend wishes to provide a defined result for zero input, it must produce more complex IR, and if the backend wishes to produce efficient code for such constructs, it must use a more complex pattern.

Yes, it's unclear how a target would implement this.  Realistically, we'd need different ISD:: nodes for the two operations in either proposal.

Duncan's approach also has a problem if the select gets separated (perhaps hoisted or whatever) and not matched - this would make isel generate much more expensive code.  We've had problems with similar intrinsics before.

> With B we have a more complex spec for the IR, but it is now trivial for the frontend to select either behavior. The codegen DAG remains more complex in specification because we don't have the facilities in the codegen layer for manipulating immediates nearly as easily as we do in IR, and therefore we decompose the flag into two ISD nodes.

I am not sure what you mean by the codegen impact here.  It seems to me that codegen would work the same way with both approaches.  We want two different ISD:: nodes for the two different behaviors.

> I actually tried both implementations, and separate nodes was *significantly* simpler. Cases such as vector type legalization make it very useful to have the ISD nodes be unary.

Yes, absolutely, this allows a target to say that it supports one operation but not the other, implement the isel pattern, etc.  Clearly the right way to go.
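
As a toy model of that idea (not LLVM's actual legalizer API; every name below is hypothetical), two separate opcodes let a target declare each one legal independently, and a legalizer can then choose a fallback per opcode:

```cpp
#include <set>

// Hypothetical opcode names, used only in this sketch.
enum class Op { CtlzDefined, CtlzZeroUndef };

// A target is modeled as nothing more than the set of opcodes it supports.
struct Target {
  std::set<Op> legal;
  bool isLegal(Op op) const { return legal.count(op) != 0; }
};

// What a legalizer would emit for a given opcode on a given target.
enum class Lowering { Native, CompareAndSelect, UseDefinedVariant, GenericExpansion };

Lowering legalize(const Target &t, Op op) {
  if (t.isLegal(op))
    return Lowering::Native;
  // Only the zero-undef form is native: wrap it as x == 0 ? bitwidth : ctlz(x).
  if (op == Op::CtlzDefined && t.isLegal(Op::CtlzZeroUndef))
    return Lowering::CompareAndSelect;
  // Only the defined form is native: it is a valid stand-in for the undef form.
  if (op == Op::CtlzZeroUndef && t.isLegal(Op::CtlzDefined))
    return Lowering::UseDefinedVariant;
  // Neither is native: fall back to some target-independent sequence.
  return Lowering::GenericExpansion;
}
```

Because each opcode is unary, the fallback decision depends only on the opcode and the target, not on inspecting a flag operand.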

-Chris
