[llvm-commits] [llvm] r155468 - in /llvm/trunk: lib/Transforms/InstCombine/InstCombineCalls.cpp test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll

Chandler Carruth chandlerc at google.com
Wed Apr 25 22:53:46 PDT 2012


On Wed, Apr 25, 2012 at 10:44 PM, Evan Cheng <evan.cheng at apple.com> wrote:

>
>
> On Apr 25, 2012, at 7:37 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
> On Wed, Apr 25, 2012 at 7:07 PM, Lang Hames <lhames at gmail.com> wrote:
>
>> Hi Bob, Evan,
>>
>>>  I vaguely remember this. Do you remember why the multiply isn't moved
>>> together with the sext / zext?
>>>
>>>
>>> The change was svn 128502.  The commit message doesn't have many
>>> details.  It references rdar://8832507 and rdar://9203134.  I took a
>>> quick look at 8832507, where I commented that the zext/sext was getting
>>> moved by LICM.  Presumably only one of the operands was loop-invariant, so
>>> the multiply would remain in the loop.
>>>
>>
>> That makes sense. Thanks for the pointers to the commits/radars too.
>>
>> I talked this over with Dan Gohman this morning and we came to the
>> conclusion that the best way of handling this is probably to add a
>> specialized simplify for widening mul intrinsics. It should only simplify
>> when both operands are constant (result is constant), or either constant is
>> zero or one (result is a sext/zext). This was one of Chandler's suggested
>> solutions too. I'll work up a patch soonish.
>>
>
> While I like this solution, and particularly for widening multiplies I can
> see reasons to specifically prefer it, I'd like to point out one
> alternative to the generic problem this thread has touched on: an intrinsic
> that expands to multiple IR constructs which we would like to match back to
> exactly one instruction.
>
> I understand the difficulty of looking across BB edges and other CFG
> elements, seeing through CSE and other foldings which can impact this.
> However, we have great tools to do all of these things, especially in the
> IR. The whole point of lowering to generic IR constructs is to get access
> to these tools.
>
> My idea for how to handle these patterns is to have target-specific ISD
> nodes to represent their special semantics, and to use target-specific
> combines to reach across CFG and other constructs to form these ISD nodes
> even after optimizations. At that layer we can also avoid forming the nodes
> when the optimizations that have perturbed the target-independent code have
> actually made sufficiently significant optimizations to be superior in code
> quality to the instruction the intrinsic would naively have lowered to.
>
>
> I am not sure how this would work. Currently dag combine operates on a BB
> at a time. Or are you thinking about the selection dag builder time
> optimization that we added to deal specifically with sext / zext? That's not a
> general solution though. One day we will implement whole function isel, and
> this approach should work well then.
>

Yea, Owen clarified the BB-at-a-time nature of this to me in IRC.

What I think would still work is to add target-specific hooks to the
selection dag builder, so that it can form target specific nodes directly
from IR. It's a bit gross, but it offers a lot of selective flexibility.

I completely agree that whole-function isel is a more general solution to
this problem. If we can delay committing to very much in either direction
until that's available, excellent. If there is a pressing need (and I don't
think the ARM intrinsics that started this are such; I completely agree
with the plan to do specific combines there), I would prefer adding
target hooks to the dag builder to keep the IR representation and middle
end optimizations clean and ignorant of these details...
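
As an aside, the specific combine Lang described above (fold a widening
multiply only when both operands are constant, or when one operand is zero
or one) might look roughly like the following standalone C++ sketch. It does
not use any LLVM API: Operand, FoldKind, FoldResult and simplifyWideningMul
are hypothetical names invented for illustration, and it models a single
signed i16 x i16 -> i32 lane rather than a real NEON vector intrinsic.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an IR operand: a known constant or an unknown value.
struct Operand {
    std::optional<int16_t> constant; // empty means "not a compile-time constant"
};

// What the simplification decided to do with the widening multiply.
enum class FoldKind { None, Constant, Zero, Extend };

struct FoldResult {
    FoldKind kind;
    int32_t value;     // meaningful for Constant and Zero
    int extendOperand; // meaningful for Extend: index (0 or 1) of the operand to sign-extend
};

// Assumed model of a signed widening multiply (i16 x i16 -> i32) on one lane.
FoldResult simplifyWideningMul(const Operand& a, const Operand& b) {
    // Both operands constant: the result is a constant.
    if (a.constant && b.constant)
        return {FoldKind::Constant, int32_t(*a.constant) * int32_t(*b.constant), -1};
    // Either operand is zero: the result is zero.
    if ((a.constant && *a.constant == 0) || (b.constant && *b.constant == 0))
        return {FoldKind::Zero, 0, -1};
    // Either operand is one: the result is just the extension of the other operand.
    if (a.constant && *a.constant == 1)
        return {FoldKind::Extend, 0, 1};
    if (b.constant && *b.constant == 1)
        return {FoldKind::Extend, 0, 0};
    // Anything else keeps the intrinsic intact so it still maps to one instruction.
    return {FoldKind::None, 0, -1};
}
```

The point of returning None in the general case is that the intrinsic is left
alone, so nothing like LICM can later separate the extend from the multiply.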


> It also depends on what kind of programs you optimize for. Some expert
> programmers would tell you that they expect strict one-to-one translation
> from intrinsics to instructions since the compiler will never match hand
> crafted assembly.
>

Yea... My fundamental philosophy is that they should use hand crafted
assembly. =] Inline asm (or even better, an assembly file) seems like the
right tool for the job here.

But indeed, this is an age-old debate, which I suspect no one will ever
win....