[PATCH] D36753: [SimplifyCFG] Do not perform tail sinking if there are extra moves introduced

Tue Aug 22 12:45:01 PDT 2017

On Tue, Aug 22, 2017 at 12:00 PM, Chandler Carruth via Phabricator <
reviews at reviews.llvm.org> wrote:

> chandlerc added a comment.
>
> I don't feel like this is really the right approach. Maybe it is the only
> approach that will work, but I'd like to at least try to solve this
> differently.
>
> Specifically, reasoning about 'mov's at the IR level really doesn't make
> sense. The IR is much more abstract than that, and in fact is SSA form! =/
> I feel like we'll end up with a heuristic that is pretty brittle and also
> have to deal with canonicalization regressions due to slicing up basic
> blocks differently...
>
> It is also probably wrong for some architectures. Think about a very RISC
> architecture or even on x86 if the constants are too large to fold into an
> immediate operand of an instruction. In those cases, commoning the actual
> logic and just setting up the constants to flow into them seems like a real
> win.
>

This is rare case, and the win will be only in the code size.  If size
matters, machine instruction level code sinking should handle this case
when constant needs to be materialized in registers.

>
> Given how much target awareness you end up needing to make this decision
> even for x86 (a relatively CISC-y architecture) I think I'd suggest a MI
> level pass that works something like the following...
>
> 1. Build up a table of x86 arithmetic instructions that we can fold
> immediates into, and the size of immediate that can be folded. These will
> be similar to the tables in X86InstrInfo.cpp. Eventually, we should encode
> this in the .td files and extract it, but I'm not suggesting crossing that
> bridge today.
>
> 2. Use this to try and hoist instructions into their predecessors if doing
> so allows folding an immediate operand, and thereby reducing # uops and
> potentially register pressure. If the predecessors aren't reachable from
> each other, you can even do this if even a single predecessor allows the
> fold without increasing the dynamically executed instruction count. But we
> don't have to try to be that fancy at first. On x86 at least, replacing
> `mov <imm>, %reg; <op> %reg, %reg` with `<op> %imm, %reg` seems like a
> solid win.
>
> Does this make sense? Are there problems with this approach?
>
> A specific place where this approach seems like it would help would be
> when the immediates require >32 bits and thus will always have a `mov`
> regardless of CFG (`movabsq`).
>

I don't think the new proposal is the right way to go.

1) There are too many such cases in LLVM that upstream components blindly
apply transformation without considering consequences. There may or may not
be down stream component to cleanit up. If the blind transformation usually
translates into performance win, adding new later passes to undo some of
the decisions makes sense, but this is not the case that results in
performance win, just a size optimization -- the added compile time (for
the new pass) needs to be considered.

2) The sinking has other effects that may not be good  potentially -- it
introduces a new basic block for sunk instructions.  This can in turn make
MBB make bad decisions (which may need tail dup to kick in).

3) the new proposal is too target dependent.

David

>
>
> https://reviews.llvm.org/D36753
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170822/04f511e5/attachment.html>