[PATCH] D36753: [SimplifyCFG] Do not perform tail sinking if there are extra moves introduced

Tue Aug 22 12:00:56 PDT 2017

chandlerc added a comment.

I don't feel like this is really the right approach. Maybe it is the only approach that will work, but I'd like to at least try to solve this differently.

Specifically, reasoning about 'mov's at the IR level really doesn't make sense. The IR is much more abstract than that, and in fact is in SSA form! =/ I feel like we'll end up with a heuristic that is pretty brittle, and we'll also have to deal with canonicalization regressions due to slicing up basic blocks differently...

It is also probably wrong for some architectures. Think about a very RISC architecture, or even x86 when the constants are too large to fold into an immediate operand of an instruction. In those cases, commoning the actual logic and just setting up the constants to flow into it seems like a real win.

Given how much target awareness you end up needing to make this decision even for x86 (a relatively CISC-y architecture), I think I'd suggest an MI-level pass that works something like the following...

1. Build up a table of x86 arithmetic instructions that we can fold immediates into, and the size of immediate that can be folded. These will be similar to the tables in X86InstrInfo.cpp. Eventually, we should encode this in the .td files and extract it, but I'm not suggesting crossing that bridge today.

2. Use this table to try to hoist instructions into their predecessors when doing so allows folding an immediate operand, thereby reducing the number of uops and potentially register pressure. If the predecessors aren't reachable from each other, you can do this even if only a single predecessor allows the fold, without increasing the dynamically executed instruction count. But we don't have to try to be that fancy at first. On x86 at least, replacing `mov <imm>, %reg; <op> %reg, %reg` with `<op> <imm>, %reg` seems like a solid win.
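
To make the two steps above concrete, here is a rough standalone sketch of the table lookup and the simplest form of the hoisting decision. It is not LLVM code and does not use any LLVM API: `FoldableImmBits`, `fitsSignedImm`, `canFoldImm`, and `shouldHoistIntoPreds` are hypothetical names, opcodes are plain strings, and the table holds only a few illustrative 64-bit arithmetic ops that accept a sign-extended 32-bit immediate. It models only the conservative first cut (every predecessor must allow the fold), not the single-predecessor refinement.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the table in step 1: opcode name -> widest
// sign-extended immediate (in bits) the 64-bit form can fold.
// Illustrative entries only; a real table would live next to the target's
// instruction info.
static const std::map<std::string, unsigned> FoldableImmBits = {
    {"add", 32}, {"sub", 32}, {"and", 32},
    {"or", 32},  {"xor", 32}, {"cmp", 32},
};

// True if V is representable as a sign-extended immediate of width Bits.
bool fitsSignedImm(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported immediate width");
  if (Bits == 64)
    return true;
  const int64_t Min = -(int64_t(1) << (Bits - 1));
  const int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
  return V >= Min && V <= Max;
}

// True if Opcode is in the table and can take V directly as an immediate.
bool canFoldImm(const std::string &Opcode, int64_t V) {
  auto It = FoldableImmBits.find(Opcode);
  return It != FoldableImmBits.end() && fitsSignedImm(V, It->second);
}

// Step 2, conservative variant: hoist the operation into its predecessors
// only if each predecessor's incoming constant folds into the instruction.
// PredConstants holds one constant per predecessor block.
bool shouldHoistIntoPreds(const std::string &Opcode,
                          const std::vector<int64_t> &PredConstants) {
  if (PredConstants.empty())
    return false;
  for (int64_t V : PredConstants)
    if (!canFoldImm(Opcode, V))
      return false;
  return true;
}
```

With this sketch, `shouldHoistIntoPreds("add", {1, 2})` says to hoist, while a constant that needs more than 32 bits in any predecessor keeps the operation commoned in the successor.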

Does this make sense? Are there problems with this approach?

A specific place where this approach seems like it would help is when the immediates require more than 32 bits and thus will always need a `mov` (`movabsq`) regardless of the CFG.

https://reviews.llvm.org/D36753