[PATCH] D41361: [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.

Wed Dec 20 16:44:45 PST 2017

hfinkel added a comment.

In https://reviews.llvm.org/D41361#961594, @efriedma wrote:

> The question is how you materialize all the immediates.  On ARM, at least, the "select" version generates two instructions per PHI node, with a sequence like "mov imm; movwmi imm; mov imm; movwmi imm" etc., while the branchy version generates a branch, where each block has one instruction per PHI node.  AArch64 is similar (except that we use "csel" there).  x86 is similar; we generate a "mov+mov+cmov" sequence, except for some special cases which get lowered to "lea".
>
> So clearly there's some threshold where the select-based lowering is worse, simply because the processor is executing more instructions.  Of course, the exact threshold where the branch is faster probably depends on the target, the length of the critical path, and how predictable the branch is.

Good point. I don't know if we have a heuristic that deals with that specifically, but it seems like we should. Should we move forward with this patch, file a bug, and then work on the heuristic in a follow-up? Fixing the O(N^2) behavior here seems like something we should keep as separate as possible from changing the heuristics.
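
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It is not LLVM's cost model: the function names are hypothetical, and the per-PHI counts are assumptions taken from the ARM sequence quoted above (two executed instructions per PHI for the select form; one branch plus one instruction per PHI for the branchy form). It ignores the compare, misprediction cost, and critical-path length.

```cpp
// Hypothetical illustration only; names and counts are assumptions, not LLVM APIs.

// Source pattern under discussion: several values chosen by one condition.
// After sinking/if-conversion each of A and B becomes a PHI (or a select).
inline void pick(bool C, int &A, int &B) {
  if (C) { A = 1; B = 2; } else { A = 3; B = 4; }
}

// Select-based lowering (assumed ARM-like): "mov imm; movwmi imm" per PHI,
// so two instructions execute per PHI regardless of the condition.
constexpr unsigned selectExecuted(unsigned NumPhis) { return 2 * NumPhis; }

// Branchy lowering (assumed): one branch, then one "mov imm" per PHI
// in whichever block is actually executed.
constexpr unsigned branchExecuted(unsigned NumPhis) { return 1 + NumPhis; }

// Under these assumptions the two forms tie at one PHI, and the select
// form executes more instructions from two PHIs onward.
static_assert(selectExecuted(1) == branchExecuted(1), "tie at one PHI");
static_assert(selectExecuted(2) > branchExecuted(2), "select costs more at two PHIs");
```

Under this simplified count the select form falls behind as soon as there are two PHIs, but the real threshold depends on the target and on how predictable the branch is, which is why a proper heuristic would need measurement.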

https://reviews.llvm.org/D41361