[PATCH] D41361: [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.

Wed Dec 20 16:40:45 PST 2017

efriedma added a comment.

The question is how you materialize all the immediates. On ARM, at least, the "select" version generates two instructions per PHI node, with a sequence like "mov imm; movwmi imm; mov imm; movwmi imm", etc., while the branchy version generates a branch, where each block has one instruction per PHI node. AArch64 is similar (except that we use "csel" there). x86 is similar; we generate a "mov+mov+cmov" sequence, except for some special cases which get lowered to "lea".

So clearly there's some threshold number of PHI nodes beyond which the select-based lowering is worse, simply because the processor is executing more instructions. Of course, the exact threshold where the branch is faster probably depends on the target, the length of the critical path, and how predictable the branch is.
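
As a rough illustration of that argument, here is a minimal sketch that counts executed instructions using only the per-PHI figures quoted above. The function names are hypothetical (they are not part of SimplifyCFG or any LLVM API), and the model deliberately ignores branch misprediction, critical path length, any extra unconditional jump in the branchy code, and special-case lowerings such as "lea".

```cpp
// Back-of-the-envelope model, not LLVM code. All names are hypothetical.
// Assumption: the only costs are the instruction counts quoted in the comment.

// Select-based lowering: two instructions per PHI node
// (e.g. "mov imm; movwmi imm" on ARM).
constexpr unsigned selectLoweringCost(unsigned NumPHIs) {
  return 2 * NumPHIs;
}

// Branchy lowering: one conditional branch, then one instruction per
// PHI node in whichever block is executed.
constexpr unsigned branchLoweringCost(unsigned NumPHIs) {
  return 1 + NumPHIs;
}

// True when the select form executes strictly more instructions.
constexpr bool selectExecutesMore(unsigned NumPHIs) {
  return selectLoweringCost(NumPHIs) > branchLoweringCost(NumPHIs);
}
```

Under these assumptions the two forms tie at one PHI node and the select form executes more instructions from two PHI nodes onward. That says nothing about which is actually faster, since a mispredicted branch can easily outweigh a few extra moves.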

https://reviews.llvm.org/D41361