[PATCH] D16836: [CodeGenPrepare] Don't transform select instructions into branches when both of operands are cheap

Fri Feb 5 11:39:45 PST 2016

hfinkel added a comment.

In http://reviews.llvm.org/D16836#345163, @zansari wrote:

> I'd like to echo Sanjay's concern regarding bypassing of the CMP heuristics.
>
> Whether we agree that the CMP heuristics are useful or not, they are there, and the proposed changes preempt it based on the assumption that "if the true and false side of the branch are cheap, then the compare doesn't matter." I think this is fundamentally wrong. Even if both sides of the branch are NOPs, if we have high latency (yes, it's debatable whether a load falls into this category) instructions feeding into the comparison, the new conditional move has to sit and wait until the compare is resolved. With branches, the latency is completely hidden by the prediction (assuming it's correct). I'm not sure about ARM, but this is true for X86, at least.
>
> I think the two options are : (1) If you feel the compare heuristics are wrong, then move/change them, or (2) Move some of these smarts down into the machine level, as Sanjay suggested.
>
> I'd lean more toward (2), personally, since I'd be a little uncomfortable with (1), unless we can come up with some decent heuristics. I'd actually argue that the CMP heuristics need to be beefed up more to have more checks for high latency instructions (and I don't fully agree with the single-use check on the load), and that's always better to do lower down.

I think that (2) is the right answer.

Consider the case highlighted: The values were phis (or truncs of phis where the truncation is implicit and thus free). Function arguments are the same (at least those passed in registers). The fact is that the value is probably already in a register, so this conversion is not profitable. However, if this happens to be a spilled-value reload, then it might be a good idea, and we won't know that until quite late. For function arguments, at the IR level, we don't yet know if they'll be in registers or loaded from the stack (and even for the register arguments, we don't know if the register value was spilled).

And furthermore, since we don't (yet) have global instruction selection, breaking apart blocks like this early is almost always a bad idea (because it hampers instruction selection and scheduling, both are which are block local).

http://reviews.llvm.org/D16836