[PATCH] D36840: [DAG] convert vector select-of-constants to logic/math

Thu Aug 17 15:54:23 PDT 2017

spatel added a comment.

> On Thu, Aug 17, 2017 at 3:52 PM, <escha at apple.com> wrote:
> 
>   btw, to note, blendv *is* a “fast op” on most arches; what costs the one extra uop is the fact that the V form takes 3 inputs. iirc, it’s the same reason that ADC, CMOV, etc. can’t be 1 uop (2 inputs + flags). the forms that don’t take 3 inputs are usually 1 uop.

Yeah, my reality has been warped by being on x86 too long. But there's apparently good news, if I'm reading Agner Fog's instruction tables correctly this time: both Ryzen and Skylake can do cmov or blendv in a single cycle. Getting wider vector ops down to one uop just requires widening the vector unit implementations to actually match the ISA... no problem, right? :)

That means we can canonicalize to 'select' in IR, and we're mostly done: 'select' IR becomes a (v)select node and gets lowered to the matching select instruction.
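To make that lowering path concrete, here is a minimal Python model of the per-lane semantics (all names are hypothetical; this is an illustration, not LLVM's lowering code). An IR `select` with an `<N x i1>` condition picks each lane independently, and an x86 blendv-style instruction does the same thing keyed on the sign bit of each mask lane, so a sign-extended compare result maps one directly onto the other.

```python
# Hypothetical model of per-lane select semantics; lanes are 32-bit
# unsigned ints stored as Python ints.
BITS = 32
LANE_MASK = (1 << BITS) - 1
SIGN_BIT = 1 << (BITS - 1)


def ir_select(cond, t, f):
    """IR-style vector select: cond is a list of bools (an <N x i1> vector)."""
    return [tv if c else fv for c, tv, fv in zip(cond, t, f)]


def sext_i1(c):
    """Sign-extend an i1 to a 32-bit lane: true -> all-ones, false -> 0."""
    return LANE_MASK if c else 0


def blendv(mask, t, f):
    """blendv-style select: take t where the mask lane's sign bit is set, else f."""
    return [tv if (m & SIGN_BIT) else fv for m, tv, fv in zip(mask, t, f)]


cond = [True, False, True, False]
t = [7, 7, 7, 7]
f = [3, 3, 3, 3]
# A vector compare produces all-ones/all-zeros lanes, i.e. sext of the i1 result.
mask = [sext_i1(c) for c in cond]
print(blendv(mask, t, f))  # [7, 3, 7, 3]
```

Because the mask lanes are all-ones or all-zeros, only the sign bit matters, which is why the select can be emitted as a single blend instruction.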

We can still detect and optimize the special cases here. Optimizing to bit-logic-ops for weak x86 becomes the uarch-specific MI transform.
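The special cases are standard integer identities. Below is a minimal per-lane sketch in Python that checks three of them against plain select semantics. The function names are hypothetical, and it is not claimed to match the exact patterns in D36840. Constants that differ by +1 or -1 become a zext/sext plus an add, and the general bit-logic form masks `C1 ^ C2` with the sign-extended condition.

```python
# Hypothetical scalar model of one 32-bit lane; all arithmetic wraps mod 2**32.
BITS = 32
LANE_MASK = (1 << BITS) - 1


def zext(c):
    """zext i1 -> i32: true -> 1, false -> 0."""
    return 1 if c else 0


def sext(c):
    """sext i1 -> i32: true -> all-ones (-1), false -> 0."""
    return LANE_MASK if c else 0


def add(a, b):
    """Wrapping 32-bit add."""
    return (a + b) & LANE_MASK


def select(c, a, b):
    """Reference semantics: select c, a, b."""
    return a if c else b


def select_c2_plus_1(c, c2):
    # select c, C2+1, C2  ==  add (zext c), C2
    return add(zext(c), c2)


def select_c2_minus_1(c, c2):
    # select c, C2-1, C2  ==  add (sext c), C2   (sext c is -1 when true)
    return add(sext(c), c2)


def select_bitlogic(c, c1, c2):
    # select c, C1, C2  ==  ((sext c) & (C1 ^ C2)) ^ C2
    # An all-ones mask keeps C1 ^ C2, and xor-ing C2 back leaves C1;
    # a zero mask leaves C2.
    return (sext(c) & (c1 ^ c2)) ^ c2


for c in (False, True):
    for c2 in (0, 1, 5, LANE_MASK):
        assert select_c2_plus_1(c, c2) == select(c, add(c2, 1), c2)
        assert select_c2_minus_1(c, c2) == select(c, add(c2, LANE_MASK), c2)
        for c1 in (0, 9, LANE_MASK):
            assert select_bitlogic(c, c1, c2) == select(c, c1, c2)
```

With C2 = 0 the first two collapse to plain `zext c` and `sext c`. The bit-logic form needs an `and` followed by a dependent `xor` after the compare. A one-uop blend needs a single op, so on such hardware the logic form lengthens the dependency chain.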

Let me remove that "general" case from this patch.

In other words, gcc is doing the wrong thing here for Skylake by extending the dependency chain:
https://godbolt.org/g/5WJytM

https://reviews.llvm.org/D36840