[PATCH] D36840: [DAG] convert vector select-of-constants to logic/math

Thu Aug 17 13:29:42 PDT 2017

spatel added a comment.

>> For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs
> 
> Are you sure?
> 
> Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.

Oops...maybe it's not so clear anymore. Page references to the doc dated May 2, 2017:

Bulldozer, Piledriver: 1 uop and 2 cycle latency for 128-bit; 2 uops and 2 cycles for 256-bit. (p. 48, 61)
Ryzen: 1 uop and 1 cycle for 128-bit; 2 uops and 1 cycle for 256-bit. (p. 88)
Skylake: 1 uop and 1 cycle for 128-bit (or just legacy encoded xmm?); 2 uops and 2 cycles for 256-bit (or any vex version?) (p. 239, 243)

So yes, it seems the recent uarchs are putting more effort into making blendv a fast op. I think logic ops are still the better choice for default x86 (~SandyBridge). As I suggested in the description, we could re-form a select in the machine combiner or some other machine pass if that's the better choice for a particular uarch. WDYT?
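
For reference, here is a rough scalar model of the equivalence being relied on. This is only an illustrative sketch, not code from the patch: the names V4, selectLanes, and selectViaLogic are made up, and it assumes every mask lane is either all-ones or all-zeros (what a vector compare normally produces on x86).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4 x i32 vector type, used only for this illustration.
using V4 = std::array<std::uint32_t, 4>;

// Reference semantics: per-lane select, i.e. what a blendv-style op computes.
V4 selectLanes(const V4 &Mask, const V4 &A, const V4 &B) {
  V4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Mask[I] ? A[I] : B[I];
  return R;
}

// Logic form of the same select: (Mask & A) | (~Mask & B).
// Assumption: each mask lane is 0 or 0xFFFFFFFF.
V4 selectViaLogic(const V4 &Mask, const V4 &A, const V4 &B) {
  V4 R{};
  for (std::size_t I = 0; I < R.size(); ++I) {
    assert((Mask[I] == 0u || Mask[I] == 0xFFFFFFFFu) && "mask lane not a bool");
    R[I] = (Mask[I] & A[I]) | (~Mask[I] & B[I]);
  }
  return R;
}
```

When one arm of the select is a constant such as all-zeros or all-ones, the general form collapses further (for example, select(Mask, C, 0) is just Mask & C), which is the kind of case where a single logic op replaces the blend.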

https://reviews.llvm.org/D36840