[PATCH] D36840: [DAG] convert vector select-of-constants to logic/math

Thu Aug 17 14:52:25 PDT 2017

> On Aug 17, 2017, at 1:29 PM, Sanjay Patel via Phabricator via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> spatel added a comment.
> 
>>> For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs
>> 
>> Are you sure?
>> 
>> Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.
> 
> Oops...maybe it's not so clear anymore. Page references to the doc dated May 2, 2017:
> 
> Bulldozer, Piledriver: 1 uop and 2 cycle latency for 128-bit; 2 uops and 2 cycles for 256-bit. (p. 48, 61)
> Ryzen: 1 uop and 1 cycle for 128-bit; 2 uops and 1 cycle for 256-bit. (p. 88)
> Skylake: 1 uop and 1 cycle for 128-bit (or just legacy encoded xmm?); 2 uops and 2 cycles for 256-bit (or any vex version?) (p. 239, 243)
> 
> So yes, it seems recent uarchs are putting more effort into making blendv a fast op. I think logic ops are still the better choice for default x86 (~SandyBridge). As I suggested in the description, we could re-form a select in the machine combiner or some other machine pass if that's the better choice for a particular uarch. WDYT?
> 
> 
> https://reviews.llvm.org/D36840

btw, to note, blendv *is* a “fast op” on most arches; the extra uop comes from the fact that the V form takes 3 inputs. iirc, it’s the same reason that ADC, CMOV, etc. can’t be 1 uop (2 inputs + flags). the forms that don’t take 3 inputs are usually 1 uop.
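
For reference, here is a rough sketch of the input-count point, written as portable lane-wise C++ rather than real intrinsics or DAG nodes. It assumes the select condition is already a per-lane mask that is either all-ones or all-zeros. The type `V4` and the three function names are hypothetical, made up for illustration, and are not taken from the patch. The generic select needs three inputs (mask, a, b); when both arms are suitable constants, the same result falls out of a single two-input logic or math op.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4 x i32 vector.
using V4 = std::array<std::uint32_t, 4>;

// Generic select with three inputs (mask, a, b), like the variable blend form.
// Each mask lane is assumed to be 0 or 0xFFFFFFFF.
V4 select_generic(const V4& mask, const V4& a, const V4& b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
  return r;
}

// select(mask, -1, C) expressed as one two-input op: mask | C.
V4 select_allones_or_c(const V4& mask, std::uint32_t c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = mask[i] | c;
  return r;
}

// select(mask, C + 1, C) expressed as one two-input op: C - mask.
// A true lane holds 0xFFFFFFFF (that is, -1), so C - (-1) == C + 1 (mod 2^32).
V4 select_cplus1_or_c(const V4& mask, std::uint32_t c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c - mask[i];
  return r;
}
```

Whether the logic/math form or the blend is cheaper still depends on the uarch numbers quoted above; the sketch only shows that the constant cases avoid the third input.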

—escha